Big Data is everywhere these days.
The term (and concept) of “Big Data” as we use it today has been around since the ’90s. John R. Mashey is widely credited with popularizing it while he was working at SGI. Since then, data has been growing at a tremendous rate, with estimates of 2.5 quintillion bytes of data being produced daily. This data comes from multiple sources: not only the web (social networks, user clickstreams) but also industrial sensors and the Internet of Things. And the data stream is growing at the same time as cloud computing is reshaping the IT industry.
Every medium-sized and large company is looking at its humongous data stores, filled with unstructured information from several sources, and trying to figure out how to extract intelligence from them. And they turn to Big Data tools for the solution.
Big Data tools are mainly based on “NoSQL” solutions that originated in the open source world. Companies understand that information is crucial no matter what the business is, and that to take control of their data they need to engage in data management. In data management, some data is best suited for NoSQL solutions and other data for traditional data stores. Data with high volume, variety, and complexity makes NoSQL solutions attractive, although a hybrid solution is usually the best approach.
The cost of using structured data rises with volume and complexity. Big Data tools are more cost-effective for storage and data access, although knowledge of how best to deal with Big Data is still scarce and expensive. There is a threshold beyond which users are willing to give up the mature capabilities of a relational database in exchange for the ability to store and access the data cost-effectively.
Relational databases queried with SQL are the industry standard for structured data, with deep roots in IT. Big Data is relatively new, and as a new technology it presents several pitfalls, both in how best to approach the data and in the limited availability of technical expertise. Learning from data before it is fully organized is completely different from organizing data after knowing what you want to do with it.
Big Data offers several business advantages. Businesses can rely on Big Data for speedier processes, to get more data into the analysis, and to handle and relate complex data.
There are also some operational advantages:
But Big Data is no silver bullet.
Going for the Big Data approach is not a decision to make lightly. Because most of the tools, notably Apache Hadoop, are open source, IT managers won’t get the typical support from software vendors for solution maintenance and operation.
Also, for data exploration, analysts typically have to rely on some form of MapReduce as the data access paradigm, which requires a different mindset for data access, one that is alien to most developers used to SQL.
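To give a feel for that mindset, here is a minimal, single-process sketch of the classic MapReduce word count, written in plain Python rather than against Hadoop’s actual API. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical, chosen only for illustration. In SQL the same question would be a one-line `SELECT word, COUNT(*) ... GROUP BY word`; in MapReduce you instead express it as a map step that emits key/value pairs, a grouping (shuffle) step, and a reduce step that folds the values for each key.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document: str) -> list[tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Shuffle: group all intermediate values by key. In a real cluster
    # the framework does this between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reduce: combine all values emitted for one key into a single result.
    return key, sum(values)


def word_count(documents: list[str]) -> dict[str, int]:
    # Run map over every record, group by key, then reduce each group.
    intermediate = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data is everywhere"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Everything here runs in one process; the point of a framework like Hadoop is that the map and reduce steps run in parallel across many machines, which is exactly why the problem has to be broken up this way.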
Big Data is not a magic wand that you wave and the data organizes itself to give you the answers you need. Before searching for results, you need to know what you are looking for. Big Data can help you formulate the questions based on the findings. It is an iterative process, but “unfortunately”, to formulate the right question you still need to rely on people. Business knowledge is key, and you’ll need experts with analytical skills to make sense of all the answers you can get from your Big Data initiatives.
Otherwise, you risk ending up with nothing but “42”.