Sunday, November 10, 2013

Big Data is Not Just Hadoop

Hybrid Solutions will Solve our Big Data Problems for Years to Come 

When I talk to the people on the front line of big data, I notice that the most common use case of big data is to provide visualization and analytics across the types of data and volumes of data we have in the modern world.  For many, it’s an expansion of the power of the data warehouse that deals with the new data bloated world in which we live.

 Today, you have bigger volumes, more sources and you are being asked to turn around analytics even faster than before.  Overnight runs are still in use, but real-time analytics are becoming more and more expected by our business users. 

To deal with the new volumes of data, the yellow elephant craze is in full swing and many companies are looking for ways to use Hadoop to store and process big data. Last week at Strata/Hadoop World, many of the keynote speeches talked about the fact that there are really no limits to Hadoop.  I agree. However, in data governance, you must consider not only the technical solutions, but also the processes and people in your organization, and you must fit the solutions to the people and process.

As powerful as Hadoop is, there still is a skill shortage of Map/Reduce coders and Pig scripters.  There are still talented analytics professionals who aren't experts in R yet. This shortage will be with us for decades as a new generation of IT workers are trained in Hadoop.

This is in part why so many Hadoop distributions are in the process of putting SQL on Hadoop.  This is also why many traditional analytics vendors are adding Hadoop and ways to access the Hadoop cluster from their SQL-based applications.  The two worlds are colliding and it's very good for world of analytics.

I’ve blogged about the cost of big data solutions, traditional enterprise solutions and how the differ.  In short, you tend to spend money on licenses when you have an old school analytics solution, while your money goes to expertise and training if you adopt a Hadoop-centric approach.  But even this line is getting blurry with SQL-based solutions opening up their queries to Hadoop storage. Analytical databases can deliver fast big data analytics with access to Hadoop, as well as compression and columnar storage when the data is stored within.  You don’t even need open source to have a term license model today.  They are available more and more in other data storage solutions, as are pay-per-use models that charge per terabyte.

If you have a big data problem that needs to be solved, don’t jump right on the Hadoop bandwagon.  Consider the impact that big data will have on your solutions and on your teams and take a long look at the new generation of columnar data storage and SQL-centric analytical platforms to get the job done.

Disclaimer: The opinions expressed here are my own and don't necessarily reflect the opinion of my employer. The material written here is copyright (c) 2010 by Steve Sarsfield. To request permission to reuse, please e-mail me.