on 16 January 19
Having listened to a few industry acquaintances recently as they shared their observations of the whole Big Data phenomenon, and then reading Gartner Inc.'s summary of its 2015 Hadoop Adoption Study, I came to the conclusion that the state of play in this field is somewhat akin to where the adoption of the cloud and cloud-based services was just a few short years ago. At the time, there was a lot of hype about the cloud. Everyone was talking about it, software companies were evangelizing it, and most CIOs were watching it with keen interest. The unspoken realization was that while everyone had heard about the capital investment savings, operational cost improvements and scaling opportunities the cloud potentially presented, most were still not sure what it actually consisted of, or how it was to be implemented.
To some extent, Big Data seems to be in a very similar space today, going by what I've heard and read about its uptake. Its adoption still has a way to go, although it is definitely growing. The reasons for this, as unearthed by the study, seem understandable. Mining Big Data usually means implementing Hadoop in order to store and work with large volumes of unstructured data. The concept of Hadoop is one thing, but actually assembling the various components of a Big Data technology stack and working with them until pre-processed data is ready for analytics is still not easily done, relatively speaking. There is a shortage of technology professionals with the relevant skills, as evidenced by the study, and the companies that have tried it don't appear to be evangelizing it much just yet.
Clearly there's more work and waiting to be done, but it's all moving in the right direction as far as I can tell. For example, while a new generation of Hadoop technologists is in training, practicing their new skills as we speak, companies such as Platfora, Altiscale and others in the Big Data landscape have come out with offerings that aim to make it quicker to implement Hadoop and extract meaningful business output from it.
But one analysis of the survey findings confused me: the low number of users of Hadoop implementations relative to the cost of cluster hardware and associated software seems to be a dissuading factor for its uptake. A number of reasons are cited to explain the low user count. I haven't read the detailed study report, but I would ask why the number of users was being weighed against the investment in a Hadoop implementation at all. Hadoop is not an end-user application; it's a component in an analytics technology stack. Shouldn't the comparison have been between the investment and the business gain that could ultimately be attributed to decisions made on the basis of Big Data analytics insights? For example, a supercomputer that analyses volumes of data to predict the weather would probably have very few direct users (weather experts), but its predictions could be used to prepare an entire nation for the impacts of severe weather.
This is what makes me ask whether many potential analytics users are missing out on its benefits because they are worried about the difficulty of implementing Hadoop, or because they don't fully understand it just yet. I'm sure that with the passage of time this gap will be filled. In the meantime, their focus should remain on the end application, which is business analytics. Analytics does not necessarily need Big Data (or large volumes of unstructured data). It just needs enough data to make statistical models and correlations valid, and that data may well be found in the large volumes of structured data already being captured in existing enterprise systems. But that's a topic for another day.
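To make that last point concrete, here is a minimal sketch of the kind of analytics that needs no Hadoop cluster at all: a Pearson correlation computed over a handful of structured records, the sort already sitting in an enterprise database. The data here is entirely made up for illustration; only the standard library is used.

```python
# Minimal sketch: correlation analysis on small, structured data.
# The figures below are hypothetical, invented purely for illustration.
from math import sqrt

# Hypothetical monthly figures: marketing spend (in $ thousands) vs. units sold
spend = [10.0, 12.5, 15.0, 17.5, 20.0, 22.5]
units = [200, 240, 310, 330, 400, 430]

def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(spend, units)
print(f"correlation between spend and units sold: {r:.3f}")
```

Six rows is obviously too few for a production model, but the mechanics are identical at a thousand or a million rows of structured data, and none of it requires a distributed file system to compute.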