on 18 September 18
While the majority of large data stores continue to be based on conventional database and data warehouse technologies, the fastest growing data framework for Big Data is Hadoop. There are a number of open-source as well as proprietary implementations of Hadoop available in the market today. They vary somewhat in the number of additional components available over and above the basic framework, but the core categories of components in the basic framework are largely the same. There's a file system, a MapReduce implementation, some components to work with stored data and a number of components to perform administrative, management and scheduling tasks.
Understanding how Hadoop differs from traditional hardware and software technology options, and what advantages it offers over them, requires two things. The first is a conceptual understanding of what traditional options are available in the market, how they scale, the costs of scaling and the practical difficulties in scaling beyond a point. The second is an understanding of how Hadoop works and how it provides an alternative solution that overcomes the limitations of these traditional technologies. Even the most non-technical of CXOs by and large have an understanding of the traditional solutions. But at a time when Hadoop is still only an emerging technology (even though it's become a very commonly heard term today for various reasons), how does one explain it to them in layman's terms so that they can understand how it is different and why it may be able to add new value to their businesses?
Since there is really no necessity for a non-technical person to understand all the inner workings of Hadoop, perhaps an analogy would serve the purpose of giving them enough of an appreciation of how it's different, why it may be cheaper and why existing technologies may not be able to do the same job. The analogy I'd use is that of a library of books.
What if we wanted to know the top ten most commonly occurring words in all the English language books that exist in a certain country? To answer this question, we have to start by considering that (a) there is a set of books that are already available, and (b) new books are perhaps being added to this set every day. Furthermore, it's very likely that the existing books as well as the new ones coming in are in different languages, and that not all are in English.
If we were to go about answering this question about the top ten words using traditional technologies we'd need to first think about creating a library that all the books could be brought into and stored. We have the option of either building one really large library of a fixed size or we could think in terms of building a number of smaller or mid-sized libraries and adding new ones over time as needed. We'd also need a set of people to identify each book and separate out those which are in the English language only. We might need another set of people who'd perform the task of checking all these English language books out of the library and carrying them to yet another set of people in another location who would go through each book, identify every word in each one, and then pick out the words occurring most frequently.
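For readers who like to see the idea in code, the traditional approach described above can be sketched roughly as follows. This is only an illustration of the analogy, not how any particular database or warehouse product works: the `BOOKS` list and the `top_words` function are made-up names, and the tiny sample texts stand in for a real library. Everything is brought to one place, filtered for English, and then read one book at a time.

```python
import re
from collections import Counter

# Hypothetical stand-in for the central library: (language, text) pairs.
BOOKS = [
    ("en", "the cat sat on the mat"),
    ("fr", "le chat est sur le tapis"),
    ("en", "the dog chased the cat"),
]

def top_words(books, n=10):
    """Count words across all English books in one place, one book at a time."""
    counts = Counter()
    for language, text in books:
        # The "sorting" crew: set aside anything that is not in English.
        if language != "en":
            continue
        # The "reading" crew: pick out every word and add it to the tally.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts.most_common(n)

print(top_words(BOOKS, 2))
```

The point of the sketch is that a single tally has to see every book, so as the collection grows, both the storage and the reading have to grow in the same central place.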
It's easy to visualize that this would be a really tedious and time-consuming project, never mind the kind of expense that would be involved in building either one or even many libraries, having people run (or use vehicles) back and forth carrying books, and another army of people to read them. To add yet another dimension of easily understood practical difficulty, what if some of the books are in hard copy, some are in soft copy, and some are available in audio format? If new books keep coming in all the time, how does one cope with the problem of adequacy of storage and processing in the future? Would any CXO be willing to take the risk of trying to guess what size of library (or how many) would be enough, and also have the budget available to pay for all the labour and transport involved in bringing all those books for processing, and hiring enough people who could read them in different formats and do all the tabulation necessary to find the answer to the initial question? Probably not.
Coming next week - we'll look at how this problem can be solved, using a Hadoop analogy.
Read Part 2 here: How to explain Hadoop to a non-technical CXO - Part 2