Hadoop was created by computer scientists Doug Cutting and Mike Cafarella. It provides a complete ecosystem of open source projects that supplies the distributed framework needed to work with big data.
Hadoop can handle both structured and unstructured data, giving users a flexibility in collecting, processing and analysing data that traditional data warehouses do not provide.
Hadoop supports advanced analytics such as predictive analytics, data mining and machine learning. Commercial distribution of Hadoop has been handled by four major vendors: Cloudera, Hortonworks, Amazon Web Services (AWS) and MapR Technologies. Google and Microsoft also provide cloud-based managed services that can be used with Hadoop and related technologies.
Hadoop can process massive amounts of data in parallel. It runs on clusters of commodity servers and scales to support thousands of hardware nodes.
The core components of the first iteration of Hadoop are MapReduce, the Hadoop Distributed File System (HDFS) and Hadoop Common, a set of shared utilities and libraries.
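As a brief illustration of HDFS, the sketch below writes and inspects a small file through Hadoop's Java FileSystem API. The class name and the path /user/example/hello.txt are hypothetical, and the configuration assumes a core-site.xml on the classpath pointing at a running cluster.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        // Picks up fs.defaultFS from core-site.xml; on a real cluster
        // this points at the NameNode rather than the local file system.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file into HDFS (path is illustrative only).
        Path path = new Path("/user/example/hello.txt");
        try (FSDataOutputStream out = fs.create(path)) {
            out.writeUTF("Hello, HDFS");
        }

        // Confirm the file exists and report its length in bytes.
        System.out.println("Exists: " + fs.exists(path));
        System.out.println("Length: " + fs.getFileStatus(path).getLen());
    }
}
```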
MapReduce splits a job into many map tasks that process portions of the data in parallel across the cluster; reduce tasks then aggregate the intermediate results into the final output.
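To make the map and reduce phases concrete, here is the classic word-count job written against the standard Hadoop MapReduce Java API; the class names and the command-line input/output paths are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each mapper receives a split of the input and emits (word, 1) pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: all counts for the same word arrive together and are summed.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output HDFS paths are passed on the command line.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The mapper's output is shuffled so that every occurrence of a given word reaches the same reducer, which is what allows the counts to be summed independently and in parallel.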
Because Hadoop can analyse and process huge volumes of data, it enables large organizations to build data lakes that act as reservoirs for massive amounts of raw data. Continuous data cleansing complements these data lakes by turning raw data into usable transaction data sets.
Hadoop is commonly used in customer behaviour analytics to predict churn and user behaviour patterns. Enterprises use Hadoop to analyse pricing and to manage safe-driver discount programs. Healthcare providers use Hadoop to make treatments more effective for patients and to understand the patterns of particular diseases.
Hadoop is an open source framework maintained by the Apache Software Foundation. In addition to structured and unstructured data, Hadoop can also analyse semi-structured data.