Hadoop on Azure: An Introduction to Big Data Using HDInsight
A Pragmatic Introduction to HDInsight
Massive amounts of data are being collected on just about everything, and only a small part of that data is being analyzed.
In 2014, more than 5,700 tweets were sent every second, along with 870 Facebook links.
In 2013, about 4.4 zettabytes of data were created and approximately 5% of it was analyzed.
By 2020, it’s estimated that we will collect 44 zettabytes of data, and the share we analyze will jump to 40%.
One of the most overused terms in recent times is “big data.”
But what does the term really mean?
Big data refers to data being collected in ever-escalating volumes, at increasingly high velocities, and in a widening variety of unstructured formats and variable semantic contexts.
Big data describes any large body of digital information, from the text in a Twitter feed, to the sensor information from industrial equipment, to information about customer browsing and purchases on an online catalog.
Big data can be historical (meaning stored data) or real-time (meaning streamed directly from the source).
For big data to provide actionable intelligence or insight, it is not enough to ask the right questions and collect data relevant to the issues; the data must also be accessible, cleaned, analyzed, and then presented in a useful way.
HDInsight is Microsoft Azure's cloud implementation of the rapidly expanding Apache Hadoop technology stack, the go-to solution for big data analysis.
It includes implementations of Storm, HBase, Pig, Hive, Sqoop, Oozie, Ambari, and so on. HDInsight also integrates with business intelligence (BI) tools such as Excel, SQL Server Analysis Services, and SQL Server Reporting Services.
Note: This is not a hands-on course. It builds a knowledge foundation for my next course in this series, which uses what we've learned to create a real-world, end-to-end big data solution with Azure HDInsight.