Data Transformations with Apache Pig
Pluralsight
Course Summary
Pig is an open-source engine for executing parallelized data transformations that run on Hadoop. This course shows you how Pig can help you work with incomplete data that has an inconsistent schema, or perhaps no schema at all.
Course Description
Pig is open-source software that is part of the Hadoop ecosystem of technologies. Pig is well suited to data that lies beyond the reach of traditional data warehouses: it deals well with missing, incomplete, and inconsistent data, even data with no schema at all. In this course, Data Transformations with Apache Pig, you'll learn about data transformations with Apache Pig. First, you'll start with the very basics, which will show you how to install Pig and get started working with the Grunt shell. Next, you'll discover how to load data into relations in Pig and store transformed results to files using the load and store commands. Then, you'll work on a real-world dataset, analyzing accidents in NYC using collision data from the City of New York. Finally, you'll explore advanced constructs such as the nested foreach, get a brief glimpse into the world of MapReduce, and see how easy it is to implement this construct in Pig. By the end of this course, you'll have a better understanding of data transformations with Apache Pig.
Course Syllabus
Course Overview - 2m 5s
  - Course Overview 2m 5s
Introducing Pig - 20m 29s
  - What You Need to Get Started 2m 29s
  - Why Do We Need Data? 3m 1s
  - Hive for Analytical Processing 2m 4s
  - When Do We Use Apache Pig? 1m 50s
  - Pig for Extract, Transform, and Load Operations 3m 47s
  - Introducing Pig Latin 3m 19s
  - Pig on Hadoop and Other Technologies 3m 56s
Using the GRUNT Shell - 18m 22s
Loading Data into Relations - 45m 27s
Working with Basic Data Transformations - 36m 26s
Working with Advanced Data Transformations - 48m 16s
Executing MapReduce Using Pig - 24m 25s