Beginning Data Exploration and Analysis with Apache Spark
Pluralsight
Course Summary
80% of a data scientist's job is data preparation. This course is all about data preparation, i.e., cleaning, transforming, and summarizing data using Spark.
Course Description
Data preparation is a staple task for any data professional, whether you just want to explore data or develop sophisticated machine learning models. Spark is an engine that makes this intuitive, using functional constructs that shield the user from the messiness of working with large datasets. In this course, Beginning Data Exploration and Analysis with Apache Spark, you'll work through exploratory data analysis and data munging with Spark, step by step. First, you'll explore RDDs and the functional constructs that make processing in Spark intuitive. Next, you'll discover how to transform and clean unstructured data. Finally, you'll learn how to summarize data along dimensions and how to model relationships to build co-occurrence networks. By the end of this course, you'll be able to use Spark to transform data in whatever way you need.
Course Syllabus
Course Overview - 1m 42s
—Course Overview 1m 42s
Getting Started with Spark's Resilient Distributed Datasets - 27m 11s
—The Role of Spark in Data Analysis 6m 3s
—Understanding the Components of Spark 4m 17s
—Installing Spark Standalone in a Local Environment 4m 21s
—Hello World: Loading a Data Set 3m 47s
—Understanding Resilient Distributed Datasets 8m 41s
Transforming and Cleaning Unstructured Data - 32m 1s
Summarizing Data Along Dimensions - 30m 30s
Modeling Relationships in the Marvel Social Universe - 25m 59s