Beginning Data Exploration and Analysis with Apache Spark

Pluralsight

Course Summary

Data preparation is often said to make up about 80% of a data scientist's job. This course is all about data preparation: cleaning, transforming, and summarizing data using Spark.

Course Description

Data preparation is a staple task for any data professional, whether you just want to explore data or develop sophisticated machine learning models. Spark is an engine that helps you do this intuitively, using functional constructs that shield you from the messiness of working with large datasets. In this course, Beginning Data Exploration and Analysis with Apache Spark, you'll go through exploratory data analysis and data munging with Spark, step by step. First, you'll explore RDDs and the functional constructs that make processing in Spark intuitive. Next, you'll discover how to transform and clean unstructured data. Finally, you'll learn how to summarize data along dimensions and how to model relationships to build co-occurrence networks. By the end of this course, you'll be able to use Spark to transform data in whatever way you need.

Course Syllabus

Course Overview
- 1m 42s

- Course Overview 1m 42s

Getting Started with Spark's Resilient Distributed Datasets
- 27m 11s

- The Role of Spark in Data Analysis 6m 3s
- Understanding the Components of Spark 4m 17s
- Installing Spark Standalone in a Local Environment 4m 21s
- Hello World: Loading a Data Set 3m 47s
- Understanding Resilient Distributed Datasets 8m 41s

Transforming and Cleaning Unstructured Data
- 32m 1s

- Analyzing Crime in New York City 4m 56s
- Programming in the Functional Paradigm 3m 53s
- Applying Functional Constructs to Transform Datasets 4m 39s
- Filtering Rows 1m 41s
- Transforming Records to Extract Fields 3m 41s
- Identifying and Filtering Missing Values 3m 45s
- Identifying and Filtering Anomalies 4m 43s
- Summarizing and Visualizing Crime in NYC 4m 40s

Summarizing Data Along Dimensions
- 30m 30s

- Representing Data Using Pair RDDs 5m 2s
- Creating a Pair RDD 3m 24s
- Summarizing Pair RDDs 3m 25s
- Computing a Daily Trend 3m 3s
- Merging Pair RDDs 3m 5s
- Adding a Dimension to an RDD 4m 10s
- Computing Averages with Pair RDDs 6m 6s
- Comparing Daily Averages 2m 11s

Modeling Relationships in the Marvel Social Universe
- 25m 59s

- Representing Datasets as Networks 5m 13s
- Finding the Most Influential Characters 5m 2s
- Building a Co-occurrence Network 7m 35s
- Finding the Most Important Relationships 8m 8s

Course Fee:

USD 29

Course Type:	Self-Study
Course Status:	Active
Workload:	1 - 4 hours / week

This course is listed under Open Source, Development & Implementations, Data & Information Management and Server & Storage Management Community

