Apache Spark Fundamentals
Pluralsight
Course Summary
This course will teach you how to use Apache Spark to analyze your big data at lightning-fast speeds, leaving Hadoop in the dust!
Course Description
Our ever-connected world is creating data faster than Moore's law can keep up with, forcing us to be smarter about how we analyze it. Previously, we had Hadoop's MapReduce framework for batch processing, but modern big data processing demands have outgrown it. That's where Apache Spark steps in, boasting speeds 10-100x faster than Hadoop and holding the world record in large-scale sorting. Spark's general abstraction lets it expand beyond simple batch processing to such things as blazing-fast iterative algorithms and exactly-once streaming semantics. In this course, you'll learn Spark from the ground up, starting with its history before building a Wikipedia analysis application as a means of learning a wide swath of its core API. That core knowledge will make it easier to explore Spark's other libraries, such as the streaming and SQL APIs. Finally, you'll learn how to avoid a few of Spark's commonly encountered rough edges. You will leave this course with a tool belt for creating your own performance-maximized Spark applications.
Course Syllabus
Getting Started- 46m 20s
—Why Spark? 7m 14s
—Hadoop Explosion to Spark Unification 7m 14s
—Spark's Background 6m 18s
—Installation 6m 55s
—Spark Programming Languages 2m 44s
—Hello Big Data! 6m 35s
—Logistics 3m 28s
—Resources 4m 55s
—Summary 0m 53s
Spark Core: Part 1- 56m 59s
—Intro 2m 9s
—Spark Appification 9m 20s
—What Is an RDD? 4m 36s
—Loading Data 9m 43s
—Lambdas 3m 15s
—Transforming Data 7m 12s
—More Transformations 4m 50s
—Actions and the Associative Property 2m 43s
—Acting on Data 9m 54s
—Persistence 1m 33s
—Resources 1m 1s
—Summary 0m 36s
Spark Core: Part 2- 28m 5s
Distribution and Instrumentation- 50m 14s
Spark Libraries- 1h 3m
Optimizations and the Future- 22m 25s
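As a taste of the core-API topics the syllabus covers (loading data, lambdas, transformations, actions), here is a minimal word-count sketch in Scala. This is not the course's own example; the input path is hypothetical, and it assumes a local Spark installation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark locally on all available cores
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Loading data: creates an RDD of lines (hypothetical file path)
    val lines = sc.textFile("wikipedia-sample.txt")

    // Transformations are lazy: nothing executes until an action runs
    val counts = lines
      .flatMap(line => line.split("\\s+")) // lambda splitting lines into words
      .map(word => (word, 1))              // pair each word with a count of 1
      .reduceByKey(_ + _)                  // sum per word; addition is associative

    // Action: triggers the computation and returns results to the driver
    counts.take(10).foreach(println)

    sc.stop()
  }
}
```

The lazy-transformation/eager-action split shown here is the pattern behind the "Transforming Data" and "Acting on Data" lessons: Spark records the chain of transformations and only computes when an action such as `take` demands a result.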