Apache Spark Fundamentals
Pluralsight
Course Summary
This course will teach you how to use Apache Spark to analyze your big data at lightning-fast speeds, leaving Hadoop in the dust!
Course Description
Our ever-connected world is creating data faster than Moore's law can keep up with, forcing us to be smarter about how we analyze it. Previously, we had Hadoop's MapReduce framework for batch processing, but modern big data processing demands have outgrown it. That's where Apache Spark steps in, boasting speeds 10-100x faster than Hadoop and holding the world record in large-scale sorting. Spark's general abstraction lets it expand beyond simple batch processing to such things as blazing-fast iterative algorithms and exactly-once streaming semantics. In this course, you'll learn Spark from the ground up, starting with its history before building a Wikipedia analysis application as a means of learning a wide swath of its core API. That core knowledge will make it easier to explore Spark's other libraries, such as the streaming and SQL APIs. Finally, you'll learn how to avoid a few of Spark's commonly encountered rough edges. You will leave this course with a tool belt for creating your own performance-maximized Spark applications.
Course Syllabus
Getting Started- 46m 20s
—Why Spark? 7m 14s
—Hadoop Explosion to Spark Unification 7m 14s
—Spark's Background 6m 18s
—Installation 6m 55s
—Spark Programming Languages 2m 44s
—Hello Big Data! 6m 35s
—Logistics 3m 28s
—Resources 4m 55s
—Summary 0m 53s
Spark Core: Part 1- 56m 59s
—Intro 2m 9s
—Spark Appification 9m 20s
—What Is an RDD? 4m 36s
—Loading Data 9m 43s
—Lambdas 3m 15s
—Transforming Data 7m 12s
—More Transformations 4m 50s
—Actions and the Associative Property 2m 43s
—Acting on Data 9m 54s
—Persistence 1m 33s
—Resources 1m 1s
—Summary 0m 36s
Spark Core: Part 2- 28m 5s
Distribution and Instrumentation- 50m 14s
Spark Libraries- 1h 3m
Optimizations and the Future- 22m 25s
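As a taste of the core-API topics the syllabus covers (loading data, lambdas, transformations, actions), here is a minimal word-count sketch in Scala. This is not the course's own example; the input path is hypothetical, and it assumes a local Spark installation:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // "local[*]" runs Spark locally on all available cores
    val conf = new SparkConf().setAppName("WordCount").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Loading data: creates an RDD of lines (hypothetical file path)
    val lines = sc.textFile("wikipedia-sample.txt")

    // Transformations are lazy: nothing executes until an action runs
    val counts = lines
      .flatMap(line => line.split("\\s+")) // lambda splitting lines into words
      .map(word => (word, 1))              // pair each word with a count of 1
      .reduceByKey(_ + _)                  // sum per word; addition is associative

    // Action: triggers the computation and returns results to the driver
    counts.take(10).foreach(println)

    sc.stop()
  }
}
```

The lazy-transformation/eager-action split shown here is the pattern behind the "Transforming Data" and "Acting on Data" lessons: Spark records the chain of transformations and only computes when an action such as `take` demands a result.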