Understanding the MapReduce Programming Model

Pluralsight

Course Summary

The MapReduce programming model is the de facto standard for parallel processing of Big Data. This course introduces MapReduce, explains how data flows through a MapReduce program, and guides you through writing your first MapReduce program in Java.

Course Description

Processing millions of records requires that you first understand how to break your tasks down into parallel processes. The MapReduce programming model, part of the Hadoop ecosystem, gives you a framework for defining your solution as parallel tasks whose outputs are then combined into the final result. In this course, Understanding the MapReduce Programming Model, you'll get an introduction to the MapReduce paradigm. First, you'll learn to visualize how data flows through the map, partition, shuffle, and sort phases before it reaches the reduce phase, which produces the final result. Next, you'll be guided through your very first MapReduce program in Java. Finally, you'll learn to extend the framework's Mapper and Reducer classes to plug in your own logic, and then run this code on your local machine without a Hadoop cluster. By the end of this course, you will be able to break big data problems into parallel tasks to tackle large-scale data munging operations.
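
As a rough illustration of the data flow described above (map, partition, shuffle and sort, then reduce), here is a minimal word-count sketch in plain Java. It does not use the Hadoop API or the course's own code: the class name `WordCountFlow` and its methods are hypothetical, and everything runs in a single process. The partition step uses hash-modulo assignment, which is one common way to assign keys to reducers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, single-process imitation of the MapReduce data flow.
// Not the Hadoop API: it only mirrors the phases conceptually.
class WordCountFlow {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Partition: choose which reducer receives a key (hash modulo reducer count).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Reduce phase: sum all values that were grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines, int numReducers) {
        // Shuffle and sort: each reducer gets a key-sorted map of key -> values.
        List<TreeMap<String, List<Integer>>> partitions = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            partitions.add(new TreeMap<>());
        }
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                int p = partition(pair.getKey(), numReducers);
                partitions.get(p)
                        .computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                        .add(pair.getValue());
            }
        }
        // Run the reducer over every key in every partition and merge the output.
        Map<String, Integer> result = new TreeMap<>();
        for (TreeMap<String, List<Integer>> part : partitions) {
            for (Map.Entry<String, List<Integer>> e : part.entrySet()) {
                result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "the lazy dog", "the fox");
        System.out.println(run(lines, 2));
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
    }
}
```

Changing the reducer count passed to `run` changes how keys are spread across partitions but not the final counts, since every occurrence of a key always lands in the same partition.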

Course Syllabus

Course Overview (1m 27s)

- Course Overview (1m 27s)

Introducing MapReduce (29m 40s)

- Huge Data Sets and Scalable Systems (6m 15s)
- The Power and Complexity of Teamwork (5m 50s)
- Thinking Parallel with MapReduce (5m 6s)
- Basic Flow of a MapReduce Process (6m 14s)
- Identifying MapReduce Applications (6m 13s)

A "Hello World" MapReduce Job (39m 39s)

- Download Hadoop Jars and Set up an IntelliJ Project (6m 3s)
- The Map Class Hierarchy (4m 8s)
- The Reduce Class Hierarchy (2m 55s)
- The Driver Program (2m 8s)
- Setting up the Input Data Files (1m 52s)
- The Map Class Code (7m 40s)
- The Reduce Class Code (5m 53s)
- The Main Class and the MapReduce Job (5m 37s)
- Running Our First MapReduce Job (3m 19s)

Controlling Parallelism in Map and Reduce Phases (38m 6s)

- Behind the Scenes of a MapReduce Task (5m 56s)
- Using a Single Reducer (3m 33s)
- Using Multiple Reducers (4m 36s)
- Partition, Shuffle, and Sort (5m 5s)
- Tweaking the Number of Reduce Tasks (1m 48s)
- Optimize the Map Phase Using a Combiner (6m 17s)
- Setting a Combiner Class On Your MapReduce (1m 18s)
- Reducers as Combiners (5m 1s)
- Constraints in Using Reducers as Combiners (4m 30s)
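
The combiner lessons listed above touch on a general constraint that a small sketch can make concrete: a reducer can be reused as a combiner only when applying it to partial groups and then again to the partial results gives the same answer as applying it once to all values. Summation has this property; a naive mean does not. The code below is a plain-Java illustration with a hypothetical class name (`CombinerSketch`) and made-up sample values, not the course's code and not the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each inner list stands for the values one mapper
// emitted for the same key.
class CombinerSketch {

    static double sum(List<Double> values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    static List<Double> flatten(List<List<Double>> groups) {
        List<Double> all = new ArrayList<>();
        for (List<Double> g : groups) {
            all.addAll(g);
        }
        return all;
    }

    // Reducer only: sum every value directly.
    static double sumWithoutCombiner(List<List<Double>> mapperOutputs) {
        return sum(flatten(mapperOutputs));
    }

    // Combiner per mapper, then reducer over the partial sums.
    static double sumWithCombiner(List<List<Double>> mapperOutputs) {
        List<Double> partials = new ArrayList<>();
        for (List<Double> out : mapperOutputs) {
            partials.add(sum(out));
        }
        return sum(partials);
    }

    // Reducer only: the true mean of all values.
    static double meanWithoutCombiner(List<List<Double>> mapperOutputs) {
        return mean(flatten(mapperOutputs));
    }

    // Naive reuse of the mean reducer as a combiner: a mean of means.
    static double meanWithNaiveCombiner(List<List<Double>> mapperOutputs) {
        List<Double> partials = new ArrayList<>();
        for (List<Double> out : mapperOutputs) {
            partials.add(mean(out));
        }
        return mean(partials);
    }

    public static void main(String[] args) {
        List<List<Double>> outputs = List.of(List.of(1.0, 2.0, 3.0), List.of(10.0));
        System.out.println(sumWithoutCombiner(outputs));    // prints 16.0
        System.out.println(sumWithCombiner(outputs));       // prints 16.0
        System.out.println(meanWithoutCombiner(outputs));   // prints 4.0
        System.out.println(meanWithNaiveCombiner(outputs)); // prints 6.0
    }
}
```

A common workaround for the mean case is to have the combiner emit partial (sum, count) pairs and let the reducer divide at the end, so the combined operation is again a plain sum.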

Course Fee:

USD 29

Course Type:	Self-Study
Course Status:	Active
Workload:	1 - 4 hours / week

This course is listed under Development & Implementations, Industry Specific Applications, and Data & Information Management Community.

Hadoop

Java

Big Data

MapReduce