Learn By Example: Hadoop, MapReduce for Big Data problems

Udemy

Course Summary

A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel"

Course Description
Taught by a 4-person team including 2 Stanford-educated ex-Googlers and 2 ex-Flipkart Lead Analysts. This team has decades of practical experience in working with Java and with billions of rows of data.

This course is a zoom-in, zoom-out, hands-on workout involving Hadoop, MapReduce and the art of thinking parallel.

Let's parse that.

Zoom-in, Zoom-Out: This course is both broad and deep. It covers the individual components of Hadoop in great detail, and also gives you a higher-level picture of how they interact with each other.

Hands-on workout involving Hadoop, MapReduce: This course will get you hands-on with Hadoop very early on. You'll learn how to set up your own cluster using both VMs and the Cloud. All the major features of MapReduce are covered, including advanced topics like Total Sort and Secondary Sort.

The art of thinking parallel: MapReduce completely changed the way people thought about processing Big Data. Breaking down any problem into parallelizable units is an art. The examples in this course will train you to "think parallel".

What's Covered:

Lots of cool stuff ..
- Using MapReduce to
  - Recommend friends in a Social Networking site: Generate Top 10 friend recommendations using a Collaborative filtering algorithm.
  - Build an Inverted Index for Search Engines: Use MapReduce to parallelize the humongous task of building an inverted index for a search engine.
  - Generate Bigrams from text: Generate bigrams and compute their frequency distribution in a corpus of text.
- Build your Hadoop cluster:
  - Install Hadoop in Standalone, Pseudo-Distributed and Fully Distributed modes
  - Set up a Hadoop cluster using Linux VMs.
  - Set up a cloud Hadoop cluster on AWS with Cloudera Manager.
  - Understand HDFS, MapReduce and YARN and their interaction
- Customize your MapReduce Jobs:
  - Chain multiple MR jobs together
  - Write your own Customized Partitioner
  - Total Sort: Globally sort a large amount of data by sampling input files
  - Secondary sorting
  - Unit tests with MRUnit
  - Integrate with Python using the Hadoop Streaming API
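
To give a flavour of the bigram exercise and the Python integration listed above, here is a minimal sketch, not taken from the course material. It assumes the usual Hadoop Streaming convention of tab-separated `key<TAB>value` text records. The function names `mapper`, `reducer` and `run_locally` are hypothetical, bigrams are assumed not to span line breaks, and a plain in-memory sort stands in for the cluster's shuffle-and-sort phase.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'bigram<TAB>1' record per adjacent word pair on each line."""
    for line in lines:
        words = line.lower().split()
        for first, second in zip(words, words[1:]):
            yield f"{first} {second}\t1"


def record_key(record):
    """Return the key part (the bigram) of a 'key<TAB>value' record."""
    return record.rsplit("\t", 1)[0]


def reducer(sorted_records):
    """Sum the counts of consecutive records that share a key.

    This relies on the input arriving sorted by key, which is what the
    shuffle-and-sort phase guarantees to a reducer.
    """
    for bigram, group in groupby(sorted_records, key=record_key):
        yield bigram, sum(int(rec.rsplit("\t", 1)[1]) for rec in group)


def run_locally(lines):
    """Local stand-in for the cluster: map, sort by key, then reduce."""
    shuffled = sorted(mapper(lines), key=record_key)
    return dict(reducer(shuffled))


if __name__ == "__main__":
    corpus = ["the cat sat", "the cat ran"]
    for bigram, count in sorted(run_locally(corpus).items()):
        print(f"{bigram}\t{count}")
```

On a real cluster the mapper and reducer would be separate scripts reading standard input and writing standard output; the local driver here only exists so the logic can be tried without Hadoop.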
.. and of course all the basics:
- MapReduce: Mapper, Reducer, Sort/Merge, Partitioning, Shuffle and Sort
- HDFS & YARN: Namenode, Datanode, Resource manager, Node manager, the anatomy of a MapReduce application, YARN Scheduling, Configuring HDFS and YARN to performance tune your cluster.
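
As a rough illustration of how the MapReduce basics listed above fit together, the sketch below walks the inverted-index example through mapper, partitioning, shuffle and sort, and reducer in a single Python process. It is an assumption-laden toy, not the course's implementation: all function names are hypothetical, documents are a small in-memory dict, and `zlib.crc32` is used only as a stable stand-in for a hash-based partitioner.

```python
import zlib
from collections import defaultdict


def map_doc(doc_id, text):
    """Mapper: emit a (word, doc_id) pair for every word in a document."""
    for word in text.lower().split():
        yield word, doc_id


def partition(key, num_reducers):
    """Partitioner: choose the reducer for a key using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reduce_word(word, doc_ids):
    """Reducer: collapse the doc ids seen for one word into a sorted posting list."""
    return word, sorted(set(doc_ids))


def build_inverted_index(docs, num_reducers=2):
    """Run map, partition/shuffle, sort and reduce over docs (doc_id -> text)."""
    # Shuffle: route every mapper output pair to the bucket of its reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for doc_id, text in docs.items():
        for word, value in map_doc(doc_id, text):
            buckets[partition(word, num_reducers)][word].append(value)

    # Sort and reduce: each simulated reducer handles its keys in sorted order.
    index = {}
    for bucket in buckets:
        for word in sorted(bucket):
            key, postings = reduce_word(word, bucket[word])
            index[key] = postings
    return index


if __name__ == "__main__":
    docs = {"d1": "hadoop stores data", "d2": "hadoop processes data"}
    for word, postings in sorted(build_inverted_index(docs).items()):
        print(word, postings)
```

Because the partitioner sends every occurrence of a word to the same bucket, each word's posting list is assembled by exactly one reducer, which is the property a custom partitioner must preserve.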
Using discussion forums

Please use the discussion forums on this course to engage with other students and to help each other out. Unfortunately, much as we would like to, it is not possible for us at Loonycorn to respond to individual questions from students :-(

We're super small and self-funded with only 2 people developing technical video content. Our mission is to make high-quality courses available at super low prices.

The only way to keep our prices this low is to *NOT offer additional technical support over email or in-person*. The truth is, direct support is hugely expensive and just does not scale.

We understand that this is not ideal and that a lot of students might benefit from this additional support. Hiring resources for additional support would make our offering much more expensive, thus defeating our original purpose.

It is a hard trade-off.

Thank you for your patience and understanding!

Course Fee:

USD 60

Course Type:	Self-Study
Course Status:	Active
Workload:	1 - 4 hours / week

This course is listed under Development & Implementations, Industry Specific Applications and Data & Information Management Community

Hadoop

Big Data

MapReduce
