Hadoop Admin Course Content
Installation of Hadoop and the Hadoop Ecosystem
Installation of Hadoop components and ecosystem tools – Hive, Sqoop, Pig, Scala and Spark
Introduction to Big Data and Hadoop: Understanding HDFS & MapReduce
Introduction to Big Data, Hadoop and its ecosystem, MapReduce and HDFS – the importance of Big Data and how Hadoop fits into the framework; the Hadoop Distributed File System – Replication, Block Size, Secondary NameNode, High Availability; YARN – ResourceManager, NodeManager. Lab 1: Working with HDFS.
Deep Dive into MapReduce
How MapReduce works, how the Reducer works, how the Driver works, Combiners, Partitioners, Input Formats, Output Formats, Shuffle and Sort. Lab 2: Writing a Word Count program.
Hadoop Administration – Multi-Node Cluster Setup using Amazon EC2
How to create a four-node Hadoop cluster, working with the cluster, deploying a MapReduce job, writing MapReduce code and setting up Cloudera Manager.
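As a small illustration of how block size and replication determine storage use, here is a minimal Python sketch. It assumes the Hadoop 2.x/3.x defaults of a 128 MB block size (dfs.blocksize) and a replication factor of 3 (dfs.replication); the function name hdfs_footprint is hypothetical. Because the last block only holds the bytes that remain, raw storage is the file size times the replication factor, not the block count times the block size.

```python
MB = 1024 * 1024


def hdfs_footprint(file_size_bytes, block_size=128 * MB, replication=3):
    """Hypothetical helper: estimate how HDFS stores one file.

    Assumes dfs.blocksize = 128 MB and dfs.replication = 3 (the usual
    Hadoop 2.x/3.x defaults). Returns (blocks, block_replicas, raw_bytes).
    """
    # Integer ceiling division: a partial final block still counts as a block.
    blocks = (file_size_bytes + block_size - 1) // block_size
    # Every block is stored `replication` times across different DataNodes.
    block_replicas = blocks * replication
    # The last block only occupies its actual size, so raw usage scales
    # with the file size, not with blocks * block_size.
    raw_bytes = file_size_bytes * replication
    return blocks, block_replicas, raw_bytes
```

For example, hdfs_footprint(300 * MB) returns 3 blocks (128 MB, 128 MB and 44 MB), 9 block replicas and 900 MB of raw disk usage.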
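The Word Count lab ties together the map, combine, partition, shuffle/sort and reduce steps listed above. The sketch below is a single-process Python simulation of that data flow, not the Hadoop Java API (real jobs subclass Hadoop's Mapper and Reducer classes and are launched by a Driver). Each input line stands in for one map task's split, and crc32 is used as a stand-in for the Java hashCode that Hadoop's default HashPartitioner relies on, so results are deterministic here.

```python
import re
import zlib
from collections import Counter, defaultdict


def mapper(line):
    # Map phase: emit (word, 1) for every word in the input line.
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def combiner(pairs):
    # Combiner: pre-aggregate one mapper's output locally to shrink shuffle traffic.
    counts = Counter()
    for word, n in pairs:
        counts[word] += n
    return list(counts.items())


def partition(key, num_reducers):
    # Partitioner: route each key to one reducer. crc32 is an assumed stand-in
    # for the Java hashCode used by Hadoop's default HashPartitioner.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    # Reduce phase: sum all partial counts for one word.
    return key, sum(values)


def word_count(lines, num_reducers=2):
    # One partition per reducer; each maps word -> list of partial counts.
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line plays the role of one map task's split
        for word, n in combiner(mapper(line)):
            partitions[partition(word, num_reducers)][word].append(n)
    # Shuffle and sort: each reducer processes its keys in sorted order.
    result = {}
    for part in partitions:
        for key in sorted(part):
            word, total = reducer(key, part[key])
            result[word] = total
    return result
```

For example, word_count(["the quick brown fox", "the lazy dog", "The end"]) counts "the" three times and every other word once. Changing num_reducers changes which reducer handles each word but not the final counts.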
Hadoop Administration – Cluster Configuration
The significance of the configuration files, an overview of the configuration values and parameters, the parameters of the Hadoop Distributed File System, setting up the Hadoop environment, configuration files in detail such as ‘Include’ and ‘Exclude’, the directory structure and files of the NameNode and DataNode, and the edit log and file system image (fsimage) used in Hadoop administration and maintenance. Hands-on Exercise: Performance tuning of MapReduce.
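To show why the edit log and the file system image belong together, here is a toy Python model of a checkpoint: start from the last fsimage, replay every logged operation in order, and produce a new image with an empty edit log. The namespace is simplified to a set of file paths, and the operation names ("create", "delete", "rename") are hypothetical; they do not reflect the NameNode's real on-disk formats.

```python
def apply_edit(namespace, op):
    # Apply one logged operation (hypothetical op names) to the namespace.
    kind = op[0]
    if kind == "create":
        namespace.add(op[1])
    elif kind == "delete":
        namespace.discard(op[1])
    elif kind == "rename":
        src, dst = op[1], op[2]
        if src in namespace:
            namespace.remove(src)
            namespace.add(dst)
    else:
        raise ValueError(f"unknown edit op: {kind}")


def checkpoint(fsimage, edit_log):
    # Merge: copy the last image, replay edits in log order, and return
    # the new image together with an empty edit log.
    namespace = set(fsimage)
    for op in edit_log:
        apply_edit(namespace, op)
    return frozenset(namespace), []
```

Order matters: replaying the same edits in a different order can produce a different namespace, which is why the log is append-only and replayed sequentially.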
Hadoop Administration – Maintenance, Monitoring and Troubleshooting
Deploying the checkpoint procedure, working with metadata, data backup, safe mode, NameNode failure and recovery procedures, troubleshooting various problems and knowing what to look for, node removal and more, best practices for using JMX for cluster monitoring, working with stack traces, using logs to monitor and troubleshoot, deploying open source tools for cluster monitoring, how to deploy the job scheduler, the MapReduce job submission flow, scheduling jobs on the same cluster, FIFO scheduling and Fair Scheduler configuration. Hands-on Exercise: Working with the MapReduce file system recovery.
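The difference between FIFO scheduling and fair sharing can be seen with a toy allocation of container slots. In the Python sketch below, FIFO serves jobs strictly in submission order, while the fair version uses max-min fairness: capacity is split equally among jobs that still want more, each job is capped at its demand, and leftovers are redistributed. This is a simplified illustration of the fair-sharing idea only; the real Hadoop Fair Scheduler adds queues, weights, minimum shares and preemption, which are not modeled here. The function names are hypothetical.

```python
def fifo_allocate(capacity, demands):
    # FIFO: jobs are served strictly in submission order.
    alloc = []
    remaining = capacity
    for d in demands:
        give = min(d, remaining)
        alloc.append(give)
        remaining -= give
    return alloc


def fair_allocate(capacity, demands):
    # Max-min fairness: split capacity equally among unsatisfied jobs,
    # cap each at its demand, and redistribute whatever is left over.
    alloc = [0] * len(demands)
    active = [i for i, d in enumerate(demands) if d > 0]
    remaining = capacity
    while active and remaining > 0:
        share = remaining // len(active)
        if share == 0:
            # Fewer units than jobs: hand out the last units one each, in job order.
            for i in active[:remaining]:
                alloc[i] += 1
            break
        for i in active:
            give = min(share, demands[i] - alloc[i])
            alloc[i] += give
            remaining -= give
        # Keep only jobs that still want more capacity.
        active = [i for i in active if alloc[i] < demands[i]]
    return alloc
```

With 10 slots and three jobs demanding 8, 4 and 2 slots, fifo_allocate gives [8, 2, 0] (the last job waits), while fair_allocate gives [4, 4, 2]: the small job is fully served and the remaining capacity is split evenly between the two larger ones.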
Securing a Hadoop Cluster with Kerberos and Other Advanced Topics
Hadoop advanced administration, the Quorum Journal Manager, HDFS security and configuring HDFS federation, Hadoop platform security fundamentals, the process of securing the Hadoop platform, the importance of Kerberos and integrating it with the Hadoop platform, and Hadoop cluster configuration with Kerberos.
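The Quorum Journal Manager only treats an edit as committed once a majority of JournalNodes acknowledge it, so the number of JournalNodes determines how many failures the NameNode's shared edit log can survive. The short Python sketch below (hypothetical function name) makes that arithmetic concrete: with N JournalNodes, at most (N - 1) / 2 can fail, which is why an odd count such as 3 or 5 is recommended.

```python
def journal_quorum(num_journal_nodes):
    # A write is committed once a strict majority of JournalNodes acknowledge it.
    majority = num_journal_nodes // 2 + 1
    # The remaining nodes may fail without losing the ability to commit edits.
    tolerated_failures = num_journal_nodes - majority
    return majority, tolerated_failures
```

Three JournalNodes need 2 acknowledgements and tolerate 1 failure; five need 3 and tolerate 2. Four JournalNodes still tolerate only 1 failure, so an even count adds cost without adding fault tolerance.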
Hadoop Admin Project
Project 1: Streaming Twitter Data using Flume. Topics: This project gives you hands-on experience deploying Apache Flume to extract Twitter streaming data and load it into Hadoop for analysis. You will learn to handle high-volume data spikes, scale horizontally to accommodate increased data volumes, and provide data delivery guarantees.
Project 2: Hive & Impala Comparison. Topics: Installation of Apache Hive and Apache Impala on CDH5, comparing the two tools for data querying, the advantages of Hive as a data warehouse for summarization and analysis, and the advantages of Impala as a massively parallel processing (MPP), SQL-like query engine for high-speed querying of data in HDFS.