Foundation Course in Big Data Analytics in association with IBM
Intellipaat
Course Summary
Course Description
Foundation Course in Big Data Analytics in association with IBM is an online instructor-led training course for students and working professionals in any industry. As a part of this course, participants will get an in-depth understanding of Big Data and Hadoop using the IBM InfoSphere BigInsights tool. This course teaches HDFS, MapReduce, Hive, Pig, Oozie, Flume, etc. Watch a sample module recording for free. Try before you buy!
About IBM Big Data Hadoop Training Course
IBM Career Education (CE) – Foundation Course in Big Data Analytics is a hands-on training course for those who want a foundation in IBM InfoSphere BigInsights. It provides an overview of IBM’s Big Data strategy along with more detailed and descriptive information on Hadoop technology. The course also presents the concepts a system administrator needs to work with the Hadoop Distributed File System, and the MapReduce concepts a developer needs. It introduces learners to the scheduling capabilities of Hadoop, showing how to use Oozie to control workflows and Flume to load data into HDFS.
Learning Objectives:
After completion of this course, you will be able to:
- Understand the need and importance of Big Data
- Describe functions and features of IBM InfoSphere BigInsights
- List the capabilities of Hadoop and HDFS
- Administer HDFS effectively
- Describe the use of MapReduce
- Understand the process of setting up a Hadoop cluster
- Manage job execution
- Understand and explain Oozie workflows
- Describe scenarios for loading data into HDFS
Recommended Audience:
- Professionals who want to build a career in Big Data & Analytics
- Students from B.Tech/B.E./M.Tech/M.E./MCA or any other discipline who are enthusiastic about learning Big Data & Analytics
- Programming Developers and System Administrators
Prerequisites:
- Basic knowledge of Linux will be beneficial
Duration:
The suggested duration for completing this course is 25 Hours.
Why Take this Course?
- Virtual live instructor-led training by subject matter experts with more than 12 years of industry experience.
- 70% of the learning is through hands-on exercises and assignments, along with exposure to a live project.
- 24×7 dedicated support for a lifetime.
- Online e-learning course access for 1 year.
Course Syllabus
Module 1 – Introduction to Big Data
- System of Units / Binary System of Units
- The scale
- Explosion in data and real world events
- Is there really a need for Big Data?
- Streams and oceans of information
- Big Data presents big opportunities
- Merging the traditional and Big Data approaches
- Enterprise information architecture
- IBM Big Data platform strategy
- Enterprise class
- Different BigInsights editions for varying needs
- InfoSphere Streams
Module 2 – An Introduction to InfoSphere BigInsights
- InfoSphere BigInsights open source components
- BigInsights: Value Beyond Open Source
- BigInsights Content
- What is Hadoop?
- Open source programming
- Open source control
- Open source other
- InfoSphere BigInsights IBM components
- Web-based installation
- A rich management big data tool
- Running applications from the web console
- BigInsights and text analytics
- BigInsights text analytics development
- BigSheets – spreadsheet-style analysis
- GPFS-FPO
- Performance enhancements
Module 3 – Apache Hadoop and HDFS Overview
- Why Hadoop?
- How about technology?
- How long does it take to read 1 TB of data?
- Parallel data processing is the answer!
- What do we care about when we process data?
- Why Hadoop when we have relational databases?
- RDBMS and Hadoop – complementary, not competing
- Working with Hadoop
- HDFS – Hadoop Distributed File System
- Design principles of Hadoop
- More details about HDFS
- Hadoop system components overview
- Working with HDFS
- HDFS at a High Level
- MapReduce
- MapReduce programming abstraction overview
- NameNode
- NameNode directory structure
- Secondary NameNode
- DataNode
- JobTracker and TaskTrackers
- HDFS file blocks
- Storing file blocks into HDFS from client machine
- Rack Awareness
- HDFS commands
- HDFS file commands
- Web Console data management
- Web Console data view
- Working with files and directories
- Changing permissions
- Hadoop shell command
- Application status
- Workflows tab
- Application running status
- BigSheets
- BigSheets workbooks
- Manipulation of data in BigSheets
- Exercise introduction
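The HDFS file blocks and rack awareness topics above can be sketched in a few lines of Python. This is an illustrative model only, not Hadoop's actual code: the 64 MB block size was the classic Hadoop 1.x default, and the node and rack names are made up.

```python
# Illustrative sketch: how HDFS splits a file into fixed-size blocks and how
# the default rack-aware policy chooses three replica locations.
BLOCK_SIZE = 64 * 1024 * 1024  # classic Hadoop 1.x default: 64 MB

def split_into_blocks(file_size):
    """Return the sizes (in bytes) of the HDFS blocks for a file."""
    full, rest = divmod(file_size, BLOCK_SIZE)
    return [BLOCK_SIZE] * full + ([rest] if rest else [])

def place_replicas(client_node, topology):
    """Pick 3 replica nodes in the spirit of the default HDFS policy:
    the writing client's node, then two nodes on a different rack."""
    local_rack = next(r for r, nodes in topology.items() if client_node in nodes)
    remote_rack = next(r for r in topology if r != local_rack)
    return [client_node] + topology[remote_rack][:2]

topology = {"/rack1": ["node1", "node2"], "/rack2": ["node3", "node4"]}
print(split_into_blocks(150 * 1024 * 1024))  # two full 64 MB blocks + one 22 MB block
print(place_replicas("node1", topology))
```

Keeping two of the three replicas on a single remote rack is what lets HDFS survive the loss of a whole rack while still limiting cross-rack write traffic.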
Module 4 – GPFS-FPO
- GPFS-FPO: motivation
- GPFS-FPO: architecture
- Locality awareness
- Allows applications to define own logical block size
- Write Affinity: allow applications to dictate layout
- Pipelined replication: efficient replication of data
- Fast recovery
- Hybrid allocation: treat metadata and data differently
- Information lifecycle management (ILM)
- Comparison with HDFS and MapR
- BigInsights interface to GPFS-FPO
- BigInsights interface to GPFS-FPO – URI access
- GPFS cluster and file system concepts
- Cluster topology – pool stanza file
- Cluster topology – NSD stanza file
- GPFS – FPO – file system for BigInsights
Module 5 – BigInsights Web Console Security
- Installation type
- File system
- Web Console security
- Web Console roles
- Assigning groups to roles
- Flat file authentication
- LDAP or PAM authentication
- Web Console welcome
Module 6 – Introduction to MapReduce Programming
- MapReduce overview
- MapReduce
- SQL example of MapReduce
- The Map function
- Sort phase
- The Reduce function
- Combiner and Partition functions
- Streaming and pipes
- MapReduce example: wordcount
- MapReduce co-locating with HDFS
- MapReduce Processing
- Speculative execution
- MapReduce programming
- MapReduce – a tale of two APIs
- MapReduce Anatomy
- Basic reduce code
- MapReduce summary
- MapReduce programming using BigInsights
- Create a BigInsights project
- Create a BigInsights program
- Mapper class
- Reducer and driver classes
- Generated code
- Exercise introduction
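The wordcount example listed above is the canonical MapReduce illustration. The course builds it in Java with BigInsights tooling; as a rough sketch of the same map and reduce logic, here is a plain-Python analogue in the style of a Hadoop Streaming job. The function names and sample input are ours, and the in-memory sort stands in for Hadoop's shuffle phase.

```python
from itertools import groupby
from operator import itemgetter

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield (word.lower(), 1)

def reducer(pairs):
    """Reduce phase: pairs arrive grouped by key; sum the counts per word.
    The sorted() call plays the role of Hadoop's sort/shuffle step."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

lines = ["the quick brown fox", "the lazy dog"]
print(dict(reducer(mapper(lines))))
```

In a real cluster the mapper and reducer run on different nodes and the framework moves the intermediate pairs between them; the logic per function, however, is exactly this simple.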
Module 7 – Adaptive MapReduce
- Emerging workload patterns
- Adaptive MapReduce features
- Workload and resource management architecture
- Adaptive MapReduce architecture
- Optimized shuffling
- User interface for Adaptive MapReduce
- Administrative tasks
Module 8 – Setup, Configuration, and Administration of a Hadoop Cluster
- Setup of Hadoop clusters
- Starting points
- What can be compressed in Hadoop?
- Should I use compression with Hadoop?
- Compression with BigInsights?
- Enabling map output compression
- Enabling job output compression
- Working with SEQ files
- Capacity calculations
- Capacity planning
- Disks and file system
- Hardware considerations
- Networking considerations
- OS considerations
- Configuration of Hadoop clusters
- Configuration management
- Configuration files
- Preventing configuration property override
- hadoop-env.sh settings
- hdfs-site.xml settings
- core-site.xml settings
- mapred-site.xml configuration
- Administration of Hadoop clusters with BigInsights
- Setting rack topology (rack awareness)
- Example of rack awareness script
- ibm-hadoop.properties
- Cluster status
- Node administration
- Balancer
- Safemode at startup
- Safemode commands
- Dashboards
- Exercise introduction
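As a reference point for the compression topics above, a mapred-site.xml fragment along these lines enables map-output and job-output compression in Hadoop 1.x. The property names are from the old mapred.* namespace used by BigInsights-era Hadoop; verify them against your cluster's version before use.

```xml
<!-- Compress intermediate map output to cut shuffle traffic -->
<property>
  <name>mapred.compress.map.output</name>
  <value>true</value>
</property>
<property>
  <name>mapred.map.output.compression.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<!-- Compress the final job output as well -->
<property>
  <name>mapred.output.compress</name>
  <value>true</value>
</property>
```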
Module 9 – Overview of Oozie
- Oozie workflows
- Action nodes
- Effect of the MapReduce APIs
- Control flows at a high level
- Control flow nodes
- Expression language functions
- Workflow EL functions
- Hadoop EL constants
- HDFS EL functions
- Workflow job properties
- Oozie Coordinator
- Oozie coordinator system
- Oozie components
- Coordinator
- Coordinator EL constants and functions
- Synchronous data sets
- Coordinator application
- Coordinator job properties
- Invoking Oozie
- BigInsights workflow editor
- BigInsights application publishing
- Publishing an application
- Deploy the application
- Schedule the application
- Link multiple applications
- Link output to input
- Deploy the linked application
- Exercise introduction
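To make the workflow topics above concrete, a minimal Oozie workflow.xml with one MapReduce action node and the usual control-flow nodes (start, kill, end) looks roughly like this. The application name, schema version, and the ${...} parameters (supplied via the job properties file) are illustrative.

```xml
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.2">
  <start to="mr-node"/>
  <action name="mr-node">
    <map-reduce>
      <job-tracker>${jobTracker}</job-tracker>
      <name-node>${nameNode}</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>
  </action>
  <kill name="fail">
    <message>MapReduce action failed</message>
  </kill>
  <end name="end"/>
</workflow-app>
```

Each action node names a next node for both success (`ok`) and failure (`error`), which is how Oozie expresses control flow without any imperative code.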
Module 10 – Managing Job Execution
- FIFO scheduler
- Job execution
- Some Terminology
- FIFO scheduler – first in first out (default)
- Priorities in FIFO
- Fair scheduler
- FAIR scheduler
- FAIR scheduler – pools allocation
- FAIR scheduler – pools
- FAIR scheduler – minimum share
- FAIR scheduler – minimum share, no demand
- FAIR scheduler – minimum share exceeds slots
- FAIR scheduler – minimum share less than fair share
- FAIR scheduler – weights
- FAIR scheduler – weights example
- Multiple jobs per pool
- Configuring FAIR scheduler
- Example of an allocation file
- BigInsights Scheduler
- InfoSphere BigInsights Scheduler priorities
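An allocation file for the FAIR scheduler of the kind listed above might look as follows in Hadoop 1.x. The pool names, shares, and weights here are hypothetical; the element names come from the fair scheduler's allocation-file format of that era.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Guarantee the production pool a minimum share of slots -->
  <pool name="production">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <weight>2.0</weight>
  </pool>
  <!-- Cap concurrent ad-hoc jobs -->
  <pool name="adhoc">
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <userMaxJobsDefault>5</userMaxJobsDefault>
</allocations>
```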
Module 11 – Moving Data into Hadoop
- Loading scenarios
- Load scenarios:
- Data is at rest
- Data in motion
- Streaming data
- Solution if data is from a data warehouse
- Load solution using Flume
- Data from a web server
- Workings of Sqoop
- Overview of Sqoop
- Sqoop connection
- Sqoop import
- Sqoop import examples
- Sqoop exports
- Sqoop export examples
- Additional export information
- Workings of Flume
- How Flume works
- Consolidation
- Replicating and multiplexing
- Configuration of Flume
- Configuration example
- Flume sources
- Interceptors
- Flume sinks
- Flume channels
- Flume channel selectors
- Configuration details – components
- Configuration details – properties
- Configuration details – bindings
- Flume example
- Working with an agent
- Exercise introduction
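The source–channel–sink wiring covered in this module is expressed in a single properties file per Flume agent. A sketch for the web-server scenario above might read as follows; the agent name, paths, and host are placeholders, while the type and binding keys follow Flume's standard configuration format.

```properties
# Hypothetical agent "a1": tail a web-server log into HDFS via a memory channel
a1.sources = src1
a1.channels = ch1
a1.sinks = snk1

# Source: follow the access log as it grows
a1.sources.src1.type = exec
a1.sources.src1.command = tail -F /var/log/httpd/access_log

# Channel: buffer events in memory
a1.channels.ch1.type = memory
a1.channels.ch1.capacity = 1000

# Sink: write events into HDFS
a1.sinks.snk1.type = hdfs
a1.sinks.snk1.hdfs.path = hdfs://namenode:9000/flume/weblogs

# Bindings: connect source and sink through the channel
a1.sources.src1.channels = ch1
a1.sinks.snk1.channel = ch1
```

The three sections mirror the module outline exactly: declare the components, set their properties, then bind sources and sinks to channels.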
Further Learning Roadmap:
To learn analytics further and orient your career in that direction, the following are the recommended study areas:
- Business Analytics, Business Intelligence
- Text/Content Analytics
- Big Data Programming
- Predictive Analytics and Data Modelling