Foundation Course in Big Data Analytics in association with IBM

Course Summary

About Course Foundation Course in Big Data Analytics in association with IBM


    Course Syllabus

    Module 1 – Introduction to Big Data

    • System of Units / Binary System of Units
    • The scale
    • Explosion in data and real world events
    • Is there really a need for Big Data?
    • Streams and oceans of information
    • Big Data presents big opportunities
    • Merging the traditional and Big Data approaches
    • Enterprise information architecture
    • IBM Big Data platform strategy
    • Enterprise class
    • Different BigInsights editions for varying needs
    • InfoSphere Streams

    Module 2 – An Introduction to InfoSphere BigInsights

    • InfoSphere BigInsights open source components
    • BigInsights: Value Beyond Open Source
    • BigInsights Content
    • What is Hadoop?
    • Open source programming
    • Open source control
    • Open source other
    • InfoSphere BigInsights IBM components
    • Web-based installation
    • A rich big data management tool
    • Running applications from the web console
    • BigInsights and text analytics
    • BigInsights text analytics development
    • BigSheets – spreadsheet-style analysis
    • GPFS-FPO
    • Performance enhancements

    Module 3 – Apache Hadoop and HDFS Overview

    • Why Hadoop?
    • How about technology?
    • How long will it take to read 1 TB of data?
    • Parallel data processing is the answer!
    • What do we care about when we process data?
    • Why Hadoop when we have relational databases?
    • RDBMS and Hadoop – complementary, not competing
    • Working with Hadoop
    • HDFS – Hadoop Distributed File System
    • Design principles of Hadoop
    • More details about HDFS
    • Hadoop system components overview
    • Working with HDFS
    • HDFS at a High Level
    • MapReduce
    • MapReduce programming abstraction overview
    • NameNode
    • NameNode directory structure
    • Secondary NameNode
    • DataNode
    • JobTracker and TaskTrackers
    • HDFS file blocks
    • Storing file blocks into HDFS from client machine
    • Rack Awareness
    • HDFS commands
    • HDFS file commands
    • Web Console data management
    • Web Console data view
    • Working with files and directories
    • Changing permissions
    • Hadoop shell command
    • Application status
    • Workflows tab
    • Application running status
    • BigSheets
    • BigSheets workbooks
    • Manipulation of data in BigSheets
    • Exercise introduction

    Module 4 – GPFS-FPO

    • GPFS-FPO: motivation
    • GPFS-FPO: architecture
    • Locality awareness
    • Allow applications to define their own logical block size
    • Write Affinity: allow applications to dictate layout
    • Pipelined replication: efficient replication of data
    • Fast recovery
    • Hybrid allocation: treat metadata and data differently
    • Information lifecycle management (ILM)
    • Comparison with HDFS and MapR
    • BigInsights interface to GPFS-FPO
    • BigInsights interface to GPFS-FPO – URI access
    • GPFS cluster and file system concepts
    • Cluster topology – pool stanza file
    • Cluster topology – NSD stanza file
    • GPFS-FPO – file system for BigInsights

    Module 5 – BigInsights Web Console Security

    • Installation type
    • File system
    • Web Console security
    • Web Console roles
    • Assigning groups to roles
    • Flat file authentication
    • LDAP or PAM authentication
    • Web Console welcome

    Module 6 – Introduction to MapReduce Programming

    • MapReduce overview
    • MapReduce
    • SQL example of MapReduce
    • The Map function
    • Sort phase
    • The Reduce function
    • Combiner and Partition functions
    • Streaming and pipes
    • MapReduce example: wordcount
    • MapReduce co-locating with HDFS
    • MapReduce Processing
    • Speculative execution
    • MapReduce programming
    • MapReduce – a tale of two APIs
    • MapReduce Anatomy
    • Basic reduce code
    • MapReduce summary
    • MapReduce programming using BigInsights
    • Create a BigInsights project
    • Create a BigInsights program
    • Mapper class
    • Reducer and driver classes
    • Generated code
    • Exercise introduction

    Module 7 – Adaptive MapReduce

    • Emerging workload patterns
    • Adaptive MapReduce features
    • Workload and resource management architecture
    • Adaptive MapReduce architecture
    • Optimized shuffling
    • User interface for Adaptive MapReduce
    • Administrative tasks

    Module 8 – Setup, Configuration, and Administration of a Hadoop Cluster

    • Setup of Hadoop clusters
    • Starting points
    • What can be compressed in Hadoop?
    • Should I use compression with Hadoop?
    • Compression with BigInsights?
    • Enabling map output compression
    • Enabling job output compression
    • Working with SEQ files
    • Capacity calculations
    • Capacity planning
    • Disks and file system
    • Hardware considerations
    • Networking considerations
    • OS considerations
    • Configuration of Hadoop clusters
    • Configuration management
    • Configuration files
    • Preventing configuration property override
    • hadoop-env.sh settings
    • hdfs-site.xml settings
    • core-site.xml settings
    • mapred-site.xml configuration
    • Administration of Hadoop clusters with BigInsights
    • Setting rack topology (rack awareness)
    • Example of rack awareness script
    • ibm-hadoop.properties
    • Cluster status
    • Node administration
    • Balancer
    • Safemode at startup
    • Safemode commands
    • Dashboards
    • Exercise introduction

    Module 9 – Overview of Oozie

    • Oozie workflows
    • Action nodes
    • Effect of the MapReduce APIs
    • Control flows at a high level
    • Control flow nodes
    • Expression language functions
    • Workflow EL functions
    • Hadoop EL constants
    • HDFS EL functions
    • Workflow job properties
    • Oozie Coordinator
    • Oozie coordinator system
    • Oozie components
    • Coordinator
    • Coordinator EL constants and functions
    • Synchronous data sets
    • Coordinator application
    • Coordinator job properties
    • Invoking Oozie
    • BigInsights workflow editor
    • BigInsights application publishing
    • Publishing an application
    • Deploy the application
    • Schedule the application
    • Link multiple applications
    • Link output to input
    • Deploy the linked application
    • Exercise introduction

    Module 10 – Managing Job Execution

    • FIFO scheduler
    • Job execution
    • Some Terminology
    • FIFO scheduler – first in first out (default)
    • Priorities in FIFO
    • Fair scheduler
    • FAIR scheduler
    • FAIR scheduler – pools allocation
    • FAIR scheduler – pools
    • FAIR scheduler – minimum share
    • FAIR scheduler – minimum share, no demand
    • FAIR scheduler – minimum share exceeds slots
    • FAIR scheduler – minimum share less than fair share
    • FAIR scheduler – weights
    • FAIR scheduler – weights example
    • Multiple jobs per pool
    • Configuring FAIR scheduler
    • Example of an allocation file
    • BigInsights Scheduler
    • InfoSphere BigInsights Scheduler priorities

    Module 11 – Moving Data into Hadoop

    • Loading scenarios
    • Load scenarios:
    • Data is at rest
    • Data in motion
    • Streaming data
    • Solution if data is from a data warehouse
    • Load solution using Flume
    • Data from a web server
    • Workings of Sqoop
    • Overview of Sqoop
    • Sqoop connection
    • Sqoop import
    • Sqoop import examples
    • Sqoop exports
    • Sqoop export examples
    • Additional export information
    • Workings of Flume
    • How does Flume work?
    • Consolidation
    • Replicating and multiplexing
    • Configuration of Flume
    • Configuration example
    • Flume sources
    • Interceptors
    • Flume sinks
    • Flume channels
    • Flume channel selectors
    • Configuration details – components
    • Configuration details – properties
    • Configuration details – bindings
    • Flume example
    • Working with an agent
    • Exercise introduction

    Further Learning Roadmap:

    To learn more about analytics and orient your career in that direction, the following study areas are recommended:
    • Business Analytics, Business Intelligence
    • Text/Content Analytics
    • Big Data Programming
    • Predictive Analytics and Data Modelling


Course Fee:
USD 0

Course Type:

Self-Study

Course Status:

Active

Workload:

1 - 4 hours / week
