Apache Spark, Scala, Storm Training

Intellipaat

Course Summary

Our Apache Storm, Spark, Scala certification master program lets you gain proficiency in real-time data analytics and high-speed processing. You will work on real-world projects in Spark RDD, Scala programming, Storm topology, Logic Dynamics, Trident Filters and Spouts.

Course Description
About Course

Become an expert in Big Data processing by learning the conceptual implementation of Apache Storm and Apache Spark using Scala programming. This is a Combo Course in Spark, Scala and Storm, designed with the industry requirements for high-speed data processing in mind. Taking this Training will equip you to take on real-world challenges in the Big Data Hadoop ecosystem, regardless of industry vertical. This Training Course covers the Apache Spark processing engine and programming in the general-purpose language Scala, and provides in-depth knowledge of the Apache Storm computation system.
What you will learn in this Training Course?
1. Understanding Spark and programming in Scala
2. Comparison between Spark and Hadoop
3. Deploying high-speed processing on Big Data
4. Cluster deployment of Apache Spark
5. Deploying Python, Java and Scala applications in Apache Spark
6. Learn concepts of distributed processing and Storm Architecture
7. Storm Topology, Logic Dynamics, and Components
8. Learn about Trident Filter, Spouts and Functions
9. Using Storm for real-time analytics
10. Types of analysis including batch analysis
Who should take this Training Course?
- Big Data professionals, Data Scientists and Software Engineers
- ETL Developers, Data Analysts and Project Managers
- Those looking for a Big Data career
What are the prerequisites for taking this Training Course?
Anybody can take this Training Course regardless of their skills. A basic knowledge of Java can help.
Why should you take this Training Course?
The amount of Big Data processed today points to an urgent need for faster and more efficient ways of processing data. Learning Spark and Storm puts you at an advantage, since there is a huge demand for professionals in this domain. Learning Scala, which is the language of choice for writing Spark applications, is also hugely beneficial. All in all, this Combo Course can help you grab some of the best jobs in the industry.

Course Syllabus

Scala Course Content
Introduction of Scala
Introducing Scala and deployment of Scala for Big Data applications and Apache Spark analytics.
Pattern Matching
The importance of Scala, the concept of REPL (Read Evaluate Print Loop), deep dive into Scala pattern matching, type inference, higher-order functions, currying, traits, application space and Scala for data analysis.
Executing the Scala code
Learning about the Scala Interpreter, static object timer in Scala, testing String equality in Scala, Implicit classes in Scala, the concept of currying in Scala, various classes in Scala.
Classes concept in Scala
Learning about the Classes concept, understanding the constructor overloading, the various abstract classes, the hierarchy types in Scala, the concept of object equality, the val and var methods in Scala.
Case classes and pattern matching
Understanding sealed traits and the wildcard, constructor, tuple, variable and constant patterns.
Concepts of traits with example
Understanding traits in Scala, the advantages of traits, linearization of traits, the Java equivalent and avoiding of boilerplate code.
Scala Java Interoperability
Implementation of traits in Scala and Java, handling the extension of multiple traits.
Scala collections
Introduction to Scala collections, classification of collections, the difference between Iterator and Iterable in Scala, an example of a list sequence in Scala.
Mutable collections vs. Immutable collections
The two types of collections in Scala, Mutable and Immutable collections, understanding lists and arrays in Scala, the list buffer and array buffer, Queue in Scala, double-ended queue Deque, Stacks, Sets, Maps, Tuples in Scala.
Use Case bobsrockets package
Introduction to Scala packages and imports, the selective imports, the Scala test classes, introduction to JUnit test class, JUnit interface via JUnit 3 suite for Scala test, packaging of Scala applications in Directory Structure, example of Spark Split and Spark Scala.
Spark Course Content
Introduction to Spark
Introduction to Spark, how Spark overcomes the drawbacks of working with MapReduce, understanding in-memory MapReduce, interactive operations on MapReduce, Spark stack, fine vs. coarse grained update, Spark Hadoop YARN, HDFS Revision, YARN Revision, the overview of Spark and how it is better than Hadoop, deploying Spark without Hadoop, Spark history server, Cloudera distribution.
Spark Basics
Spark installation guide, Spark configuration, memory management, executor memory vs. driver memory, working with Spark Shell, the concept of Resilient Distributed Datasets (RDD), learning to do functional programming in Spark, the architecture of Spark.
Working with RDDs in Spark
Spark RDD, creating RDDs, RDD partitioning, operations & transformations in RDD, deep dive into Spark RDDs, the RDD general operations, a read-only partitioned collection of records, using the concept of RDD for faster and efficient data processing, RDD actions such as collect, count, collectAsMap and saveAsTextFile, pair RDD functions.
Aggregating Data with Pair RDDs
Understanding the concept of Key-Value pair in RDDs, learning how Spark makes MapReduce operations faster, various operations of RDD, MapReduce interactive operations, fine & coarse grained update, Spark stack.
Writing and Deploying Spark Applications
Comparing the Spark applications with Spark Shell, creating a Spark application using Scala or Java, deploying a Spark application, Scala built application, creation of mutable list, set & set operations, list, tuple, concatenating list, creating application using SBT, deploying application using Maven, the web user interface of Spark application, a real world example of Spark and configuring of Spark.
Parallel Processing
Learning about Spark parallel processing, deploying on a cluster, introduction to Spark partitions, file-based partitioning of RDDs, understanding of HDFS and data locality, mastering the technique of parallel operations, comparing repartition & coalesce, RDD actions.
Spark RDD Persistence
The execution flow in Spark, understanding the RDD persistence overview, Spark execution flow & Spark terminology, distributed shared memory vs. RDD, RDD limitations, Spark shell arguments, distributed persistence, RDD lineage, key/value pair operations available through implicit conversion, such as countByKey, reduceByKey, sortByKey and aggregateByKey.
Spark Streaming & MLlib
Spark Streaming Architecture, writing streaming programs, processing of Spark streams, processing Spark Discretized Stream (DStream), the context of Spark Streaming, streaming transformation, Flume Spark streaming, request count and DStream, multi batch operation, sliding window operations and advanced data sources. Different algorithms, the concept of iterative algorithm in Spark, analyzing with Spark graph processing, introduction to K-Means and machine learning, various variables in Spark like shared variables, broadcast variables, learning about accumulators.
Improving Spark Performance
Introduction to various variables in Spark like shared variables, broadcast variables, learning about accumulators, the common performance issues and troubleshooting the performance problems.
Spark SQL and Data Frames
Learning about Spark SQL, the context of SQL in Spark for providing structured data processing, JSON support in Spark SQL, working with XML data, parquet files, creating HiveContext, writing Data Frame to Hive, reading JDBC files, understanding the Data Frames in Spark, creating Data Frames, manual inferring of schema, working with CSV files, reading JDBC tables, Data Frame to JDBC, user defined functions in Spark SQL, shared variable and accumulators, learning to query and transform data in Data Frames, how Data Frame provides the benefit of both Spark RDD and Spark SQL, deploying Hive on Spark as the execution engine.
Scheduling/ Partitioning
Learning about the scheduling and partitioning in Spark, hash partition, range partition, scheduling within and around applications, static partitioning, dynamic sharing, fair scheduling, Map partition with index, the Zip, GroupByKey, Spark master high availability, standby Masters with Zookeeper, Single Node Recovery With Local File System, Higher-Order Functions.
Apache Storm Course Content
Understanding Architecture of Storm
Big Data characteristics, understanding Hadoop distributed computing, the Bayesian Law, deploying Storm for real time analytics, the Apache Storm features, comparing Storm with Hadoop, Storm execution, learning about Tuple, Spout, Bolt.
Installation of Apache Storm
Installing Apache Storm, various types of run modes of Storm.
Introduction to Apache Storm
Understanding Apache Storm and the data model.
Apache Kafka Installation
Installation of Apache Kafka and its configuration.
Apache Storm Advanced
Understanding of advanced Storm topics like Spouts, Bolts, Stream Groupings, Topology and its Life cycle, learning about Guaranteed Message Processing.
Storm Topology
Various Grouping types in Storm, reliable and unreliable messages, Bolt structure and life cycle, understanding Trident topology for failure handling, process, Call Log Analysis Topology for analyzing call logs for calls made from one number to another.
Overview of Trident
Understanding of Trident Spouts and its different types, the various Trident Spout interface and components, familiarizing with Trident Filter, Aggregator and Functions, a practical and hands-on use case on solving call log problem using Storm Trident.
Storm Components & classes
Various components, classes and interfaces in Storm, such as the BaseRichBolt class, the IRichBolt interface, the IRichSpout interface and the BaseRichSpout class, and the various methods of working with them.
Cassandra Introduction
Understanding Cassandra, its core concepts, its strengths and deployment.
Bootstrapping
Twitter Bootstrapping, detailed understanding of Bootstrapping, concepts of Storm, Storm Development Environment.
Apache Spark - Scala Project
Project 1: Movie Recommendation
Topics: This is a project wherein you will gain hands-on experience in deploying Apache Spark for movie recommendation. You will be introduced to MLlib, the Spark machine learning library, with a guide to its algorithms and coding. Understand how to deploy collaborative filtering, clustering, regression, and dimensionality reduction in MLlib. Upon completion of the project you will gain experience in working with streaming data, sampling, testing and statistics.
Project 2: Twitter API Integration for Tweet Analysis
Topics: With this project you will learn to integrate the Twitter API for analyzing tweets. You will write server-side code in a scripting language such as PHP, Ruby or Python to request the Twitter API and get the results in JSON format. You will then read the results and perform operations such as aggregation, filtering and parsing as needed to come up with tweet analysis.
Project 3: Data Exploration Using Spark SQL (Wikipedia data set)
Topics: This project lets you work with Spark SQL. You will gain experience in combining Spark SQL with ETL applications, real-time analysis of data, performing batch analysis, deploying machine learning, creating visualizations and processing of graphs.
Apache Storm Project
Project 1: Call Log Analysis using Trident
Topics: In this project you will be working on call logs to decipher the data and gather valuable insights using Apache Storm Trident. You will work extensively with data about calls made from one number to another. The aim of this project is to resolve the call log issues with Trident stream processing and low-latency distributed querying. You will gain hands-on experience in working with Spouts and Bolts along with various Trident functions, filters, aggregation, joins and grouping.
Project 2: Twitter Data Analysis using Trident
Topics: This is a project that involves working with Twitter data and processing it to extract patterns. Apache Storm Trident is well suited to real-time analysis of tweets. Working with Trident you will be able to simplify the task of live Twitter feed analysis. In this project you will gain real-world experience of working with Spouts, Bolts, and Trident filters, joins, aggregation, functions and grouping.
Project 3: US Presidential Election Result Analysis using Trident DRPC Query
Topics: This is a project that lets you work on the US presidential election results and predict who is leading and trailing on a real-time basis. For this you work exclusively with the Trident Distributed Remote Procedure Call (DRPC) server. After completion of the project you will know how to access data residing on a remote computer or network and deploy it for real-time processing, analysis and prediction.

Course Fee:

USD 230

Course Type:	Self-Study
Course Status:	Active
Workload:	1 - 4 hours / week

This course is listed under Open Source, Development & Implementations, Data & Information Management, Networks & IT Infrastructure and Server & Storage Management Community

SQL

Apache Spark

Hadoop

Big Data

Java

Interface
