Apache Spark and Scala Certification Training
Simplilearn Americas LLC
Course Summary
The course enables you to master the essential skills in Apache Spark and Scala, such as real-time processing, Spark SQL, Spark Streaming, machine learning programming, GraphX programming, and Spark shell scripting. It includes a real-life, industry-based project on movie reviews to be performed in Spark. The course is best suited for data scientists, IT developers, and analysts.
Course Description
What are the System Requirements?
Your system needs to fulfil the following requirements:
64-bit Operating System
8GB RAM
How will the Labs be conducted?
We will help you to set up a Virtual Machine with local access. The detailed installation guide is provided in the Learning Management System.
How will you do the projects and get certified?
Problem statements, along with data points, are provided in the Learning Management System.
On completion of the course, you have to submit the project, which will be evaluated by the trainer. On successful evaluation of the project and completion of the online exam, you will be certified as a Spark and Scala Professional.
Who are the trainers?
Highly qualified and certified instructors with industry-relevant experience deliver the training.
What are the modes of training offered for this course?
We offer this training in the following modes:
Live Virtual Classroom or Online Classroom: With instructor-led online classroom training, you have the option to attend the course remotely from your desktop or laptop via video conferencing. This format avoids productivity challenges and decreases your time spent away from work or home.
Online Self-Learning: In this mode, you will get the lecture videos and can go through the course at your own pace.
What if I miss a class?
We provide the recordings of the class after the session is conducted. So, if you miss a class, you can go through the recordings before the next session.
Can I cancel my enrolment? Do I get a refund?
Yes, you can cancel your enrolment. We provide a complete refund after deducting the administration fee. To know more, please go through our refund policy.
What are the payment options?
Payments can be made using any of the following options, and a receipt will be issued to you automatically via email.
Visa Debit/Credit Card
American Express and Diners Club Card
Mastercard
PayPal
I want to know more about the training program. Whom do I contact?
Please join our Live Chat for instant support, call us, or Request a Call Back to have your query resolved.
Who are our faculty and how are they selected?
All our trainers are working professionals and industry experts with at least 10-12 years of relevant teaching experience.
Each of them has gone through a rigorous selection process, which includes profile screening, technical evaluation, and a training demo, before being certified to train for us.
We also ensure that only those trainers with a high alumni rating continue to train for us.
What is Global Teaching Assistance?
Our teaching assistants are here to help you get certified on your first attempt.
They are a dedicated team of subject matter experts to help you at every step and enrich your learning experience from class onboarding to project mentoring and job assistance.
They engage with the students proactively to ensure the course path is followed.
Teaching Assistance is available during business hours.
What is covered under the 24/7 Support promise?
We offer 24/7 support through email, chat, and calls.
We also have a dedicated team that provides on-demand assistance through our community forum. What's more, you will have lifetime access to the community forum, even after completing your course with us.
Course Syllabus
Course preview
Apache Spark & Scala
Lesson 00 - Course Overview 04:12
0.1 Introduction 00:13
0.2 Course Objectives 00:28
0.3 Course Overview 00:38
0.4 Target Audience 00:31
0.5 Course Prerequisites 00:21
0.6 Value to the Professionals 00:48
0.7 Value to the Professionals (contd.) 00:20
0.8 Value to the Professionals (contd.) 00:21
0.9 Lessons Covered 00:24
0.10 Conclusion 00:08
Lesson 01 - Introduction to Spark 25:34
1.1 Introduction 00:15
1.2 Objectives 00:26
1.3 Evolution of Distributed Systems
1.4 Need of New Generation Distributed Systems 01:15
1.5 Limitations of MapReduce in Hadoop 01:06
1.6 Limitations of MapReduce in Hadoop (contd.) 01:07
1.7 Batch vs. Real-Time Processing 01:09
1.8 Application of Stream Processing 00:07
1.9 Application of In-Memory Processing 01:48
1.10 Introduction to Apache Spark 00:45
1.11 Components of a Spark Project
1.12 History of Spark 00:50
1.13 Language Flexibility in Spark 00:55
1.14 Spark Execution Architecture 01:13
1.15 Automatic Parallelization of Complex Flows 00:59
1.16 Automatic Parallelization of Complex Flows-Important Points 01:13
1.17 APIs That Match User Goals 01:06
1.18 Apache Spark-A Unified Platform of Big Data Apps 01:38
1.19 More Benefits of Apache Spark 01:05
1.20 Running Spark in Different Modes 00:41
1.21 Installing Spark as a Standalone Cluster-Configurations
1.22 Installing Spark as a Standalone Cluster-Configurations 00:08
1.23 Demo-Install Apache Spark 00:08
1.24 Demo-Install Apache Spark 02:41
1.25 Overview of Spark on a Cluster 00:47
1.26 Tasks of Spark on a Cluster 00:37
1.27 Companies Using Spark-Use Cases 00:46
1.28 Hadoop Ecosystem vs. Apache Spark 00:32
1.29 Hadoop Ecosystem vs. Apache Spark (contd.) 00:43
1.30 Quiz
1.31 Summary 00:40
1.32 Summary (contd.) 00:41
1.33 Conclusion 00:13
Lesson 02 - Introduction to Programming in Scala 37:35
2.1 Introduction 00:11
2.2 Objectives 00:16
2.3 Introduction to Scala 01:32
2.4 Features of Scala
2.5 Basic Data Types 00:24
2.6 Basic Literals 00:35
2.7 Basic Literals (contd.) 00:25
2.8 Basic Literals (contd.) 00:21
2.9 Introduction to Operators 00:31
2.10 Types of Operators
2.11 Use Basic Literals and the Arithmetic Operator 00:08
2.12 Demo Use Basic Literals and the Arithmetic Operator 03:18
2.13 Use the Logical Operator 00:07
2.14 Demo Use the Logical Operator 01:40
2.15 Introduction to Type Inference 00:34
2.16 Type Inference for Recursive Methods 00:10
2.17 Type Inference for Polymorphic Methods and Generic Classes 00:30
2.18 Unreliability on Type Inference Mechanism 00:23
2.19 Mutable Collection vs. Immutable Collection 01:13
2.20 Functions 00:21
2.21 Anonymous Functions 00:22
2.22 Objects 01:08
2.23 Classes 00:36
2.24 Use Type Inference, Functions, Anonymous Function, and Class 00:09
2.25 Demo Use Type Inference, Functions, Anonymous Function and Class 07:40
2.26 Traits as Interfaces 00:57
2.27 Traits-Example 00:09
2.28 Collections 00:42
2.29 Types of Collections 00:25
2.30 Types of Collections (contd.) 00:26
2.31 Lists 00:28
2.32 Perform Operations on Lists 00:07
2.33 Demo Use Data Structures 04:10
2.34 Maps 00:46
2.35 Maps-Operations
2.36 Pattern Matching 00:33
2.37 Implicits 00:37
2.38 Implicits (contd.) 00:18
2.39 Streams 00:22
2.40 Use Data Structures 00:07
2.41 Demo Perform Operations on Lists 03:25
2.42 Quiz
2.43 Summary 00:37
2.44 Summary (contd.) 00:37
2.45 Conclusion 00:15
Lesson 03 - Using RDD for Creating Applications in Spark 51:02
3.1 Introduction 00:12
3.2 Objectives 00:23
3.3 RDDs API 01:40
3.4 Features of RDDs
3.5 Creating RDDs 00:36
3.6 Creating RDDs—Referencing an External Dataset 00:19
3.7 Referencing an External Dataset—Text Files 00:51
3.8 Referencing an External Dataset—Text Files (contd.) 00:50
3.9 Referencing an External Dataset—Sequence Files 00:33
3.10 Referencing an External Dataset—Other Hadoop Input Formats 00:46
3.11 Creating RDDs—Important Points 01:09
3.12 RDD Operations 00:38
3.13 RDD Operations—Transformations 00:47
3.14 Features of RDD Persistence 00:57
3.15 Storage Levels of RDD Persistence 00:20
3.16 Choosing the Correct RDD Persistence Storage Level
3.17 Invoking the Spark Shell 00:23
3.18 Importing Spark Classes 00:14
3.19 Creating the SparkContext 00:26
3.20 Loading a File in Shell 00:11
3.21 Performing Some Basic Operations on Files in Spark Shell RDDs 00:20
3.22 Packaging a Spark Project with SBT 00:50
3.23 Running a Spark Project With SBT 00:32
3.24 Demo-Build a Scala Project 00:07
3.25 Build a Scala Project 06:51
3.26 Demo-Build a Spark Java Project 00:08
3.27 Build a Spark Java Project 04:31
3.28 Shared Variables—Broadcast 01:21
3.29 Shared Variables—Accumulators 00:52
3.30 Writing a Scala Application 00:20
3.31 Demo-Run a Scala Application 00:07
3.32 Run a Scala Application 01:43
3.33 Demo-Write a Scala Application Reading the Hadoop Data 00:07
3.34 Write a Scala Application Reading the Hadoop Data 01:23
3.35 Demo-Run a Scala Application Reading the Hadoop Data 00:08
3.36 Run a Scala Application Reading the Hadoop Data 02:21
3.37 Scala RDD Extensions
3.38 DoubleRDD Methods 00:08
3.39 PairRDD Methods—Join 00:47
3.40 PairRDD Methods—Others 00:06
3.41 Java PairRDD Methods 00:09
3.42 Java PairRDD Methods (contd.) 00:06
3.43 General RDD Methods 00:06
3.44 General RDD Methods (contd.) 00:05
3.45 Java RDD Methods 00:08
3.46 Java RDD Methods (contd.) 00:06
3.47 Common Java RDD Methods 00:10
3.48 Spark Java Function Classes 00:13
3.49 Method for Combining JavaPairRDD Functions 00:42
3.50 Transformations in RDD 00:34
3.51 Other Methods 00:07
3.52 Actions in RDD 00:08
3.53 Key-Value Pair RDD in Scala 00:32
3.54 Key-Value Pair RDD in Java 00:43
3.55 Using MapReduce and Pair RDD Operations 00:25
3.56 Reading Text File from HDFS 00:16
3.57 Reading Sequence File from HDFS 00:21
3.58 Writing Text Data to HDFS 00:18
3.59 Writing Sequence File to HDFS 00:12
3.60 Using GroupBy 00:07
3.61 Using GroupBy (contd.) 00:05
3.62 Demo-Run a Scala Application Performing GroupBy Operation 00:08
3.63 Run a Scala Application Performing GroupBy Operation 03:13
3.64 Demo-Run a Scala Application Using the Scala Shell 00:07
3.65 Run a Scala Application Using the Scala Shell 04:02
3.66 Demo-Write and Run a Java Application 00:06
3.67 Write and Run a Java Application 01:49
3.68 Quiz
3.69 Summary 00:53
3.70 Summary (contd.) 00:59
3.71 Conclusion 00:15
Lesson 04 - Running SQL Queries Using Spark SQL 30:24
4.1 Introduction 00:12
4.2 Objectives 00:17
4.3 Importance of Spark SQL 01:02
4.4 Benefits of Spark SQL 00:47
4.5 DataFrames 00:50
4.6 SQLContext 00:50
4.7 SQLContext (contd.) 01:13
4.8 Creating a DataFrame 00:11
4.9 Using DataFrame Operations 00:22
4.10 Using DataFrame Operations (contd.) 00:05
4.11 Demo-Run Spark SQL with a DataFrame 00:06
4.12 Run Spark SQL with a DataFrame 08:53
4.13 Interoperating with RDDs
4.14 Using the Reflection-Based Approach 00:38
4.15 Using the Reflection-Based Approach (contd.) 00:08
4.16 Using the Programmatic Approach 00:44
4.17 Using the Programmatic Approach (contd.) 00:07
4.18 Demo-Run Spark SQL Programmatically 00:08
4.19 Run Spark SQL Programmatically 00:01
4.20 Data Sources
4.21 Save Modes 00:32
4.22 Saving to Persistent Tables 00:46
4.23 Parquet Files 00:19
4.24 Partition Discovery 00:38
4.25 Schema Merging 00:29
4.26 JSON Data 00:34
4.27 Hive Table 00:45
4.28 DML Operation-Hive Queries 00:27
4.29 Demo-Run Hive Queries Using Spark SQL 00:07
4.30 Run Hive Queries Using Spark SQL 04:58
4.31 JDBC to Other Databases 00:49
4.32 Supported Hive Features 00:38
4.33 Supported Hive Features (contd.) 00:22
4.34 Supported Hive Data Types 00:13
4.35 Case Classes 00:15
4.36 Case Classes (contd.) 00:07
4.37 Quiz
4.38 Summary 00:49
4.39 Summary (contd.) 00:49
4.40 Conclusion 00:13
Lesson 05 - Spark Streaming 35:09
5.1 Introduction 00:11
5.2 Objectives 00:15
5.3 Introduction to Spark Streaming 00:50
5.4 Working of Spark Streaming 00:20
5.5 Features of Spark Streaming
5.6 Streaming Word Count 01:34
5.7 Micro Batch 00:19
5.8 DStreams 00:34
5.9 DStreams (contd.) 00:39
5.10 Input DStreams and Receivers 01:19
5.11 Input DStreams and Receivers (contd.) 00:55
5.12 Basic Sources 01:14
5.13 Advanced Sources 00:49
5.14 Advanced Sources-Twitter
5.15 Transformations on DStreams 00:15
5.16 Transformations on DStreams (contd.) 00:06
5.17 Output Operations on DStreams 00:29
5.18 Design Patterns for Using ForeachRDD 01:15
5.19 DataFrame and SQL Operations 00:26
5.20 DataFrame and SQL Operations (contd.) 00:20
5.21 Checkpointing 01:25
5.22 Enabling Checkpointing 00:39
5.23 Socket Stream 01:00
5.24 File Stream 00:12
5.25 Stateful Operations 00:28
5.26 Window Operations 01:22
5.27 Types of Window Operations 00:12
5.28 Types of Window Operations (contd.) 00:06
5.29 Join Operations-Stream-Dataset Joins 00:21
5.30 Join Operations-Stream-Stream Joins 00:34
5.31 Monitoring Spark Streaming Application 01:19
5.32 Performance Tuning-High Level 00:20
5.33 Performance Tuning-Detail Level
5.34 Demo-Capture and Process the Netcat Data 00:07
5.35 Capture and Process the Netcat Data 05:01
5.36 Demo-Capture and Process the Flume Data 00:08
5.37 Capture and Process the Flume Data 05:08
5.38 Demo-Capture the Twitter Data 00:07
5.39 Capture the Twitter Data 02:33
5.40 Quiz
5.41 Summary 01:01
5.42 Summary (contd.) 01:04
5.43 Conclusion 00:12
Lesson 06 - Spark ML Programming 40:08
6.1 Introduction 00:12
6.2 Objectives 00:20
6.3 Introduction to Machine Learning 01:36
6.4 Common Terminologies in Machine Learning
6.5 Applications of Machine Learning 00:22
6.6 Machine Learning in Spark 00:34
6.7 Spark ML API
6.8 DataFrames 00:32
6.9 Transformers and Estimators 00:59
6.10 Pipeline 00:48
6.11 Working of a Pipeline 01:41
6.12 Working of a Pipeline (contd.) 00:45
6.13 DAG Pipelines 00:33
6.14 Runtime Checking 00:21
6.15 Parameter Passing 01:00
6.16 General Machine Learning Pipeline-Example 00:05
6.17 General Machine Learning Pipeline-Example (contd.)
6.18 Model Selection via Cross-Validation 01:16
6.19 Supported Types, Algorithms, and Utilities 00:31
6.20 Data Types 01:26
6.21 Feature Extraction and Basic Statistics 00:43
6.22 Clustering 00:38
6.23 K-Means 00:55
6.24 K-Means (contd.) 00:05
6.25 Demo-Perform Clustering Using K-Means 00:07
6.26 Perform Clustering Using K-Means 04:41
6.27 Gaussian Mixture 00:57
6.28 Power Iteration Clustering (PIC) 01:17
6.29 Latent Dirichlet Allocation (LDA) 00:35
6.30 Latent Dirichlet Allocation (LDA) (contd.) 01:45
6.31 Collaborative Filtering 01:13
6.32 Classification 00:16
6.33 Classification (contd.) 00:06
6.34 Regression 00:42
6.35 Example of Regression 00:56
6.36 Demo-Perform Classification Using Linear Regression 00:08
6.37 Perform Classification Using Linear Regression 02:01
6.38 Demo-Run Linear Regression 00:06
6.39 Run Linear Regression 02:14
6.40 Demo-Perform Recommendation Using Collaborative Filtering 00:05
6.41 Perform Recommendation Using Collaborative Filtering 02:23
6.42 Demo-Run Recommendation System 00:06
6.43 Run Recommendation System 02:45
6.44 Quiz
6.45 Summary 01:14
6.46 Summary (contd.) 00:57
6.47 Conclusion 00:12
Lesson 07 - Spark GraphX Programming 46:26
7.001 Introduction 00:14
7.002 Objectives 00:17
7.003 Introduction to Graph-Parallel System 01:14
7.004 Limitations of Graph-Parallel System 00:49
7.005 Introduction to GraphX 01:21
7.006 Introduction to GraphX (contd.) 00:06
7.007 Importing GraphX 00:10
7.008 The Property Graph 01:25
7.009 The Property Graph (contd.) 00:07
7.010 Features of the Property Graph
7.011 Creating a Graph 00:14
7.012 Demo-Create a Graph Using GraphX 00:07
7.013 Create a Graph Using GraphX 10:08
7.014 Triplet View 00:30
7.015 Graph Operators 00:51
7.016 List of Operators 00:23
7.017 List of Operators (contd.) 00:05
7.018 Property Operators 00:18
7.019 Structural Operators 01:02
7.020 Subgraphs 00:21
7.021 Join Operators 01:09
7.022 Demo-Perform Graph Operations Using GraphX 00:07
7.023 Perform Graph Operations Using GraphX 05:46
7.024 Demo-Perform Subgraph Operations 00:07
7.025 Perform Subgraph Operations 01:37
7.026 Neighborhood Aggregation 00:43
7.027 mapReduceTriplets 00:42
7.028 Demo-Perform MapReduce Operations 00:08
7.029 Perform MapReduce Operations 09:18
7.030 Counting Degree of Vertex 00:32
7.031 Collecting Neighbors 00:28
7.032 Caching and Uncaching 01:10
7.033 Graph Builders
7.034 Vertex and Edge RDDs 01:17
7.035 Graph System Optimizations 01:22
7.036 Built-in Algorithms
7.037 Quiz
7.038 Summary 01:12
7.039 Summary (contd.) 00:55
7.040 Conclusion 00:11