
Apache Spark and Scala Certification Training

Course Summary

The course teaches the essential Apache Spark and Scala skills, including real-time processing, Spark SQL, Spark Streaming, machine learning programming, GraphX programming, and Spark shell scripting. It includes a real-life, industry-based project on movie reviews, carried out in Spark. The course is best suited for data scientists, IT developers, and analysts.


    Course Syllabus


    Course preview

    Apache Spark & Scala

    Lesson 00 - Course Overview 04:12

    0.1 Introduction 00:13

    0.2 Course Objectives 00:28

    0.3 Course Overview 00:38

    0.4 Target Audience 00:31

    0.5 Course Prerequisites 00:21

    0.6 Value to the Professionals 00:48

    0.7 Value to the Professionals (contd.) 00:20

    0.8 Value to the Professionals (contd.) 00:21

    0.9 Lessons Covered 00:24

    0.10 Conclusion 00:08

    Lesson 01 - Introduction to Spark 25:34

    1.1 Introduction 00:15

    1.2 Objectives 00:26

    1.3 Evolution of Distributed Systems

    1.4 Need of New Generation Distributed Systems 01:15

    1.5 Limitations of MapReduce in Hadoop 01:06

    1.6 Limitations of MapReduce in Hadoop (contd.) 01:07

    1.7 Batch vs. Real-Time Processing 01:09

    1.8 Application of Stream Processing 00:07

    1.9 Application of In-Memory Processing 01:48

    1.10 Introduction to Apache Spark 00:45

    1.11 Components of a Spark Project

    1.12 History of Spark 00:50

    1.13 Language Flexibility in Spark 00:55

    1.14 Spark Execution Architecture 01:13

    1.15 Automatic Parallelization of Complex Flows 00:59

    1.16 Automatic Parallelization of Complex Flows-Important Points 01:13

    1.17 APIs That Match User Goals 01:06

    1.18 Apache Spark-A Unified Platform of Big Data Apps 01:38

    1.19 More Benefits of Apache Spark 01:05

    1.20 Running Spark in Different Modes 00:41

    1.21 Installing Spark as a Standalone Cluster-Configurations

    1.22 Installing Spark as a Standalone Cluster-Configurations 00:08

    1.23 Demo-Install Apache Spark 00:08

    1.24 Demo-Install Apache Spark 02:41

    1.25 Overview of Spark on a Cluster 00:47

    1.26 Tasks of Spark on a Cluster 00:37

    1.27 Companies Using Spark-Use Cases 00:46

    1.28 Hadoop Ecosystem vs. Apache Spark 00:32

    1.29 Hadoop Ecosystem vs. Apache Spark (contd.) 00:43

    1.30 Quiz

    1.31 Summary 00:40

    1.32 Summary (contd.) 00:41

    1.33 Conclusion 00:13

    Lesson 02 - Introduction to Programming in Scala 37:35

    2.1 Introduction 00:11

    2.2 Objectives 00:16

    2.3 Introduction to Scala 01:32

    2.4 Features of Scala

    2.5 Basic Data Types 00:24

    2.6 Basic Literals 00:35

    2.7 Basic Literals (contd.) 00:25

    2.8 Basic Literals (contd.) 00:21

    2.9 Introduction to Operators 00:31

    2.10 Types of Operators

    2.11 Use Basic Literals and the Arithmetic Operator 00:08

    2.12 Demo-Use Basic Literals and the Arithmetic Operator 03:18

    2.13 Use the Logical Operator 00:07

    2.14 Demo-Use the Logical Operator 01:40

    2.15 Introduction to Type Inference 00:34

    2.16 Type Inference for Recursive Methods 00:10

    2.17 Type Inference for Polymorphic Methods and Generic Classes 00:30

    2.18 Unreliability on Type Inference Mechanism 00:23

    2.19 Mutable Collection vs. Immutable Collection 01:13

    2.20 Functions 00:21

    2.21 Anonymous Functions 00:22

    2.22 Objects 01:08

    2.23 Classes 00:36

    2.24 Use Type Inference, Functions, Anonymous Function, and Class 00:09

    2.25 Demo-Use Type Inference, Functions, Anonymous Function, and Class 07:40

    2.26 Traits as Interfaces 00:57

    2.27 Traits-Example 00:09

    2.28 Collections 00:42

    2.29 Types of Collections 00:25

    2.30 Types of Collections (contd.) 00:26

    2.31 Lists 00:28

    2.32 Perform Operations on Lists 00:07

    2.33 Demo-Use Data Structures 04:10

    2.34 Maps 00:46

    2.35 Maps-Operations

    2.36 Pattern Matching 00:33

    2.37 Implicits 00:37

    2.38 Implicits (contd.) 00:18

    2.39 Streams 00:22

    2.40 Use Data Structures 00:07

    2.41 Demo-Perform Operations on Lists 03:25

    2.42 Quiz

    2.43 Summary 00:37

    2.44 Summary (contd.) 00:37

    2.45 Conclusion 00:15

    Lesson 03 - Using RDD for Creating Applications in Spark 51:02

    3.1 Introduction 00:12

    3.2 Objectives 00:23

    3.3 RDDs API 01:40

    3.4 Features of RDDs

    3.5 Creating RDDs 00:36

    3.6 Creating RDDs—Referencing an External Dataset 00:19

    3.7 Referencing an External Dataset—Text Files 00:51

    3.8 Referencing an External Dataset—Text Files (contd.) 00:50

    3.9 Referencing an External Dataset—Sequence Files 00:33

    3.10 Referencing an External Dataset—Other Hadoop Input Formats 00:46

    3.11 Creating RDDs—Important Points 01:09

    3.12 RDD Operations 00:38

    3.13 RDD Operations—Transformations 00:47

    3.14 Features of RDD Persistence 00:57

    3.15 Storage Levels Of RDD Persistence 00:20

    3.16 Choosing The Correct RDD Persistence Storage Level

    3.17 Invoking the Spark Shell 00:23

    3.18 Importing Spark Classes 00:14

    3.19 Creating the SparkContext 00:26

    3.20 Loading a File in Shell 00:11

    3.21 Performing Some Basic Operations on Files in Spark Shell RDDs 00:20

    3.22 Packaging a Spark Project with SBT 00:50

    3.23 Running a Spark Project With SBT 00:32

    3.24 Demo-Build a Scala Project 00:07

    3.25 Build a Scala Project 06:51

    3.26 Demo-Build a Spark Java Project 00:08

    3.27 Build a Spark Java Project 04:31

    3.28 Shared Variables—Broadcast 01:21

    3.29 Shared Variables—Accumulators 00:52

    3.30 Writing a Scala Application 00:20

    3.31 Demo-Run a Scala Application 00:07

    3.32 Run a Scala Application 01:43

    3.33 Demo-Write a Scala Application Reading the Hadoop Data 00:07

    3.34 Write a Scala Application Reading the Hadoop Data 01:23

    3.35 Demo-Run a Scala Application Reading the Hadoop Data 00:08

    3.36 Run a Scala Application Reading the Hadoop Data 02:21

    3.37 Scala RDD Extensions

    3.38 DoubleRDD Methods 00:08

    3.39 PairRDD Methods—Join 00:47

    3.40 PairRDD Methods—Others 00:06

    3.41 Java PairRDD Methods 00:09

    3.42 Java PairRDD Methods (contd.) 00:06

    3.43 General RDD Methods 00:06

    3.44 General RDD Methods (contd.) 00:05

    3.45 Java RDD Methods 00:08

    3.46 Java RDD Methods (contd.) 00:06

    3.47 Common Java RDD Methods 00:10

    3.48 Spark Java Function Classes 00:13

    3.49 Method for Combining JavaPairRDD Functions 00:42

    3.50 Transformations in RDD 00:34

    3.51 Other Methods 00:07

    3.52 Actions in RDD 00:08

    3.53 Key-Value Pair RDD in Scala 00:32

    3.54 Key-Value Pair RDD in Java 00:43

    3.55 Using MapReduce and Pair RDD Operations 00:25

    3.56 Reading Text File from HDFS 00:16

    3.57 Reading Sequence File from HDFS 00:21

    3.58 Writing Text Data to HDFS 00:18

    3.59 Writing Sequence File to HDFS 00:12

    3.60 Using GroupBy 00:07

    3.61 Using GroupBy (contd.) 00:05

    3.62 Demo-Run a Scala Application Performing GroupBy Operation 00:08

    3.63 Run a Scala Application Performing GroupBy Operation 03:13

    3.64 Demo-Run a Scala Application Using the Scala Shell 00:07

    3.65 Run a Scala Application Using the Scala Shell 04:02

    3.66 Demo-Write and Run a Java Application 00:06

    3.67 Write and Run a Java Application 01:49

    3.68 Quiz

    3.69 Summary 00:53

    3.70 Summary (contd.) 00:59

    3.71 Conclusion 00:15

    Lesson 04 - Running SQL Queries Using Spark SQL 30:24

    4.1 Introduction 00:12

    4.2 Objectives 00:17

    4.3 Importance of Spark SQL 01:02

    4.4 Benefits of Spark SQL 00:47

    4.5 DataFrames 00:50

    4.6 SQLContext 00:50

    4.7 SQLContext (contd.) 01:13

    4.8 Creating a DataFrame 00:11

    4.9 Using DataFrame Operations 00:22

    4.10 Using DataFrame Operations (contd.) 00:05

    4.11 Demo-Run Spark SQL with a DataFrame 00:06

    4.12 Run Spark SQL with a DataFrame 08:53

    4.13 Interoperating with RDDs

    4.14 Using the Reflection-Based Approach 00:38

    4.15 Using the Reflection-Based Approach (contd.) 00:08

    4.16 Using the Programmatic Approach 00:44

    4.17 Using the Programmatic Approach (contd.) 00:07

    4.18 Demo-Run Spark SQL Programmatically 00:08

    4.19 Run Spark SQL Programmatically 00:01

    4.20 Data Sources

    4.21 Save Modes 00:32

    4.22 Saving to Persistent Tables 00:46

    4.23 Parquet Files 00:19

    4.24 Partition Discovery 00:38

    4.25 Schema Merging 00:29

    4.26 JSON Data 00:34

    4.27 Hive Table 00:45

    4.28 DML Operation-Hive Queries 00:27

    4.29 Demo-Run Hive Queries Using Spark SQL 00:07

    4.30 Run Hive Queries Using Spark SQL 04:58

    4.31 JDBC to Other Databases 00:49

    4.32 Supported Hive Features 00:38

    4.33 Supported Hive Features (contd.) 00:22

    4.34 Supported Hive Data Types 00:13

    4.35 Case Classes 00:15

    4.36 Case Classes (contd.) 00:07

    4.37 Quiz

    4.38 Summary 00:49

    4.39 Summary (contd.) 00:49

    4.40 Conclusion 00:13

    Lesson 05 - Spark Streaming 35:09

    5.1 Introduction 00:11

    5.2 Objectives 00:15

    5.3 Introduction to Spark Streaming 00:50

    5.4 Working of Spark Streaming 00:20

    5.5 Features of Spark Streaming

    5.6 Streaming Word Count 01:34

    5.7 Micro Batch 00:19

    5.8 DStreams 00:34

    5.9 DStreams (contd.) 00:39

    5.10 Input DStreams and Receivers 01:19

    5.11 Input DStreams and Receivers (contd.) 00:55

    5.12 Basic Sources 01:14

    5.13 Advanced Sources 00:49

    5.14 Advanced Sources-Twitter

    5.15 Transformations on DStreams 00:15

    5.16 Transformations on DStreams (contd.) 00:06

    5.17 Output Operations on DStreams 00:29

    5.18 Design Patterns for Using ForeachRDD 01:15

    5.19 DataFrame and SQL Operations 00:26

    5.20 DataFrame and SQL Operations (contd.) 00:20

    5.21 Checkpointing 01:25

    5.22 Enabling Checkpointing 00:39

    5.23 Socket Stream 01:00

    5.24 File Stream 00:12

    5.25 Stateful Operations 00:28

    5.26 Window Operations 01:22

    5.27 Types of Window Operations 00:12

    5.28 Types of Window Operations (contd.) 00:06

    5.29 Join Operations-Stream-Dataset Joins 00:21

    5.30 Join Operations-Stream-Stream Joins 00:34

    5.31 Monitoring Spark Streaming Application 01:19

    5.32 Performance Tuning-High Level 00:20

    5.33 Performance Tuning-Detail Level

    5.34 Demo-Capture and Process the Netcat Data 00:07

    5.35 Capture and Process the Netcat Data 05:01

    5.36 Demo-Capture and Process the Flume Data 00:08

    5.37 Capture and Process the Flume Data 05:08

    5.38 Demo-Capture the Twitter Data 00:07

    5.39 Capture the Twitter Data 02:33

    5.40 Quiz

    5.41 Summary 01:01

    5.42 Summary (contd.) 01:04

    5.43 Conclusion 00:12

    Lesson 06 - Spark ML Programming 40:08

    6.1 Introduction 00:12

    6.2 Objectives 00:20

    6.3 Introduction to Machine Learning 01:36

    6.4 Common Terminologies in Machine Learning

    6.5 Applications of Machine Learning 00:22

    6.6 Machine Learning in Spark 00:34

    6.7 Spark ML API

    6.8 DataFrames 00:32

    6.9 Transformers and Estimators 00:59

    6.10 Pipeline 00:48

    6.11 Working of a Pipeline 01:41

    6.12 Working of a Pipeline (contd.) 00:45

    6.13 DAG Pipelines 00:33

    6.14 Runtime Checking 00:21

    6.15 Parameter Passing 01:00

    6.16 General Machine Learning Pipeline-Example 00:05

    6.17 General Machine Learning Pipeline-Example (contd.)

    6.18 Model Selection via Cross-Validation 01:16

    6.19 Supported Types, Algorithms, and Utilities 00:31

    6.20 Data Types 01:26

    6.21 Feature Extraction and Basic Statistics 00:43

    6.22 Clustering 00:38

    6.23 K-Means 00:55

    6.24 K-Means (contd.) 00:05

    6.25 Demo-Perform Clustering Using K-Means 00:07

    6.26 Perform Clustering Using K-Means 04:41

    6.27 Gaussian Mixture 00:57

    6.28 Power Iteration Clustering (PIC) 01:17

    6.29 Latent Dirichlet Allocation (LDA) 00:35

    6.30 Latent Dirichlet Allocation (LDA) (contd.) 01:45

    6.31 Collaborative Filtering 01:13

    6.32 Classification 00:16

    6.33 Classification (contd.) 00:06

    6.34 Regression 00:42

    6.35 Example of Regression 00:56

    6.36 Demo-Perform Classification Using Linear Regression 00:08

    6.37 Perform Classification Using Linear Regression 02:01

    6.38 Demo-Run Linear Regression 00:06

    6.39 Run Linear Regression 02:14

    6.40 Demo-Perform Recommendation Using Collaborative Filtering 00:05

    6.41 Perform Recommendation Using Collaborative Filtering 02:23

    6.42 Demo-Run Recommendation System 00:06

    6.43 Run Recommendation System 02:45

    6.44 Quiz

    6.45 Summary 01:14

    6.46 Summary (contd.) 00:57

    6.47 Conclusion 00:12

    Lesson 07 - Spark GraphX Programming 46:26

    7.001 Introduction 00:14

    7.002 Objectives 00:17

    7.003 Introduction to Graph-Parallel System 01:14

    7.004 Limitations of Graph-Parallel System 00:49

    7.005 Introduction to GraphX 01:21

    7.006 Introduction to GraphX (contd.) 00:06

    7.007 Importing GraphX 00:10

    7.008 The Property Graph 01:25

    7.009 The Property Graph (contd.) 00:07

    7.010 Features of the Property Graph

    7.011 Creating a Graph 00:14

    7.012 Demo-Create a Graph Using GraphX 00:07

    7.013 Create a Graph Using GraphX 10:08

    7.014 Triplet View 00:30

    7.015 Graph Operators 00:51

    7.016 List of Operators 00:23

    7.017 List of Operators (contd.) 00:05

    7.018 Property Operators 00:18

    7.019 Structural Operators 01:02

    7.020 Subgraphs 00:21

    7.021 Join Operators 01:09

    7.022 Demo-Perform Graph Operations Using GraphX 00:07

    7.023 Perform Graph Operations Using GraphX 05:46

    7.024 Demo-Perform Subgraph Operations 00:07

    7.025 Perform Subgraph Operations 01:37

    7.026 Neighborhood Aggregation 00:43

    7.027 mapReduceTriplets 00:42

    7.028 Demo-Perform MapReduce Operations 00:08

    7.029 Perform MapReduce Operations 09:18

    7.030 Counting Degree of Vertex 00:32

    7.031 Collecting Neighbors 00:28

    7.032 Caching and Uncaching 01:10

    7.033 Graph Builders

    7.034 Vertex and Edge RDDs 01:17

    7.035 Graph System Optimizations 01:22

    7.036 Built-in Algorithms

    7.037 Quiz

    7.038 Summary 01:12

    7.039 Summary (contd.) 00:55

    7.040 Conclusion 00:11



Course Fee:
USD 799

Course Type:

Self-Study

Course Status:

Active

Workload:

1 - 4 hours / week
