Real World Spark 2 - Spark Core Overview

Udemy

Course Summary

Why you should take a look at Spark 2, the easiest open-source, modern cluster computation engine to write code against.

Course Description

Why Apache Spark?

Apache Spark runs programs up to 100x faster than Hadoop MapReduce in memory, or 10x faster on disk. Apache Spark has an advanced DAG execution engine that supports acyclic data flow and in-memory computing. Apache Spark offers over 80 high-level operators that make it easy to build parallel apps, and you can use it interactively from the Scala, Python and R shells. Apache Spark can combine SQL, streaming, and complex analytics.

Apache Spark powers a stack of libraries including SQL and DataFrames, MLlib for machine learning, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application.

Spark Overview

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.

Jupyter Notebook

Jupyter Notebook is a system similar to Mathematica that allows you to create "executable documents". Notebooks integrate formatted text (Markdown) and executable code (Scala) with equations and visualizations in a single document.

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.

The Jupyter Notebook is based on a set of open standards for interactive computing. Think HTML and CSS for interactive computing on the web. These open standards can be leveraged by third-party developers to build customized applications with embedded interactive computing.
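One of those open standards is the notebook document format itself: an `.ipynb` file is plain JSON. The minimal sketch below assumes the nbformat 4.4 layout, and its helper `make_notebook` is hypothetical. It uses only the Python standard library to build a notebook containing one Markdown cell and one Scala code cell.

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook document (nbformat 4.4 layout):
    one Markdown cell followed by one unexecuted code cell."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,  # minor version 4: cells do not need an "id" field
        "metadata": {},  # no kernelspec set in this sketch
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": markdown_text},
            {
                "cell_type": "code",
                "metadata": {},
                "source": code_text,  # Scala source, stored as plain text
                "execution_count": None,  # serialized as null: not yet run
                "outputs": [],
            },
        ],
    }


notebook = make_notebook(
    "# Line count\nCount the lines of a text file with Spark.",
    'sc.textFile("README.md").count()',
)

# The serialized JSON is what a .ipynb file on disk contains.
text = json.dumps(notebook, indent=1)
```

If `text` is saved to a file with an `.ipynb` extension, Jupyter should open it. Running the Scala cell additionally requires a Scala/Spark kernel, and this sketch does not configure one.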

Spark shell

Spark's shell provides a simple way to learn the API, as well as a powerful tool to analyze data interactively. It is available in either Scala (which runs on the Java VM and is thus a good way to use existing Java libraries) or Python.

Scala IDE

Scala IDE provides advanced editing and debugging support for the development of pure Scala and mixed Scala-Java applications.

Spark Monitoring and Instrumentation

Every SparkContext launches a web UI, by default on port 4040, that displays useful information about the application. This includes:

- A list of scheduler stages and tasks
- A summary of RDD sizes and memory usage
- Environmental information
- Information about the running executors
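Besides the HTML pages on port 4040, the Spark UI also serves monitoring data as JSON under `/api/v1` (Spark 1.4 and later), which is convenient for scripting. The sketch below uses only the Python standard library to list the applications that a driver reports at `/api/v1/applications`. It assumes a driver running locally on the default port 4040. The helpers `parse_applications` and `fetch_applications` are hypothetical and not part of Spark.

```python
import json
import urllib.request
from typing import List, Tuple

# Assumption: a Spark driver runs on this machine with its UI on the default port.
DEFAULT_UI_URL = "http://localhost:4040"


def parse_applications(payload: str) -> List[Tuple[str, str]]:
    """Turn the JSON body returned by /api/v1/applications into (id, name) pairs."""
    apps = json.loads(payload)
    return [(app["id"], app["name"]) for app in apps]


def fetch_applications(base_url: str = DEFAULT_UI_URL,
                       timeout: float = 5.0) -> List[Tuple[str, str]]:
    """Query the monitoring REST API of a running SparkContext."""
    url = base_url.rstrip("/") + "/api/v1/applications"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_applications(resp.read().decode("utf-8"))
```

With a `spark-shell` session open on the same machine, `fetch_applications()` should return that session's application ID and name. If several SparkContexts run on one host, they bind to successive ports (4041, 4042, ...), so pass the matching URL as `base_url`. Keeping the JSON parsing separate from the HTTP call lets the parsing logic be checked without a live driver.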

Course Fee:	USD 100

Course Type:	Self-Study
Course Status:	Active
Workload:	1 - 4 hours / week

This course is listed under Open Source Community
