
How Spark Application Works In Real-Time?

Published on 03 May 18
Apache Spark can be described as a large-scale data processing engine under active development. It was introduced as a more viable alternative to MapReduce, improving both developer productivity and application performance. When Spark's core analytics engine is combined with its related projects, such as Spark Streaming, Spark MLlib, and Spark SQL, the result is very impressive. Spark Streaming in particular adds the ability to continuously compute transformations over data as it arrives.

This article sheds some light on how a Spark application works in real life. Our focus is a simple Spark word count application in three Spark-supported languages: Scala, Java, and Python. (The Scala and Java versions were first written for the Cloudera tutorials.) The story is divided into three phases of Spark application development: writing the application, compiling and packaging the Scala and Java applications, and finally running the application.

Writing the Application-

While writing the application, our goal is to learn the distribution of letters in the most frequent words in a corpus. The steps to write the application are:

• The first thing is to create a SparkConf and a SparkContext. A Spark application corresponds to an instance of SparkContext. When you are running a shell, the SparkContext is created for you.

• Read an input set of text documents and get a threshold for word frequency.

• Count the number of times each word appears.

• Filter out every word that appears fewer times than the threshold.

• For the remaining words, count the number of times each letter occurs.
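The steps above can be sketched in plain Python, with no Spark cluster needed, just to illustrate the logic; in the real application each step would be an RDD transformation (function and variable names here are illustrative, not from the tutorial):

```python
from collections import Counter

def letter_frequencies(lines, threshold):
    """Count letters in every word that appears at least `threshold` times."""
    # Read the input and split each line into words.
    words = [w for line in lines for w in line.split()]
    # Count how many times each word appears.
    word_counts = Counter(words)
    # Filter out words that fall below the threshold.
    frequent = [w for w, n in word_counts.items() if n >= threshold]
    # Tally each letter across the surviving words.
    return Counter(ch for w in frequent for ch in w)

# Letters in the words that met the threshold of 2 ("spark" and "is")
print(letter_frequencies(["spark spark is fast", "spark is simple"], 2))
```

In Spark, the word count would typically be a `flatMap` followed by `reduceByKey`, a `filter` on the counts, and a second `flatMap`/`reduceByKey` over the characters.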

In MapReduce, this would require two separate MapReduce applications, as well as persisting the intermediate data to HDFS between them. In Spark, the same application needs roughly 90 percent fewer lines of code than its MapReduce counterpart.

Compiling and Packaging the Scala and Java Applications-

To keep it simple, use Maven to compile and package both the Java and Scala programs. Maven is a sensible choice for building Spark applications in practice. For the Scala version, you additionally need to include the Scala tools plugin in the build.
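As a sketch, the Scala build might register the `scala-maven-plugin` in the `pom.xml` as shown below; the plugin coordinates and version are assumptions on our part, so check them against your Scala and Spark releases:

```xml
<build>
  <plugins>
    <plugin>
      <!-- Compiles Scala sources as part of the Maven build -->
      <groupId>net.alchim31.maven</groupId>
      <artifactId>scala-maven-plugin</artifactId>
      <version>3.2.2</version>
      <executions>
        <execution>
          <goals>
            <goal>compile</goal>
            <goal>testCompile</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, `mvn package` compiles both Java and Scala sources and produces the application JAR to hand to spark-submit.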

Application Run-

To run the application, the input is a large text file, with each line containing all the words of a document, stripped of punctuation. Put the input file in an HDFS directory.
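For example, assuming a local input file named `inputfile.txt` and a target HDFS directory of your choosing (both names are placeholders, not from the tutorial), the upload might look like this:

```shell
# Create a directory in HDFS and copy the local input file into it
hdfs dfs -mkdir -p /user/hduser/input
hdfs dfs -put inputfile.txt /user/hduser/input/
```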

Now you are free to run the application by means of spark-submit-

For Scala, run in a local process with a threshold of 2. The Java version runs locally with parallel worker threads, and the Python version runs on YARN, also with a threshold of 2.
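As a sketch, the three runs might look like the following; the class names, JAR and script names, and HDFS paths are placeholders we have assumed for illustration:

```shell
# Scala: local process, threshold 2
spark-submit --class com.example.SparkWordCount --master local \
  sparkwordcount-1.0.jar /user/hduser/input 2

# Java: local mode with multiple worker threads, threshold 2
spark-submit --class com.example.JavaWordCount --master "local[*]" \
  javawordcount-1.0.jar /user/hduser/input 2

# Python: submitted to YARN, threshold 2
spark-submit --master yarn wordcount.py /user/hduser/input 2
```

The last positional argument in each command is the word-frequency threshold read by the application.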
