Supercharge R with SparkR - Apply your R chops to Big Data!
Extend R with Spark and SparkR - Create clusters on AWS, perform distributed modeling, and access HDFS and S3
In this class you will learn how to:
- use R in a distributed environment
- create Spark clusters on Amazon's AWS
- perform distributed modeling using generalized linear models (GLM)
- evaluate distributed regression and classification predictions
- access data from CSV, JSON, HDFS, and S3
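To give a feel for what these steps look like in code, here is a minimal sketch using the SparkR API (Spark 2.x style). The file path, column names (`label`, `feature1`, `feature2`), and app name are placeholders for illustration, not the course's actual dataset.

```r
# Minimal SparkR sketch (Spark 2.x API); paths and column names
# are placeholders, not the course's actual dataset.
library(SparkR)

# Start a SparkR session (on a real cluster this connects to the master)
sparkR.session(appName = "glm-example")

# Read a CSV file into a distributed SparkDataFrame; the same
# read.df() call handles JSON, HDFS, and S3 paths as well
# (e.g. "hdfs:///data/file.csv" or "s3a://bucket/file.json")
df <- read.df("data.csv", source = "csv",
              header = "true", inferSchema = "true")

# Fit a distributed GLM (Gaussian family = linear regression)
model <- spark.glm(df, label ~ feature1 + feature2, family = "gaussian")
summary(model)

# Score the data, then shut the session down
preds <- predict(model, df)
sparkR.session.stop()
```

Swapping `family = "gaussian"` for `family = "binomial"` turns the same call into a distributed logistic-regression classifier.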
All our examples will be performed on real clusters - no training wheels, no single-node local clusters, and no third-party tools.
Note 1: You will need to know how to SSH into your Amazon AWS instances (I will demonstrate how I do it on a Mac; Windows and Linux are not covered).
Note 2: There is a small cost involved in using Amazon's AWS instances. The biggest machine we will use costs around 0.05 US cents per hour per machine.
This course is listed under the Open Source, Cloud Computing, Development & Implementations, Data & Information Management, Operating Systems, and Server & Storage Management communities.