In this class you will learn:
- how to use R in a distributed environment
- how to create Spark clusters on Amazon AWS
- how to perform distributed modeling using GLM
- how to measure distributed regression and classification predictions
- how to access data from CSV, JSON, HDFS, and S3
All our examples will be run on real clusters: no training wheels, no single-machine local clusters, and no third-party tools.
Note 1: You will need to know how to SSH into your Amazon AWS instance. (I will show how I do it on a Mac, but Windows and Linux aren't covered.)
Note 2: There is a minimal cost involved when using Amazon AWS instances. The biggest machine we will use costs around 0.05 US cents/hour/machine.