Data Science, R, Mahout, SAS Training – Combo Course
Intellipaat
Course Summary
Our Data Science certification master program helps you become a skilled Data Scientist. The online training classes cover the main aspects of Data Science, including data acquisition, analysis, Apache Mahout, statistical methods, SAS programming, clustering and vectors. You will also work on real-world projects.

Course Description
About Course
What will you learn in this Data Science Certification Training Course?
 Introduction to the role of Data Scientist
 Learn programming in R language
 Mahout Machine Learning Algorithms
 Learn SAS Programming
 Learn about Project Life Cycle in Data Science
 Understand Vector Creation and Variable Value Assignment
 Learn about Database Connectivity
 Get to know Data collection, conversion & interpretation
 Understand Linear and Logistic Regression
 Learn Clustering and Vectorizing data
 The concepts of Statistics
 Probability rules and Bayes Theorem
 Get to know Sampling methods and Plotting techniques
 Learn the concepts of Tables and Data Analysis
 Integrate R with Hadoop
Who should take this Data Science Certification Training Course?
 Data Scientists, Analysts, Machine Learning professionals, Statisticians
 Programmers, Business Intelligence professionals, Information Architects, Project Managers
 Those looking to work in Data Science
What are the prerequisites for taking this Training Course?
There are no prerequisites for taking this Training Course.
Why should you take this Training Course?
This is a complete Training Course in the field of Data Science and Data Analysis that can make you industry-ready. You will gain deep expertise in multiple technologies and platforms, including project management. This Course will equip you with the much-needed programming expertise in R, introduce you to Apache Mahout, and teach you the techniques of Statistics and Probability. All in all, you will have the right skills to work in the Data Science field in the best companies around the world at top salaries.

Course Syllabus
R Programming Course Content
Introduction to R
R language for statistical programming, the various features of R, introduction to RStudio, the statistical packages, familiarity with different data types and functions, learning to deploy them in various scenarios, using SQL to apply the ‘join’ function, components of RStudio like the code editor, visualization and debugging tools, learning about rbind.
R Packages
R functions, code compilation and data in a well-defined format called R packages, R package structure, package metadata and testing, CRAN (Comprehensive R Archive Network), vector creation and variable value assignment.
Sorting Dataframe
R functionality, rep function, generating repeats, sorting and generating factor levels, transpose and stack functions.
Matrices and Vectors
Introduction to matrices and vectors in R, understanding the various functions like merge, strsplit, matrix manipulation, rowSums, rowMeans, colMeans, colSums, sequencing, repetition, indexing and other functions.
Reading data from external files
Understanding subscripts in plots in R, how to obtain parts of vectors, using subscripts with arrays, as logical variables and with lists, understanding how to read data from external files.
Generating plots
Generating plots in R, graphs, bar plots, line plots, histograms, components of a pie chart.
Analysis of Variance (ANOVA)
Understanding the Analysis of Variance (ANOVA) statistical technique, working with pie charts and histograms, deploying ANOVA with R, one-way ANOVA, two-way ANOVA.
K-means Clustering
K-means clustering for cluster and affinity analysis, the cluster algorithm, cohesive subsets of items, solving clustering issues, working with large datasets, association rule mining and affinity analysis for data mining, and learning co-occurrence relationships.
Association Rule Mining
Introduction to Association Rule Mining, the various concepts of Association Rule Mining, various methods to predict relations between variables in large datasets, the algorithm and rules of Association Rule Mining, understanding single cardinality.
Regression in R
Understanding Simple Linear Regression, the equation of a line, slope and Y-intercept of the regression line, deploying analysis using regression, the least-squares criterion, interpreting the results, standard error of the estimate and measures of variation.
Analyzing Relationship with Regression
Scatter plots, two-variable relationships, Simple Linear Regression analysis, line of best fit.
Advanced Regression
Deep understanding of the measures of variation, the concept of the coefficient of determination, the F-test, the test statistic with an F-distribution, advanced regression in R, prediction with linear regression.
Logistic Regression
Logistic Regression Mean, Logistic Regression in R.
Advanced Logistic Regression
Advanced logistic regression, understanding how to do prediction using logistic regression, ensuring the model is accurate, understanding sensitivity and specificity, the confusion matrix, what ROC is (a graphical plot illustrating the performance of a binary classifier system), the ROC curve in R for determining sensitivity/specificity trade-offs for a binary classifier.
Receiver Operating Characteristic (ROC)
Detailed understanding of ROC, area under the ROC curve, converting the variable, data set partitioning, understanding how to check for multicollinearity (two or more variables being highly correlated), building the model, advanced data set partitioning, interpreting the output, predicting the output, detailed confusion matrix, deploying the Hosmer-Lemeshow test for checking whether the observed event rates match the expected event rates.
Kolmogorov Smirnov Chart
Data analysis with R, understanding the Wald test, McFadden’s pseudo R-squared, the significance of the area under the ROC curve, the Kolmogorov-Smirnov chart, which is based on a nonparametric test of one-dimensional probability distributions.
Database connectivity with R
Connecting to various databases from the R environment, deploying ODBC tables for reading the data, visualizing the performance of the algorithm using a confusion matrix.
Integrating R with Hadoop
Creating an integrated environment for deploying R on the Hadoop platform, working with RHadoop, the rmr package and the R and Hadoop Integrated Programming Environment (RHIPE), R programming for MapReduce jobs and Hadoop execution.
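The simple linear regression topics in the syllabus above (slope, Y-intercept, the least-squares criterion and the coefficient of determination) can be illustrated with a minimal sketch. It is written in Python purely for illustration, whereas the course itself teaches these topics in R. The function names and the sample data are hypothetical and are not taken from the course material.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares regression line."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of squared deviations of x, and sum of cross-deviations of x and y.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination: share of variation in y explained by the line."""
    y_bar = mean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up data, for illustration only.
    xs = [1, 2, 3, 4, 5]
    ys = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept = fit_line(xs, ys)
    print("slope:", slope)
    print("intercept:", intercept)
    print("R squared:", r_squared(xs, ys, slope, intercept))
```

The closed-form slope and intercept used here are the standard least-squares solution for one predictor. In R the same fit would normally come from a model-fitting function rather than hand-written sums.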
R Case Studies
Logistic Regression Case Study
In this case study you will get a detailed understanding of a company's advertising spend and how it helps drive more sales. You will deploy logistic regression to forecast future trends, detect patterns and uncover insights, all through the power of R programming. With these results, future advertising spend can be decided and optimized for higher revenue.
Multiple Regression Case Study
You will understand how to compare the miles per gallon (MPG) of a car based on various parameters. You will deploy multiple regression and note down the MPG for car make, model, speed, load conditions, etc. The case study includes model building, model diagnostics and checking the ROC curve, among other things.
Receiver Operating Characteristic (ROC) case study
You will work with various data sets in R, deploy data exploration methodologies, build scalable models, predict the outcome with highest precision, diagnose the model that you have created with various real world data, check the ROC curve and more.
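The confusion matrix, sensitivity, specificity and ROC ideas covered in the syllabus and case studies above can be sketched as follows. This is an illustrative Python sketch rather than the R workflow the course uses. The helper names, the labels and the scores are all hypothetical, and it assumes labels coded as 0/1 with at least one example of each class.

```python
def confusion_counts(actual, predicted):
    """Count (tp, fp, tn, fn) for labels coded as 0 (negative) and 1 (positive)."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1
        elif a == 0 and p == 1:
            fp += 1
        elif a == 0 and p == 0:
            tn += 1
        else:  # a == 1 and p == 0
            fn += 1
    return tp, fp, tn, fn


def rates(actual, scores, threshold):
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(actual, predicted)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


def roc_points(actual, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    points = []
    for t in thresholds:
        sens, spec = rates(actual, scores, t)
        points.append((1 - spec, sens))
    return points


if __name__ == "__main__":
    # Made-up labels and model scores, for illustration only.
    actual = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.7, 0.1]
    print(rates(actual, scores, 0.5))
    print(roc_points(actual, scores, [0.0, 0.25, 0.5, 0.75, 1.01]))
```

Sweeping the threshold from low to high traces the ROC curve from the top-right corner to the bottom-left, which is the sensitivity/specificity trade-off the syllabus refers to.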
Mahout Course Content
Mahout Overview
Classification and Recommendation, Clustering in Mahout, Pattern Mining, Understanding Machine Learning, Using a Model diagram to decide the approach, Data flow, Supervised and Unsupervised learning
Mahout Recommendations
Concept of Recommendation, Recommendations by E-commerce sites, Comparison between User Recommendations and Item Recommendations, Defining Recommenders and Classifiers, Process of Collaborative Filtering, Explaining the Pearson coefficient algorithm, Euclidean distance measure, Implementing a recommender using MapReduce
Clustering Session 1
Defining Clustering, User-to-user similarity, Clustering Illustration, Euclidean distance measure, Distance measure vector, Understanding the process of Clustering, Vectorizing documents, Unstructured data
Clustering Session 2
Document clustering, Sequence-to-sparse Utility, K-Means Clustering
Classification Session 1
Terminology, Predictor and Target variable, Classifiable Data, Key Challenges in Classification algorithms, Vectorizing Continuous data, Classification Examples, Logistic Regression and its examples
Clustering and Classification Session 2
Clustering, Clustering Process, Transaction Clustering, Different techniques of Vectorization, Distance measure, Clustering algorithm (K-Means), Clustering Application 1, Clustering Application 2, Sentiment Analyzer
Pattern Mining
Pearson Coefficient, Collaborative Filtering Process, Collaborative Filtering, Similarity Algorithms, Pearson Correlation, Euclidean Distance Measure, Frequent Pattern & Association rules, Frequent Pattern Growth
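The two similarity measures named in the Mahout modules above, the Pearson coefficient and the Euclidean distance, can be sketched for user-to-user similarity as follows. This is plain illustrative Python and not Mahout's API. The function names, the users and the ratings are hypothetical, and the sketch assumes every user has rated the same items and that no user's ratings are all identical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation between two equal-length rating vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    # Undefined when either vector is constant (zero variance).
    return cov / sqrt(var_a * var_b)


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length rating vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def most_similar(user, ratings):
    """Return the other user whose ratings correlate most strongly with `user`."""
    others = [name for name in ratings if name != user]
    return max(others, key=lambda name: pearson(ratings[user], ratings[name]))


if __name__ == "__main__":
    # Made-up ratings of the same four items by three users.
    ratings = {
        "alice": [5, 3, 4, 4],
        "bob": [3, 1, 2, 3],
        "carol": [1, 5, 5, 2],
    }
    print(most_similar("alice", ratings))
    print(euclidean_distance(ratings["alice"], ratings["bob"]))
```

Pearson correlation ignores differences in rating scale between users (one user may rate everything lower), while Euclidean distance does not, which is one reason a recommender might prefer one measure over the other.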
Data Science Course Content
Introduction to Data Science and Statistical Analytics
Introduction to Data Science, Use cases, Need of Business Analytics, Data Science Life Cycle, Different tools available for Data Science
Introduction to R
Installing R and RStudio, R packages, R Operators, if statements and loops (for, while, repeat, break, next), switch case
Data Exploration, Data Wrangling and R Data Structure
Importing and Exporting data from external sources, Data exploratory analysis, R Data Structure (Vector, Scalar, Matrices, Array, Data frame, List), Functions, Apply Functions
Data Visualization
Bar Graph (Simple, Grouped, Stacked), Histogram, Pie Chart, Line Chart, Box (Whisker) Plot, Scatter Plot, Correlogram
Introduction to Statistics
Terminologies of Statistics, Measures of Centers, Measures of Spread, Probability, Normal Distribution, Binary Distribution, Hypothesis Testing, Chi Square Test, ANOVA
Predictive Modeling – 1 (Linear Regression)
Supervised Learning – Linear Regression, Bivariate Regression, Multiple Regression Analysis, Correlation (Positive, negative and neutral), Industrial Case Study, Machine Learning Use Cases, Machine Learning Process Flow, Machine Learning Categories
Predictive Modeling – 2 (Logistic Regression)
Logistic Regression
Decision Trees
What is Classification and its use cases?, What is Decision Tree?, Algorithm for Decision Tree Induction, Creating a Perfect Decision Tree, Confusion Matrix
Random Forest
Random Forest, What is Naive Bayes?
Unsupervised learning
What is Clustering & its Use Cases?, What is K-means Clustering?, What is Canopy Clustering?, What is Hierarchical Clustering?
Association Analysis and Recommendation engine
Market Basket Analysis (MBA), Association Rules, Apriori Algorithm for MBA, Introduction of Recommendation Engine, Types of Recommendation – User-Based and Item-Based, Recommendation Use Case
Sentiment Analysis
Introduction to Text Mining, Introduction to Sentiment, Setting up an API bridge between R and a Twitter account, Extracting tweets from a Twitter account, Scoring the tweets
Time Series
What is Time Series data?, Time Series variables, Different components of Time Series data, Visualizing the data to identify Time Series components, Implementing an ARIMA model for forecasting, Exponential smoothing models, Identifying the time series scenarios in which different Exponential Smoothing models can be applied, Implementing the respective ETS model for forecasting
SAS Course Content
Introduction to SAS
Introduction to Base SAS, Installation of the SAS tool, Getting started with SAS, various SAS Windows – Log, Explorer, Output, Search, Editor, etc., working with data sets, overview of SAS Functions, Library Types and programming files
SAS Enterprise Guide
Import/Export Raw Data files, reading and subsetting the data set, various statements like WHERE, SET, MERGE
Hands-on Exercise – Import Excel file in workspace, Read data, Export the workspace to save data
SAS Operators & Functions
Various SAS Operators – Arithmetic, Logical, Comparison, various SAS Functions – NUMERIC, CHARACTER, IS NULL, CONTAINS, LIKE, Input/Put, Date/Time, Conditional Statements (Do While, Do Until, If, Else)
Hands-on Exercise – Apply logical, arithmetic operators and SAS functions to perform operations
Compilation & Execution
Understanding the Input Buffer and the PDV (back end), learning what MISSOVER is
Using Variables
Defining and using KEEP and DROP statements, applying these statements, Formats and Labels in SAS.
Hands-on Exercise – Use KEEP and DROP statements
Creation and Compilation of SAS Data sets
Understanding delimiters, dataline rules, DLM, Delimiter DSD, raw data files and execution, list input for standard data.
Hands-on Exercise – Use delimiter rules on raw data files
SAS Procedures
The various standard SAS Procedures built in for popular programs – PROC SORT, PROC FREQ, PROC SUMMARY, PROC RANK, PROC EXPORT, PROC DATASETS, PROC TRANSPOSE, PROC CORR, etc.
Hands-on Exercise – Use SORT, FREQ, SUMMARY, EXPORT and other procedures
Input statement and formatted input
Reading standard and nonstandard numeric inputs with Formatted inputs, Column Pointer Controls, Controlling while a record loads, Line pointer control / Absolute line pointer control, Single Trailing @, Multiple IN and OUT statements, DATALINES statement and rules, List Input Method, comparing Single Trailing @ and Double Trailing @@.
Hands-on Exercise – Read standard and nonstandard numeric inputs with Formatted inputs, Control while a record loads, Control a Line pointer, Write Multiple IN and OUT statements
SAS FORMAT
SAS FORMAT statements – standard and user-written, associating a format with a variable, working with SAS FORMAT, deploying it with PROC DATASETS, comparing ATTRIB and FORMAT statements.
Hands-on Exercise – Format a variable, deploy a format rule with PROC DATASETS, Use ATTRIB statement
SAS Graphs
Understanding PROC GCHART, various Graphs, Charts – Pie, Bar, 3D, plotting variables with PROC GPLOT.
Hands-on Exercise – Plot graphs using PROC GPLOT, Display charts using PROC GCHART
Interactive Data Processing
SAS advanced data discovery and visualization, point-and-click analytics capabilities, powerful reporting tools.
Data Transformation Function
Character Functions, Numeric Functions, Converting Variable Type.
Hands-on Exercise – Use Functions in data transformation
Output Delivery System (ODS)
Introduction to ODS, Data Optimization, How to generate files (rtf, pdf, html, doc) using SAS
Hands-on Exercise – Optimize data, generate rtf, pdf, html and doc files
SAS MACROS
Macro Syntax, Macro Variables, Positional Parameters in a Macro, Macro Step
Hands-on Exercise – Write a macro, Use positional parameters
PROC SQL
SQL Statements in SAS, SELECT, CASE, JOIN, UNION, Sorting Data
Hands-on Exercise – Create an SQL query to select and add a condition, Use a CASE in a select query
Advanced Base SAS
Base SAS web-based interface and ready-to-use programs, advanced data manipulation, storage and retrieval, descriptive statistics.
Hands-on Exercise – Use web UI to do statistical operations
Summarization Reports
Report Enhancement, Global Statements, User-defined Formats, PROC SORT, ODS Destinations, ODS Listing, PROC FREQ, PROC MEANS, PROC UNIVARIATE, PROC REPORT, PROC PRINT
Hands-on Exercise – Use PROC SORT to sort the results, List ODS, Find mean using PROC MEANS, print using PROC PRINT
R Programming Projects
Project 1
Domain – Restaurant Revenue Prediction
Data set – Sales
Project Description – This project involves predicting the sales of a restaurant on the basis of certain objective measurements. It gives you real-world industry experience in handling multiple use cases and deriving solutions, along with insights into feature engineering and selection.
Project 2
Domain – Data Analytics
Objective – To predict the class of a flower from its petal dimensions
Project 3
Domain – Finance
Objective – The project aims to find the factors with the greatest impact on preference for the prepaid model, and to identify which variables are highly correlated with those factors
Project 4
Domain – Stock Market
Objective – This project focuses on Machine Learning by creating a predictive data model to predict future stock prices
Data Science Project
Project 1 – Understanding Cold Start Problem in Data Science
Topics: This project involves understanding the cold start problem associated with recommender systems. You will gain hands-on experience in information filtering and in working on systems with zero historical data to refer to, as in the case of launching a new product. You will gain proficiency in working with personalized applications such as movie, book, song and news recommendations. This project includes the following:
 Algorithms for Recommender
 Ways of Recommendation
 Types of Recommendation: Collaborative Filtering-Based Recommendation, Content-Based Recommendation
 Complete mastery in working with the Cold Start Problem.
Project 2 – Recommendation for Movie, Summary
Topics: This is a real-world project that gives you hands-on experience in working with a movie recommender system. Depending on which movies a particular user likes, you will be in a position to provide data-driven recommendations. This project involves understanding recommender systems, information filtering, predicting ‘rating’, learning about user ‘preference’ and so on. You will work exclusively on data related to user details, movie details and others. The main components of the project include the following:
 Recommendation for movie
 Two Types of Predictions – Rating Prediction, Item Prediction
 Important Approaches: Memory-Based and Model-Based
 Knowing User-Based Methods in K-Nearest Neighbor
 Understanding Item-Based Method
 Matrix Factorization
 Singular Value Decomposition
 Data Science Project discussion
 Collaborative Filtering
 Business Variables Overview
Statistics and Probability Project
Project – Data Analysis Project
Data – Sales
Problem Statement – It includes the following actions:
Understand the business solutions, Discussion with the warehouse team, Data Collection & Storage, Data Cleaning, Build a Hypothesis Tree around the business problem, Produce the final result.
SAS Projects
Project 1 – Build analytical solution for patients taking medicines
Domain: Health Care
Objective – This project aims to produce descriptive statistics and subsets for specific clinical data problems. It gives you a brief insight into Base SAS procedures and DATA steps.
Project 2 – Build revenue projections reports
Domain: Sales
Objective – This project will give you hands-on experience in working with the SAS data analytics and business intelligence tool. You will work on data entered in a business enterprise setup, and aggregate, retrieve and manage that data. You will learn to create insightful reports and graphs and use statistical and mathematical analysis to project revenue for a particular future time frame. Upon completion of the project you will be well-versed in the practical aspects of data analytics, predictive modeling and data mining.
Project 3
Domain: Finance Market
Objective – The project aims to find the factors with the greatest impact on preference for the prepaid model, and to identify which variables are highly correlated with those factors
Project 4
Domain: Analytics
Objective – K-means cluster analysis on the Iris dataset to predict the class of a flower from its petal dimensions