R Case Studies
Logistic Regression Case Study
In this case study you will gain a detailed understanding of a company's advertising spend and how it helps drive sales. You will use logistic regression in R to forecast future trends, detect patterns and uncover insights. The results can then be used to plan and optimize future advertising spend for higher revenue.
Multiple Regression Case Study
You will learn how to compare the miles per gallon (MPG) of cars based on various parameters. You will use multiple regression and record the MPG by car make, model, speed, load conditions, etc. The case study includes model building, model diagnostics and checking the ROC curve, among other things.
Receiver Operating Characteristic (ROC) Case Study
You will work with various data sets in R, apply data exploration methodologies, build scalable models, predict outcomes with the highest precision, diagnose the models you have created using real-world data, check the ROC curve and more.
Mahout Course Content
Classification and Recommendation, Clustering in Mahout, Pattern Mining, Understanding Machine Learning, Using a Model diagram to decide the approach, Data flow, Supervised and Unsupervised learning
Concept of Recommendation, Recommendations by E-commerce sites, Comparison between User Recommendations and Item Recommendations, Defining Recommenders and Classifiers, Process of Collaborative Filtering, Explaining the Pearson coefficient algorithm, Euclidean distance measure, Implementing a recommender using MapReduce
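As a rough illustration of the two similarity measures named above, the following sketch compares two users' ratings. It is written in plain Python rather than Mahout, the users `alice` and `bob` and their ratings are made up, and the `1 / (1 + distance)` conversion from distance to similarity is one common convention assumed here, not necessarily the one the course uses.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length rating lists."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for constant ratings
    return cov / sqrt(var_a * var_b)

def euclidean_similarity(a, b):
    """Similarity derived from Euclidean distance (assumed convention)."""
    dist = sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

# Hypothetical ratings of the same four items by two users.
alice = [5.0, 3.0, 4.0, 4.0]
bob = [3.0, 1.0, 2.0, 2.0]

print(pearson(alice, bob))
print(euclidean_similarity(alice, bob))
```

Here `bob` rates every item exactly two points lower than `alice`. Pearson treats the two users as perfectly correlated because it ignores the offset, while the Euclidean measure penalizes the gap between the ratings.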
Clustering Session 1
Defining Clustering, User-to-user similarity, Clustering Illustration, Euclidean distance measure, Distance measure vector, Understanding the process of Clustering, Vectorizing documents (unstructured data)
Clustering Session 2
Document clustering, Sequence-to-sparse Utility, K-Means Clustering
Classification Session 1
Terminology, Predictor and Target variable, Classifiable Data, Key Challenges in Classification algorithms, Vectorizing Continuous data, Classification Examples, Logistic Regression and its examples
Clustering and Classification Session 2
Clustering, Clustering Process, Transaction Clustering, Different techniques of Vectorization, Distance measure, Clustering algorithm (K-Means), Clustering Application-1, Clustering Application-2, Sentiment Analyzer
Pearson Coefficient, Collaborative Filtering Process, Collaborative Filtering, Similarity Algorithms, Pearson Correlation, Euclidean Distance Measure, Frequent Pattern & Association Rules, Frequent Pattern Growth
Data Science Course Content
Introduction to Data Science and Statistical Analytics
Introduction to Data Science, Use cases, Need of Business Analytics, Data Science Life Cycle, Different tools available for Data Science
Introduction to R
Installing R and RStudio, R packages, R Operators, if statements and loops (for, while, repeat, break, next), switch case
Data Exploration, Data Wrangling and R Data Structure
Importing and Exporting data from external sources, Exploratory data analysis, R Data Structures (Vector, Scalar, Matrix, Array, Data frame, List), Functions, Apply Functions
Bar Graph (Simple, Grouped, Stacked), Histogram, Pie Chart, Line Chart, Box (Whisker) Plot, Scatter Plot, Correlogram
Introduction to Statistics
Terminologies of Statistics, Measures of Center, Measures of Spread, Probability, Normal Distribution, Binomial Distribution, Hypothesis Testing, Chi-Square Test, ANOVA
Predictive Modeling – 1 (Linear Regression)
Supervised Learning – Linear Regression, Bivariate Regression, Multiple Regression Analysis, Correlation (Positive, Negative and Neutral), Industrial Case Study, Machine Learning Use-Cases, Machine Learning Process Flow, Machine Learning Categories
Predictive Modeling – 2 (Logistic Regression)
What is Classification and its use cases?, What is Decision Tree?, Algorithm for Decision Tree Induction, Creating a Perfect Decision Tree, Confusion Matrix
Random Forest, What is Naive Bayes?
What is Clustering & its Use Cases?, What is K-means Clustering?, What is Canopy Clustering?, What is Hierarchical Clustering?
Association Analysis and Recommendation engine
Market Basket Analysis (MBA), Association Rules, Apriori Algorithm for MBA, Introduction of Recommendation Engine, Types of Recommendation – User-Based and Item-Based, Recommendation Use-case
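To make the association-rule terms concrete, here is a minimal sketch of the support and confidence measures that market basket analysis relies on. It is in plain Python for illustration only (the course itself works in R); the transactions and the helper names `support` and `confidence` are hypothetical.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = support(set(antecedent) | set(consequent), transactions)
    base = support(antecedent, transactions)
    return both / base if base else 0.0

print(support({"bread", "milk"}, transactions))       # 2 of 4 baskets
print(confidence({"bread"}, {"milk"}, transactions))  # 2 of the 3 bread baskets
```

An Apriori-style algorithm uses the same two quantities: it first keeps only the itemsets whose support exceeds a threshold, then reports the rules whose confidence is high enough.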
Introduction to Text Mining, Introduction to Sentiment, Setting up an API bridge between R and a Twitter account, Extracting tweets from a Twitter account, Scoring the tweets
What is Time Series data?, Time Series variables, Different components of Time Series data, Visualizing the data to identify Time Series Components, Implementing an ARIMA model for forecasting, Exponential smoothing models, Identifying the time series scenarios in which each Exponential Smoothing model can be applied, Implementing the respective ETS model for forecasting
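The simplest member of the exponential smoothing family can be sketched in a few lines. The example below is plain Python for illustration, not the R tooling used in the course; the function name, the sample series and the choice of `alpha` are assumptions.

```python
def simple_exponential_smoothing(series, alpha):
    """Return the smoothed level after each observation.

    Each new level is alpha * observation + (1 - alpha) * previous level,
    starting from the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = series[0]
    smoothed = [level]
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
        smoothed.append(level)
    return smoothed

# Hypothetical monthly sales figures.
sales = [10.0, 12.0, 14.0]
levels = simple_exponential_smoothing(sales, alpha=0.5)
print(levels)      # smoothed level after each month
print(levels[-1])  # the last level serves as the one-step-ahead forecast
```

This variant models only a level, so it suits series without trend or seasonality; the trend and seasonal variants extend the same update rule with additional components.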
SAS Course Content
Introduction to SAS
Introduction to Base SAS, Installation of the SAS tool, Getting started with SAS, various SAS Windows – Log, Explorer, Output, Search, Editor, etc., working with data sets, overview of SAS Functions, Library Types and programming files
SAS Enterprise Guide
Import/Export Raw Data files, reading and subsetting the data set, various statements like WHERE, SET, MERGE
Hands-on Exercise – Import Excel file in workspace, Read data, Export the workspace to save data
SAS Operators & Functions
Various SAS Operators – Arithmetic, Logical, Comparison, various SAS Functions – NUMERIC, CHARACTER, IS NULL, CONTAINS, LIKE, Input/Put, Date/Time, Conditional Statements (Do While, Do Until, If, Else)
Hands-on Exercise – Apply logical, arithmetic operators and SAS functions to perform operations
Compilation & Execution
Understanding the Input Buffer and the PDV (backend), learning what MISSOVER is
Defining and Using KEEP and DROP statements, apply these statements, Format and Labels in SAS.
Hands-on Exercise – Use KEEP and DROP statements
Creation and Compilation of SAS Data sets
Understanding delimiters, DATALINES rules, DLM, DSD, raw data files and execution, list input for standard data.
Hands-on Exercise – Use delimiter rules on raw data files
The various standard SAS procedures built in for popular tasks – PROC SORT, PROC FREQ, PROC SUMMARY, PROC RANK, PROC EXPORT, PROC DATASETS, PROC TRANSPOSE, PROC CORR, etc.
Hands-on Exercise – Use SORT, FREQ, SUMMARY, EXPORT and other procedures
Input statement and formatted input
Reading standard and non-standard numeric inputs with Formatted inputs, Column Pointer Controls, Controlling while a record loads, Line pointer control / Absolute line pointer control, Single Trailing @, Multiple IN and OUT statements, DATALINES statement and rules, List Input Method, comparing Single Trailing @ and Double Trailing @@.
Hands-on Exercise – Read standard and non-standard numeric inputs with Formatted inputs, Control while a record loads, Control a Line pointer, Write Multiple IN and OUT statements
SAS FORMAT statements – standard and user-written, associating a format with a variable, working with SAS FORMAT, deploying it with PROC DATASETS, comparing ATTRIB and FORMAT statements.
Hands-on Exercise – Format a variable, deploy a format rule with PROC DATASETS, Use the ATTRIB statement
Understanding PROC GCHART, various Graphs, Bar Charts – Pie, Bar, 3D, plotting variables with PROC GPLOT.
Hands-on Exercise – Plot graphs using PROC GPLOT, Display charts using PROC GCHART
Interactive Data Processing
SAS advanced data discovery and visualization, point-and-click analytics capabilities, powerful reporting tools.
Data Transformation Function
Character Functions, Numeric Functions, Converting Variable Type.
Hands-on Exercise – Use Functions in data transformation
Output Delivery System (ODS)
Introduction to ODS, Data Optimization, How to generate files (rtf, pdf, html, doc) using SAS
Hands-on Exercise – Optimize data, generate rtf, pdf, html and doc files
Macro Syntax, Macro Variables, Positional Parameters in a Macro, Macro Step
Hands-on Exercise – Write a macro, Use positional parameters
SQL Statements in SAS, SELECT, CASE, JOIN, UNION, Sorting Data
Hands-on Exercise – Create an SQL query to select and add a condition, Use a CASE in a select query
Advanced Base SAS
Base SAS web-based interface and ready-to-use programs, advanced data manipulation, storage and retrieval, descriptive statistics.
Hands-on Exercise – Use web UI to do statistical operations
Report Enhancement, Global Statements, User-defined Formats, PROC SORT, ODS Destinations, ODS Listing, PROC FREQ, PROC MEANS, PROC UNIVARIATE, PROC REPORT, PROC PRINT
Hands-on Exercise – Use PROC SORT to sort the results, List ODS, Find the mean using PROC MEANS, print using PROC PRINT
R Programming Projects
Domain – Restaurant Revenue Prediction
Data set – Sales
Project Description – This project involves predicting the sales of a restaurant on the basis of certain objective measurements. It gives you real-world industry experience in handling multiple use cases and deriving solutions, along with insights into feature engineering and selection.
Domain – Data Analytics
Objective – To predict the class of a flower from its petal dimensions
Domain – Finance
Objective – This project aims to find the factors that most influence preference for the pre-paid model, and to identify which variables are highly correlated with those factors
Domain – Stock Market
Objective – This project focuses on Machine Learning by creating a predictive data model to predict future stock prices
Data Science Project
Project 1 – Understanding Cold Start Problem in Data Science
Topics: This project involves understanding the cold start problem associated with recommender systems. You will gain hands-on experience in information filtering and in working on systems with zero historical data to refer to, as in the case of launching a new product. You will gain proficiency in working with personalized applications such as movie, book, song and news recommendations. This project includes the following:
- Algorithms for Recommender
- Ways of Recommendation
- Types of Recommendation: Collaborative Filtering Based Recommendation, Content-Based Recommendation
- Complete mastery in working with the Cold Start Problem.
Project 2 – Recommendation for Movie, Summary
Topics: This is a real-world project that gives you hands-on experience in working with a movie recommender system. Depending on which movies a particular user likes, you will be in a position to provide data-driven recommendations. This project involves understanding recommender systems, information filtering, predicting ‘rating’, learning about user ‘preference’ and so on. You will work exclusively on data related to user details, movie details and others. The main components of the project include the following:
- Recommendation for movie
- Two Types of Predictions – Rating Prediction, Item Prediction
- Important Approaches: Memory Based and Model-Based
- Knowing User Based Methods in K-Nearest Neighbor
- Understanding Item Based Method
- Matrix Factorization
- Singular Value Decomposition
- Data Science Project discussion
- Collaborative Filtering
- Business Variables Overview
Statistics and Probability Project
Project – Data Analysis Project
Data – Sales
Problem Statement – It includes the following actions:
Understand the business solutions, Discussion with the warehouse team, Data Collection & Storage, Data Cleaning, Build a Hypothesis Tree around the business problem, Produce the final result.
Project 1 – Build analytical solution for patients taking medicines
Domain: Health Care
Objective – This project aims to compute descriptive statistics and subsets for specific clinical data problems. It will give you a brief insight into Base SAS procedures and DATA steps.
Project 2 – Build revenue projections reports
Objective – This project will give you hands-on experience in working with the SAS data analytics and business intelligence tool. You will work on data entered in a business enterprise setup, aggregating, retrieving and managing that data. You will learn to create insightful reports and graphs and to apply statistical and mathematical analysis to predict revenue for a particular future time frame. Upon completion of the project you will be well-versed in the practical aspects of data analytics, predictive modeling and data mining.
Domain: Finance Market
Objective – This project aims to find the factors that most influence preference for the pre-paid model, and to identify which variables are highly correlated with those factors
Objective – k-Means cluster analysis on the Iris dataset to predict the class of a flower from its petal dimensions