Introduction to Data Science

Coursera

Course Summary

Join the data revolution. Companies are searching for data scientists. This specialized field demands multiple skills that are not easy to obtain through conventional curricula. Introduce yourself to the basics of data science and leave armed with practical experience extracting value from big data. #uwdatasci

Course Description

Commerce and research are being transformed by data-driven discovery and prediction. The skills required for data analytics at massive scale (scalable data management on and off the cloud, parallel algorithms, statistical modeling, and proficiency with a complex ecosystem of tools and platforms) span a variety of disciplines and are not easy to obtain through conventional curricula. Tour the basic techniques of data science, including both SQL and NoSQL solutions for massive data management (e.g., MapReduce and contemporaries), algorithms for data mining (e.g., clustering and association rule mining), and basic statistical modeling (e.g., linear and non-linear regression).

Course Syllabus
Part 0: Introduction
- Examples, data science articulated, history and context, technology landscape
Part 1: Data Manipulation at Scale
- Databases and the relational algebra
- Parallel databases, parallel query processing, in-database analytics
- MapReduce, Hadoop, relationship to databases, algorithms, extensions, languages
- Key-value stores and NoSQL; tradeoffs of SQL and NoSQL
Part 2: Analytics
- Topics in statistical modeling: basic concepts, experiment design, pitfalls
- Topics in machine learning: supervised learning (rules, trees, forests, nearest neighbor, regression), optimization (gradient descent and variants), unsupervised learning
Part 3: Communicating Results
- Visualization, data products, visual data analytics
- Provenance, privacy, ethics, governance
Part 4: Special Topics
- Graph Analytics: structure, traversals, analytics, PageRank, community detection, recursive queries, semantic web
- Guest Lectures
Recommended Background

We expect you to have intermediate programming experience and familiarity with databases, roughly equivalent to two college courses. We will have four programming assignments: two in Python, one in SQL, and one in R. The target audience is undergraduate students across disciplines who wish to build proficiency working with large datasets and a range of tools to perform predictive analytics.

After taking this course, you may be interested in participating in the three-course Certificate in Data Science offered through the University of Washington Professional and Continuing Education program. This online course will provide an overview and introduction to the more extensive material covered in that program, which offers classroom-based instruction by data scientists from Microsoft and other Seattle players, networking opportunities with peers, case studies from the "front lines," and deep dives into selected topics.
Course Format

The class will consist of lecture videos, each about 8 to 10 minutes long and containing one or two integrated quizzes. Some of these videos will be given by guest lecturers from the data science community.

There will be no formal exams or standalone quizzes.

There will be eight assignments in total, two of which are optional.

We will provide a virtual machine equipped with all necessary software, but you are permitted (and encouraged) to install software in your own environment as well.

There will be four structured programming assignments: two in Python, one in SQL, and one in R.

There will also be two open-ended assignments graded by peer assessment: one in visualization, and one in which you will participate in a Kaggle competition.

Finally, there will be two optional assignments: one involving an open-ended, real-world project submitted by external organizations with real needs, and one involving processing a large dataset on AWS.
Suggested Reading

There will be selected readings each week.

We recommend, but do not require, that students refer to the book Mining of Massive Datasets by Anand Rajaraman and Jeff Ullman.

Course Fee: Free
Course Type: Self-Study
Course Status: Active
Workload: 1-4 hours / week

This course is listed under Open Source, Development & Implementations, Industry Specific Applications, Data & Information Management, and Networks & IT Infrastructure Community.

Analytics

SQL (Structured Query Language)

1 Review

A primer into all things related to DS

Reviewed by Praveen on 27 September 15
The curriculum of the course is well designed to cover the critical concepts related to DS, from relational databases to random forests.

