Writing Complex Analytical Queries with Hive

Pluralsight

Course Summary

Hive is a data warehouse that runs on top of the Hadoop distributed computing framework. Because it works on huge datasets, understanding its features helps you write efficient, fast queries.

Course Description

The Hive data warehouse supports analytical processing: it generally runs long-running jobs that crunch huge amounts of data. By understanding what goes on behind the scenes in Hive, you can structure your queries to perform well, making your data analysis more efficient. In this course, Writing Complex Analytical Queries with Hive, you'll discover how to make design decisions and how to lay out data in your Hive tables. First, you'll dive into partitioning and bucketing, which are ways to reduce the data a query has to process. You'll cover how and when to use partitioning, bucketing, or both when you set up your tables. Next, you'll be introduced to join operations, including how to deal with large tables and how to run and optimize map-only joins. Lastly, you'll learn windowing functions, which let you write complex queries simply, with no intermediate tables, an important optimization with large datasets. By the end of this course, you'll have developed an understanding of the little details that make writing complex queries easier and faster.

Course Syllabus

Course Overview (1m 53s)

- Course Overview (1m 53s)

Using Hive for Analytical Queries (21m 35s)

- Introduction and Pre-requisites for This Course (1m 39s)
- A Data Warehouse for Analytical Processing (4m 6s)
- Hive as a Data Warehouse (3m 14s)
- Managing Huge Datasets and Writing Faster Queries (2m 58s)
- A Brief Introduction: Bucketing and Partitioning (3m 36s)
- A Brief Introduction: Join Optimizations (2m 52s)
- A Brief Introduction: Window Functions (3m 8s)

Partitioning Tables for Faster Queries (42m 10s)

- Partitioning: The Logical Equivalent of Indexes (3m 33s)
- Data Organization with Partitions (5m 15s)
- Working with a Managed Partitioned Table (6m 24s)
- When Would You Use Partitions? (2m 12s)
- Loading from Files into a Partitioned Table (3m 26s)
- Partitioning an External Table (6m 45s)
- Partitioning Trade-offs (2m 43s)
- Introduction to Dynamic Partitioning (4m 3s)
- Implementing Dynamic Partitioning (4m 39s)
- Multi-column Partitioning (3m 5s)

Bucketing Columns for Faster Joins (38m 23s)

- Bucketing: The Logical Equivalent of Hash Tables (4m 34s)
- The Modulo Operator as a Hashing Function (4m 32s)
- Working with Bucketed Tables (2m 59s)
- Bucketing vs. Partitioning (3m 23s)
- Implementing a Partitioned, Bucketed Table (2m 39s)
- Advantages of Bucketing (6m 53s)
- Sorting Records Within a Bucket (2m 42s)
- Sampling Data from a Hive Table (5m 10s)
- Bucket Sampling on Hive Tables (5m 27s)

Optimizing Hive Joins (47m 21s)

- Behind the Scenes: An Introduction to MapReduce (3m 46s)
- Optimizing Joins: Join Columns and MapReduce Jobs (2m 25s)
- Implementing a Join Operation (3m 35s)
- Optimizing Joins: Streaming the Largest Table (3m 23s)
- Optimizing Joins: Bucketing and Partitioning on the Join Columns (1m 50s)
- The Left Semi-join Operator (5m 46s)
- Behind the Scenes: The MapReduce Data Flow (4m 1s)
- Behind the Scenes: MapReduce for Join Operations (4m 28s)
- Map-only Joins: The Inner Join (4m 40s)
- Map-only Joins: The Left Outer Join (3m 4s)
- Map-only Joins: The Right Outer Join (1m 43s)
- Map-only Joins: The Full Outer Join (3m 6s)
- The Bucket Map Join (5m 29s)

Windowing Functions (31m 8s)

- Introduction to Window Functions (4m 2s)
- The Running Total and Running Average Implementations (5m 53s)
- Window Functions with Partitions (6m 21s)
- Calculating Moving Averages (2m 28s)
- Calculating Percentage Contributions (3m 28s)
- The Row Number and Rank Window Functions (4m 21s)
- Calculating Quantiles (4m 33s)

Course Fee:

USD 29

Course Type:	Self-Study
Course Status:	Active
Workload:	1 - 4 hours / week

This course is listed under Development & Implementations, Industry Specific Applications, Data & Information Management and Networks & IT Infrastructure Community

Hadoop

Data Analysis

Hive

MapReduce

Data Warehouse (DW)

Hashing