Hadoop Data Warehousing with Hive

A session at Strata 2012

Tuesday 28th February, 2012

9:00am to 12:30pm (PST)

In this hands-on tutorial, you’ll learn how to install and use Hive for Hadoop-based data warehousing. You’ll also learn some tricks of the trade and how to handle known issues.

Using the Hive Tutorial Tools

We’ll email instructions to you before the tutorial so you can come prepared with the necessary tools installed and ready to go. Preparing in advance lets us spend the whole tutorial on Hive’s query language and other important topics. At the beginning of the tutorial we’ll show you how to use these tools.

Writing Hive Queries

We’ll spend most of the tutorial working through a series of hands-on exercises with actual Hive queries, so you can learn by doing. We’ll go over all the main features of Hive’s query language, HiveQL, and how Hive works with data in Hadoop.

Advanced Techniques

Hive is very flexible about the formats of data files, the “schema” of records, and so forth. We’ll discuss options for customizing these and other aspects of your Hive and data cluster setup. We’ll briefly examine how you can write Java user-defined functions (UDFs) and other plugins that extend Hive for data formats that aren’t supported natively.

Hive in the Hadoop Ecosystem

We’ll conclude with a discussion of Hive’s place in the Hadoop ecosystem, including how it compares to other available tools. We’ll discuss installation and configuration choices that affect performance and ease of use in a real production cluster. In particular, we’ll discuss how to create Hive’s separate “metadata” store in a traditional relational database, such as MySQL. We’ll offer tips on data formats and layouts that improve performance in various scenarios.

About the speakers

Dean Wampler

Think Big Analytics

Jason Rutherglen

Think Big Analytics

Strata 2012

Santa Clara, United States

28th February to 1st March 2012

Short URL

lanyrd.com/smmtr

Books by speaker

  • Programming Scala: Scalability = Functional Programming + Objects
  • Programming Hive
  • Functional Programming for Java Developers: Tools for Better Concurrency, Abstraction, and Agility
