Sessions at Strata 2012 about Hive on Tuesday 28th February

  • Hadoop Data Warehousing with Hive

    by Dean Wampler and Jason Rutherglen

    In this hands-on tutorial, you’ll learn how to install and use Hive for Hadoop-based data warehousing. You’ll also learn some tricks of the trade and how to handle known issues.

    Using the Hive Tutorial Tools

    We’ll email instructions before the tutorial so you can arrive with the necessary tools installed and ready to go. Preparing in advance lets us devote the full tutorial to Hive’s query language and other important topics. At the beginning of the tutorial, we’ll show you how to use these tools.

    Writing Hive Queries

    We’ll spend most of the tutorial on a series of hands-on exercises with actual Hive queries, so you can learn by doing. We’ll cover all the main features of Hive’s query language, HiveQL, and how Hive works with data in Hadoop.
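
    For a flavor of what those exercises look like, here is a minimal HiveQL sketch; the table name, columns, and path are hypothetical:

      -- Define an external table over tab-delimited log files already in Hadoop.
      CREATE EXTERNAL TABLE page_views (
        user_id   STRING,
        url       STRING,
        view_time STRING
      )
      ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
      LOCATION '/data/page_views';

      -- Query it with familiar SQL-like syntax.
      SELECT url, COUNT(*) AS views
      FROM page_views
      GROUP BY url
      ORDER BY views DESC
      LIMIT 10;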

    Advanced Techniques

    Hive is very flexible about data file formats, the “schema” of records, and so forth. We’ll discuss options for customizing these and other aspects of your Hive and Hadoop cluster setup. We’ll briefly examine how you can write Java user-defined functions (UDFs) and other plugins that extend Hive to handle data formats that aren’t supported natively.
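
    As a taste of that extension point, here is a minimal sketch of a Hive UDF, assuming the Hive and Hadoop jars are on the classpath; the class name is hypothetical:

      // A trivial UDF that lower-cases a string. Hive finds evaluate()
      // by reflection, and a null input yields a null output.
      import org.apache.hadoop.hive.ql.exec.UDF;
      import org.apache.hadoop.io.Text;

      public final class LowerCase extends UDF {
        public Text evaluate(final Text s) {
          if (s == null) {
            return null;
          }
          return new Text(s.toString().toLowerCase());
        }
      }

    Once the compiled class is packaged in a jar, you would register it at the Hive prompt with ADD JAR and CREATE TEMPORARY FUNCTION before calling it in a query.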

    Hive in the Hadoop Ecosystem

    We’ll conclude with a discussion of Hive’s place in the Hadoop ecosystem, including how it compares to other available tools. We’ll discuss installation and configuration choices that ensure the best performance and ease of use in a real production cluster. In particular, we’ll discuss how to host Hive’s separate “metastore” of table metadata in a traditional relational database, such as MySQL, as sketched below. We’ll offer tips on data formats and layouts that improve performance in various scenarios.
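
    As an illustration, a MySQL-backed metastore is configured through properties in hive-site.xml along these lines; the host name, database name, and credentials below are placeholders:

      <!-- JDBC connection for the metastore database -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://dbhost/metastore</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hive</value>
      </property>
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hivepassword</value>
      </property>

    The MySQL JDBC driver jar must also be on Hive’s classpath.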

    At 9:00am to 12:30pm, Tuesday 28th February

  • Introduction to Apache Hadoop

    by Sarah Sproehnle

    This tutorial provides a solid foundation for understanding large-scale data processing with MapReduce and Hadoop, plus the associated ecosystem. It is intended for those who are new to Hadoop and want to understand where Hadoop is appropriate and how it fits with existing systems.

    The agenda will include:

    • The rationale for Hadoop
    • Understanding the Hadoop Distributed File System (HDFS) and MapReduce (a short HDFS example follows this list)
    • Common Hadoop use cases, including recommendation engines, ETL, time-series analysis, and more
    • How Hadoop integrates with other systems, such as relational databases and data warehouses
    • Overview of the other components in a typical Hadoop “stack,” such as these Apache projects: Hive, Pig, HBase, Sqoop, Flume, and Oozie
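
    To make the HDFS item concrete, here is a minimal sketch using Hadoop’s Java FileSystem API; the paths are hypothetical, and the cluster address is assumed to come from the configuration files on the classpath:

      import org.apache.hadoop.conf.Configuration;
      import org.apache.hadoop.fs.FileStatus;
      import org.apache.hadoop.fs.FileSystem;
      import org.apache.hadoop.fs.Path;

      public class HdfsExample {
        public static void main(String[] args) throws Exception {
          // Reads core-site.xml etc. from the classpath to locate the cluster.
          Configuration conf = new Configuration();
          FileSystem fs = FileSystem.get(conf);

          // Copy a local file into HDFS, then list the target directory.
          fs.copyFromLocalFile(new Path("/tmp/sample.log"),
                               new Path("/user/demo/sample.log"));
          for (FileStatus status : fs.listStatus(new Path("/user/demo"))) {
            System.out.println(status.getPath());
          }
        }
      }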

    At 9:00am to 12:30pm, Tuesday 28th February

    In Ballroom CD, Santa Clara Convention Center

  • Developing applications for Apache Hadoop

    by Sarah Sproehnle

    This tutorial explains how to leverage a Hadoop cluster for data analysis using Java MapReduce, Apache Hive, and Apache Pig. Participants should have experience with at least one programming language. Topics include:

    • Why are Hadoop and MapReduce needed?
    • Writing a Java MapReduce program (a minimal mapper sketch follows this list)
    • Common algorithms on Hadoop, such as indexing, classification, joining data sets, and graph processing
    • Data analysis with Hive and Pig
    • Overview of writing applications that use Apache HBase
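
    For the MapReduce item, here is a minimal sketch of the classic word-count mapper using the org.apache.hadoop.mapreduce API; a matching reducer would simply sum the counts for each word:

      import java.io.IOException;
      import org.apache.hadoop.io.IntWritable;
      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.Text;
      import org.apache.hadoop.mapreduce.Mapper;

      public class WordCountMapper
          extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
          // Emit (word, 1) for every whitespace-separated token in the line.
          for (String token : value.toString().split("\\s+")) {
            if (!token.isEmpty()) {
              word.set(token);
              context.write(word, ONE);
            }
          }
        }
      }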

    At 1:30pm to 5:00pm, Tuesday 28th February

    In GA J, Santa Clara Convention Center