Architectural Considerations for Hadoop Applications

A session at Strata + Hadoop World 2014

Wednesday 15th October, 2014

9:00am to 12:30pm (EST)

Implementing solutions with Apache Hadoop requires understanding not just Hadoop, but a broad range of related projects in the Hadoop ecosystem such as Hive, Pig, Oozie, Sqoop, and Flume. The good news is that there’s an abundance of materials – books, web sites, conferences, etc. – for gaining a deep understanding of Hadoop and these related projects. The bad news is there’s still a scarcity of information on how to integrate these components to implement complete solutions. In this tutorial we’ll walk through an end-to-end case study of a clickstream analytics engine to provide a concrete example of how to architect and implement a complete solution with Hadoop. We’ll use this example to illustrate important topics such as:
* Modeling data in Hadoop
* Selecting optimal storage formats for data stored in Hadoop
* Moving data between Hadoop and external data management systems such as relational databases
* Moving event-based data such as logs and machine-generated data into Hadoop
* Accessing and processing data in Hadoop
* Orchestrating and scheduling workflows on Hadoop

Throughout the example, best practices and considerations for architecting applications on Hadoop will be covered. This tutorial will be valuable for developers, architects, or project leads who are already knowledgeable about Hadoop and are now looking for more insight into how it can be leveraged to implement real-world applications.

About the speakers

Mark Grover

@cloudera engineer in the Bay Area. Co-author of @hadooparchbook.

Gwen (Chen) Shapira

Database consultant, Oracle ACE Director, Oak Table member, loves all things data related, rides bikes, lives in SF Bay Area, drinks single malts.

Ted Malaska

Jonathan Seidman

Husband. Father. Pet-wrangler. Software Engineer at Cloudera. Co-author of Hadoop Application Architectures for O'Reilly.
