Large Scale ETL with Hadoop

A session at Strata New York 2012

Wednesday 24th October, 2012

11:40am to 12:20pm (EST)

Hadoop is commonly used for processing large swaths of data in batch. While many of the necessary building blocks for data processing exist within the Hadoop ecosystem – HDFS, MapReduce, HBase, Hive, Pig, Oozie, and so on – it can be a challenge to assemble and operationalize them as a production ETL platform. This presentation covers one approach to data ingest, organization, format selection, process orchestration, and external system integration, based on collective experience acquired across many production Hadoop deployments.

About the speaker

Eric Sammer

Engineer at Cloudera and Flume committer, working on distributed systems, data, and Hadoop. Author of Hadoop Operations (O'Reilly). (Bio from Twitter.)
