Hadoop First ETL Authoring using Apache Falcon

A session at Hadoop Summit North America

Tuesday 3rd June, 2014

11:15am to 11:55am (PST)

Currently, users of the Apache Falcon system are forced to define their applications as Apache Oozie workflows. Although Falcon shields users from the scheduler and its inner workings, they still end up learning Oozie because their applications have to be expressed as Oozie workflows. The objective of this work is to provide a pipeline designer user interface through which users can author their processes and provision them on Falcon, making it fairly trivial to build applications on Falcon over Hadoop.

Falcon can operate on HCatalog tables natively, so there is a one-to-one correspondence between a Falcon feed and an HCatalog table. Between the feed definition in Falcon and the underlying table definition in HCatalog, there is adequate metadata about the data stored underneath. Sets of these feeds can then be operated on by a collection of transformations to extract more refined feeds. This transformation logic (currently expressed through Oozie workflows, Pig scripts or map-reduce jobs) is typically represented as a Falcon process. In this talk we walk through the details of the pipeline designer and the current state of this feature.
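As a rough illustration of the feed-to-table correspondence described above, the sketch below shows what an HCatalog-backed Falcon feed entity might look like. The feed, database, table and partition names are hypothetical, and the XML follows the general shape of the Falcon feed schema rather than a definitive, version-specific definition.

    <feed name="clicks" description="hourly click events" xmlns="uri:falcon:feed:0.1">
        <!-- how often a new instance (partition) of this feed materialises -->
        <frequency>hours(1)</frequency>
        <timezone>UTC</timezone>
        <clusters>
            <cluster name="primary-cluster" type="source">
                <validity start="2014-06-01T00:00Z" end="2016-06-01T00:00Z"/>
                <!-- Falcon handles retention/eviction for the feed -->
                <retention limit="days(90)" action="delete"/>
            </cluster>
        </clusters>
        <!-- the feed maps one-to-one onto an HCatalog table; partition keys
             are written with Falcon's date/time expression variables -->
        <table uri="catalog:etl_db:clicks#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}"/>
        <ACL owner="etl-user" group="users" permission="0755"/>
        <schema location="/none" provider="none"/>
    </feed>

A Falcon process entity would then reference feeds like this as its inputs and outputs and point at the transformation logic (an Oozie workflow, Pig script or map-reduce job); the pipeline designer discussed in the talk aims to generate such entities from a pipeline authored in the UI instead of having users hand-write them.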

About the speaker

Srikanth Sundarrajan

Principal Architect at InMobi

When

Time 11:15am to 11:55am PST

Date Tue 3rd June 2014

Short URL

lanyrd.com/sdbmwd

Official session page

hadoopsummit.org/…/schedule/
