Tuesday 3rd June, 2014
11:15am to 11:55am
Currently, users of the Apache Falcon system are forced to define their applications as Apache Oozie workflows. While Falcon hides the scheduler and its inner workings, users still end up learning Oozie, because applications must be defined as Oozie workflows. The objective of this effort is to provide a pipeline designer user interface through which users can author their processes and provision them on Falcon. This should make building applications on Falcon over Hadoop fairly trivial.

Falcon can operate on HCatalog tables natively, meaning there is a one-to-one correspondence between a Falcon feed and an HCatalog table. Between the feed definition in Falcon and the underlying table definition in HCatalog, there is adequate metadata about the data stored underneath. A set of such feeds can then be operated on by a collection of transformations to derive more refined datasets/feeds. This transformation logic (currently expressed via Oozie workflows, Pig scripts, or MapReduce jobs) is typically represented as a Falcon process. In this talk we walk through the details of the pipeline designer and the current state of this feature.
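To make the feed-to-table correspondence concrete, the sketch below shows what a Falcon feed entity backed by an HCatalog table can look like. This is an illustrative fragment only: the feed name, cluster name, database, table, and partition key are hypothetical, and attributes may vary across Falcon versions.

```xml
<!-- Illustrative Falcon feed entity; names and values are hypothetical. -->
<feed name="rawClicks" description="hourly click events" xmlns="uri:falcon:feed:0.1">
    <!-- How often a new instance of this feed materializes -->
    <frequency>hours(1)</frequency>

    <clusters>
        <cluster name="primaryCluster" type="source">
            <validity start="2014-01-01T00:00Z" end="2099-12-31T00:00Z"/>
            <!-- Retention policy applied to old instances -->
            <retention limit="days(30)" action="delete"/>
        </cluster>
    </clusters>

    <!-- The one-to-one mapping to an HCatalog table: each feed instance
         corresponds to one partition of the table. -->
    <table uri="catalog:default:raw_clicks#ds=${YEAR}-${MONTH}-${DAY}-${HOUR}"/>

    <ACL owner="etl-user" group="users" permission="0755"/>
</feed>
```

A Falcon process would then reference feeds like this one as inputs and outputs, with the transformation logic (Oozie workflow, Pig script, or MapReduce job) attached to the process definition.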