Chicago Data Summit: Hadoop sessions with video

Tuesday 26th April 2011

  • Data Processing with Hadoop: Scalable and Cost Effective

    by Doug Cutting

    Hadoop is a new paradigm for data processing that scales near-linearly to petabytes of data. Commodity hardware running open source software provides unprecedented cost effectiveness. It is affordable to save large, raw datasets, unfiltered, in Hadoop's file system. Together with Hadoop's computational power, this facilitates operations such as ad hoc analysis and retroactive schema changes. An extensive open source tool-set is being built around these capabilities, making it easy to integrate Hadoop into many new application areas. (A minimal MapReduce sketch follows below.)

    At 1:45pm to 2:40pm, Tuesday 26th April
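
    As a concrete illustration of the programming model sketched above (not material from the talk itself), here is the canonical word-count job written against the 0.20-era Hadoop MapReduce API. Input and output paths are passed on the command line.

    ```java
    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    // Minimal word-count job: the canonical "hello world" of Hadoop MapReduce.
    public class WordCount {

      public static class TokenizerMapper
          extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
            throws IOException, InterruptedException {
          // Emit (word, 1) for every token in the input line.
          StringTokenizer itr = new StringTokenizer(value.toString());
          while (itr.hasMoreTokens()) {
            word.set(itr.nextToken());
            context.write(word, ONE);
          }
        }
      }

      public static class IntSumReducer
          extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
          // Sum the counts emitted for each word.
          int sum = 0;
          for (IntWritable val : values) {
            sum += val.get();
          }
          context.write(key, new IntWritable(sum));
        }
      }

      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);  // pre-aggregate on the map side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }
    ```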

  • Extending the Enterprise Data Warehouse with Hadoop

    by Jonathan Seidman and Rob Lancaster

    Hadoop provides the ability to extract business intelligence from extremely large, heterogeneous data sets that were previously impractical to store and process in traditional data warehouses. The challenge for many companies now is bridging the gap between the data in the data warehouse and the data in Hadoop. In this talk we'll discuss some of the steps Orbitz has taken to bridge this gap, including examples of how Hadoop and Hive are used to aggregate data from large data sets, and how that data can be combined with relational data to create new reports that provide actionable intelligence to business users. (A hedged Hive example follows below.)

    At 2:45pm to 3:30pm, Tuesday 26th April
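
    A minimal sketch of the kind of aggregation described above, issued to Hive over JDBC from Java. The table and column names (web_logs, hotel_id, dt) are hypothetical stand-ins rather than Orbitz's actual schema, and the example assumes a HiveServer of that era listening on port 10000. The small summary it produces is the sort of result set that can then be joined against relational warehouse tables.

    ```java
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveAggregation {
      public static void main(String[] args) throws Exception {
        // Era-appropriate HiveServer JDBC driver; assumes a server on localhost:10000.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        Connection conn =
            DriverManager.getConnection("jdbc:hive://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // Hive compiles this aggregation into MapReduce jobs over the full log data.
        // Table and column names are hypothetical.
        ResultSet rs = stmt.executeQuery(
            "SELECT hotel_id, COUNT(*) AS bookings " +
            "FROM web_logs WHERE dt = '2011-04-26' " +
            "GROUP BY hotel_id");

        while (rs.next()) {
          // Each summary row is small enough to load into the warehouse,
          // where it can be joined with relational dimension tables.
          System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
        }
        conn.close();
      }
    }
    ```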

  • Flume: An Introduction

    by Jonathan Hsieh

    Flume is an open-source, distributed, streaming log collection system designed for ingesting large quantities of data into large-scale data storage and analytics platforms such as Apache Hadoop. It was designed with four goals in mind: reliability, scalability, extensibility, and manageability. Its horizontally scalable architecture offers fault-tolerant end-to-end delivery guarantees, supports low-latency event processing, provides a centralized management interface, and exposes metrics for ingest monitoring and reporting. It natively supports writing data to Hadoop's HDFS, but a simple extension interface also allows it to write to other scalable data systems such as low-latency datastores or incremental search indexers. (A schematic sink sketch follows below.)

    At 3:45pm to 4:30pm, Tuesday 26th April
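
    To make the extension point concrete, here is a schematic custom sink. The Event and Sink interfaces below are simplified, hypothetical stand-ins for illustration only, not Flume's actual classes; a real sink would implement Flume's own sink interfaces and write to HBase, a search indexer, or similar.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    public class SinkDemo {
      // Hypothetical, simplified stand-ins for illustration only;
      // these are NOT Flume's actual classes.
      interface Event {
        byte[] getBody();
      }

      interface Sink {
        void open() throws Exception;
        void append(Event event) throws Exception; // deliver one log event downstream
        void close() throws Exception;
      }

      // A toy sink that "indexes" events into an in-memory list, standing in
      // for a low-latency datastore or incremental search indexer.
      static class InMemoryIndexSink implements Sink {
        private final List<String> index = new ArrayList<String>();

        public void open() { index.clear(); }

        public void append(Event event) {
          // In a real sink this would be a write to HBase, a search index, etc.
          index.add(new String(event.getBody()));
        }

        public void close() {
          System.out.println("Indexed " + index.size() + " events");
        }
      }

      public static void main(String[] args) throws Exception {
        Sink sink = new InMemoryIndexSink();
        sink.open();
        for (final String line : new String[] {"GET /search", "GET /checkout"}) {
          sink.append(new Event() {
            public byte[] getBody() { return line.getBytes(); }
          });
        }
        sink.close(); // prints: Indexed 2 events
      }
    }
    ```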

  • Geo-based Content Processing Using HBase

    by Ravi Veeramachaneni

    NAVTEQ uses Cloudera's Distribution including Apache Hadoop (CDH) and HBase, with Cloudera Enterprise support, to process and store location content data. With HBase's distributed, column-oriented architecture, NAVTEQ is able to process large amounts of data in a scalable and cost-effective way. (A hedged HBase sketch follows below.)

    At 3:45pm to 4:30pm, Tuesday 26th April
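
    A common pattern for location data in HBase, shown here as a hedged sketch rather than NAVTEQ's actual design, is to prefix row keys with a geohash so that nearby locations sort together on disk and can be retrieved with a single range scan. The table name, column family, and keys below are hypothetical; the client API is the 0.90-era HBase API.

    ```java
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // Hypothetical geo-keyed content store: geohash-prefixed row keys keep
    // nearby locations adjacent, so a prefix range scan fetches a spatial cell.
    public class GeoContentStore {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "geo_content");

        // Store one point of interest under a geohash-prefixed row key.
        String rowKey = "dp3wm" + "|" + "poi-12345"; // geohash | entity id
        Put put = new Put(Bytes.toBytes(rowKey));
        put.add(Bytes.toBytes("attr"), Bytes.toBytes("name"),
                Bytes.toBytes("Wrigley Field"));
        table.put(put);

        // Scan all rows within the same geohash cell ('~' sorts after '|',
        // so this covers every "dp3wm|..." key).
        Scan scan = new Scan(Bytes.toBytes("dp3wm|"), Bytes.toBytes("dp3wm~"));
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
          System.out.println(Bytes.toString(r.getRow()));
        }
        scanner.close();
        table.close();
      }
    }
    ```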

  • Cloudera's Distribution including Apache Hadoop & Cloudera Enterprise

    by Charles Zedlewski

    This session will discuss what's new in the recently released CDH3 and Cloudera Enterprise 3.5 products. We'll review how usage of Hadoop is evolving in the enterprise and how CDH3 and Enterprise 3.5 meet these new challenges with advances in functionality, performance, security, and manageability.

    At 4:35pm to 5:15pm, Tuesday 26th April
