Sessions at OSCON Data 2011 about Hadoop on Monday 25th July

  • Introduction to Hadoop

    by Tom Hanlon

    Hadoop gives you the ability to process massive amounts of data at scale. This presentation will show you how Hadoop uses commodity hardware to build a system that scales, deals gracefully with the failure of individual nodes, and gives you the power of Map/Reduce to process petabytes.

    At 10:40am to 11:20am, Monday 25th July

    In C123, Oregon Convention Center

  • Playful Explorations of Public and Personal Data

    by Andrew Turner

    It’s easy to find and create data. But what are you going to do with it? Can you ask the world complex questions such as what’s the local crime rate, distance to metro, or rating of your local school? Can you combine these all together to rate houses you may want to buy? And how do you then connect back to your government and local businesses to engage in collaborative decision making?

    This talk will discuss how you should consider users and their personal interactions with data and information. We’ll also peel back the covers on how open source tools such as HBase, Cascading, Geos and Polymaps handle analyzing and streaming realtime data to maps and visualizations, both on the web and on mobile devices.

    To illustrate what’s possible, we’ll dive through GeoCommons, a large online community of data sharing and community analytics that uses open source mapping visualization, Hadoop analysis, and mobile interfaces to provide this to the world. Users can even build and socialize their own analysis methods to share their expert knowledge with other users. We’ll also review how global organizations like the World Bank and United Nations are using these tools to connect with citizens in developing countries to empower them to make decisions on building investment and understanding how climate science may affect their areas.

    At 10:40am to 11:20am, Monday 25th July

    In C124, Oregon Convention Center

  • Developing and Deploying Hadoop Security

    by Owen O'Malley

    Adding security to an existing product is never easy, but our team at Yahoo added strong authentication to Apache Hadoop by integrating it with Kerberos. This project was delivered on time and is currently deployed on all of Yahoo's 40,000 Hadoop computers. Come learn how we added security to Hadoop and why it matters.

    At 11:30am to 12:10pm, Monday 25th July

    In C124, Oregon Convention Center

    Coverage video

  • Hadoop - Enterprise Data Warehouse Data Flow Analysis and Optimization

    by Aurelian Dumitru

    In this session Dell will discuss the analysis of the data types suitable for transfer between Hadoop and EDW, the EDW/Hadoop data lifecycle, data governance between Hadoop and DBMS, and ETL performance tuning and best practices (e.g. Hadoop/DBMS connector, node and network designs).

    At 11:30am to 12:10pm, Monday 25th July

    In C125/126, Oregon Convention Center

  • DataStax’ Brisk – A More Powerful, Real-time, And Easier To Deploy Hadoop, Powered By Apache Cassandra

    by Jonathan Ellis

    Brisk is an open-source Hadoop and Hive distro that uses Cassandra for its core services. Brisk provides integrated Hadoop MapReduce, Hive, and job and task tracking, along with an HDFS-compatible storage layer powered by Cassandra. By shortening the time between data creation and analysis with DataStax’ Brisk, users experience greater reliability, simpler deployment and lower TCO.

    At 1:30pm to 2:10pm, Monday 25th July

    In C125/126, Oregon Convention Center

  • Ephemeral Hadoop Clusters in the Cloud

    by Greg Fodor

    The data & analytics teams at Etsy build up and tear down more than a thousand independent Hadoop clusters on EC2 each month. This talk discusses the benefits of this approach, where Elastic Map Reduce serves as a "meta-cluster" in which on-demand Hadoop clusters can be created, used, and shut down quickly and easily.

    At 1:30pm to 2:10pm, Monday 25th July

    In C121/122, Oregon Convention Center

    Coverage video

  • YARN - Next Generation Hadoop Map-Reduce

    by Arun C Murthy

    YARN is the next generation of Hadoop Map-Reduce, designed to scale out much further and to run applications other than pure Map-Reduce in a highly fault-tolerant manner.

    At 2:20pm to 3:00pm, Monday 25th July

    In C124, Oregon Convention Center

  • Distributed Data Analysis with Hadoop and R

    by Jonathan Seidman and Ramesh Venkatar

    An overview of the state of the art for bringing together the analytical power of the R language with the big data capabilities of Hadoop.

    At 3:30pm to 4:10pm, Monday 25th July

    In C123, Oregon Convention Center

    Coverage slide deck

  • Real-time Streaming Analysis for Hadoop and Flume

    by Aaron Kimball

    This talk introduces an open-source SQL-based system for continuous or ad-hoc analysis of streaming data built on top of Flume-based data collection for Hadoop. Attendees will understand how to use a new tool to extend their Hadoop data collection pipeline with real-time streaming analytics.

    At 3:30pm to 4:10pm, Monday 25th July

    In C124, Oregon Convention Center

    Coverage video

  • Whirr: Open Source Cloud Services

    by Tom White

    Apache Whirr is a way to run distributed systems - such as Hadoop, HBase, Cassandra, and ZooKeeper - in the cloud. Whirr provides a simple API for starting and stopping clusters for evaluation, test, or production purposes. This talk explains Whirr's architecture and shows how to use it.

    At 3:30pm to 4:10pm, Monday 25th July

    In B118-119, Oregon Convention Center

    Coverage video
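
For readers new to the Map/Reduce model named in the Introduction to Hadoop abstract, the following is a minimal in-memory sketch of the idea in Python. It is not Hadoop code and does not come from any of the talks: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical, and everything runs in a single process, whereas Hadoop distributes the same three steps across many nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all values by key, the work a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

Because each call to map_phase and reduce_phase depends only on its own input, a framework can run them on different machines and rerun them elsewhere if a node fails, which is the property the abstract alludes to.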