Sessions at Hadoop Summit 2012 about Efficiency

Wednesday 13th June 2012

  • Hadoop Distributed Filesystem reliability and durability at Facebook

    by Andrew Ryan

    The Hadoop Distributed Filesystem, or HDFS, provides the storage layer to a variety of critical services at Facebook. The HDFS Namenode is often singled out as a particularly weak aspect of the design of HDFS, because it represents a single point of failure within an otherwise redundant system. To address this weakness, Facebook has been developing a highly available Namenode, known as Avatarnode. The objective of this study was to determine how much effect Avatarnode would have on overall service reliability and durability. To analyze this, we categorized, by root cause, the last two years' of operational incidents in the Data Warehouse and Messages services at Facebook, a total of 66 incidents. We were able to show that approximately 10% of each service's incidents would have been prevented had Avatarnode been in place. Avatarnode would have prevented none of our incidents that involved data loss, and all of the most severe data loss incidents were a result of human error or software bugs. Our conclusion is that Avatarnode will improve the reliability of services that use HDFS, but that the HDFS Namenode represents only a small portion of overall operational incidents in services that use HDFS as a storage layer.

    At 10:30am to 11:10am, Wednesday 13th June

  • Improving HBase Availability and Repair

    by Jonathan Hsieh and Jeff Bean

    Apache HBase is a rapidly-evolving random-access distributed data store built on top of Apache Hadoop’s HDFS and Apache ZooKeeper. Drawing from real-world support experiences, this talk provides administrators insight into improving HBase’s availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose them, and prescribe measures for ensuring maximum availability of an HBase cluster. We discuss new features that improve recovery time such as distributed log splitting as well as supportability improvements. We will also describe utilities including new failure recovery tools that we have developed and contributed that can be used to diagnose and repair rare corruption problems on live HBase systems.

    At 11:25am to 12:05pm, Wednesday 13th June

  • HDFS NameNode High Availability

    by Aaron Myers and Suresh Srinivas

    The HDFS NameNode is a robust and reliable service, as seen in production at Yahoo and at other customers. However, the NameNode does not have automatic failover support. A hot failover solution called HA NameNode is currently under active development (HDFS-1623). This talk will cover its architecture, design, and setup. We will also discuss the future direction of HA NameNode.

    At 1:30pm to 2:10pm, Wednesday 13th June

Thursday 14th June 2012

  • Operate Your Hadoop Cluster Like a High-Efficiency Data Goldmine

    by Greg Bruno

    The California Gold Rush ended in 1855, but today it feels like we are on the cusp of a new Gold Rush of sorts. Only this time the prize is Big Data, and IT departments are flocking to it seeking their fortune. Some will succeed wildly, while others will fail miserably. In this session we will describe a reference architecture for Hadoop that will take your Big Data project from proof-of-concept to full-scale deployment while avoiding the missteps and mistakes that could get in the way of your project. Our reference architecture is based on industry standard Apache Hadoop, and built on a rock-solid deployment and management infrastructure derived from the Rocks cluster management software. Greg will share his expertise in big infrastructure deployment and management, showing you how to design for deployment from day one. Let us be your guide as you explore the Big Data frontier, and we will lead you to success. With the right methods, and the right tools for the job, your Hadoop project will be pure gold.

    At 11:25am to 12:05pm, Thursday 14th June

  • BranchReduce: Distributed Branch-and-Bound on YARN

    by Josh Wills

    Branch-and-bound is a widely used technique for efficiently searching for solutions to combinatorial optimization problems. In this session, we will introduce BranchReduce, an open-source Java library for performing distributed branch-and-bound on a Hadoop cluster under YARN. Applications only need to write code that is specific to their optimization problem (namely the branching rule, the lower bound computation, and the upper bound computation), and BranchReduce handles deploying the application to the cluster, managing the execution, and periodically rebalancing the search space across the machines. We will give an overview of how BranchReduce works and then walk through an example that solves a scheduling problem with a near-linear speedup over a single machine implementation.

    At 2:25pm to 3:05pm, Thursday 14th June
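
The three problem-specific pieces named in the BranchReduce abstract (branching rule, lower bound, upper bound) can be illustrated with a small single-machine sketch. This is not BranchReduce's API and involves no YARN or distribution; the function names, the makespan scheduling problem, and the greedy upper bound are all assumptions chosen for illustration.

```python
import heapq
from typing import List, Sequence


def lower_bound(loads: Sequence[int], remaining: Sequence[int]) -> int:
    """Optimistic bound: no schedule extending this node can finish earlier."""
    total = sum(loads) + sum(remaining)
    perfect_split = -(-total // len(loads))  # ceiling division
    return max(max(loads), perfect_split)


def upper_bound(loads: Sequence[int], remaining: Sequence[int]) -> int:
    """Feasible completion: greedily put each job on the least-loaded machine."""
    filled = list(loads)
    for duration in remaining:
        filled[filled.index(min(filled))] += duration
    return max(filled)


def min_makespan(durations: List[int], machines: int) -> int:
    """Minimum makespan for integer job durations on identical machines (machines >= 1)."""
    jobs = sorted(durations, reverse=True)
    start = (0,) * machines
    best = upper_bound(start, jobs)  # incumbent: best feasible value so far
    # Best-first search; each entry is (lower bound, index of next job, loads).
    frontier = [(lower_bound(start, jobs), 0, start)]
    while frontier:
        bound, k, loads = heapq.heappop(frontier)
        if bound >= best or k == len(jobs):
            continue  # prune: this subtree cannot beat the incumbent
        rest = jobs[k + 1:]
        seen = set()
        # Branching rule: try the next job on every machine.
        for i in range(machines):
            child = list(loads)
            child[i] += jobs[k]
            state = tuple(sorted(child))  # machines are interchangeable
            if state in seen:
                continue
            seen.add(state)
            lb = lower_bound(state, rest)
            if lb >= best:
                continue
            best = min(best, upper_bound(state, rest))
            heapq.heappush(frontier, (lb, k + 1, state))
    return best


if __name__ == "__main__":
    print(min_makespan([3, 3, 2, 2, 2], machines=2))  # prints 6
```

In this sketch the greedy upper bound alone gives 7 for the example jobs, and the search improves it to the optimum of 6 by pruning every node whose lower bound cannot beat the incumbent. A distributed version would additionally have to share the incumbent and rebalance the frontier across machines, which the abstract attributes to BranchReduce itself.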