Your current filters are…
by Andrew Ryan
The Hadoop Distributed Filesystem, or HDFS, provides the storage layer to a variety of critical services at Facebook. The HDFS Namenode is often singled out as a particularly weak aspect of the design of HDFS, because it represents a single point of failure within an otherwise redundant system. To address this weakness, Facebook has been developing a highly available Namenode, known as Avatarnode. The objective of this study was to determine how much effect Avatarnode would have on overall service reliability and durability. To analyze this, we categorized, by root cause, the last two years` of operational incidents in the Data Warehouse and Messages services at Facebook, a total of 66 incidents. We were able to show that approximately 10% of each service`s incidents would have been prevented had Avatarnode been in place. Avatarnode would have prevented none of our incidents that involved data loss, and all of the most severe data loss incidents were a result of human error or software bugs. Our conclusions is that Avatarnode will improve the reliability of services that use HDFS, but that the HDFS Namenode represents only a small portion of overall operational incidents in services that use HDFS as a storage layer.
by Joanthan Hsieh and Jeff Bean
Apache HBase is a rapidly-evolving random-access distributed data store built on top of Apache Hadoop’s HDFS and Apache ZooKeeper. Drawing from real-world support experiences, this talk provides administrators insight into improving HBase’s availability and recovering from situations where HBase is not available. We share tips on the common root causes of unavailability, explain how to diagnose them, and prescribe measures for ensuring maximum availability of an HBase cluster. We discuss new features that improve recovery time such as distributed log splitting as well as supportability improvements. We will also describe utilities including new failure recovery tools that we have developed and contributed that can be used to diagnose and repair rare corruption problems on live HBase systems.
The HDFS NameNode is a robust and reliable service as seen in practice in production at Yahoo and other customers. However, the NameNode does not have automatic failover support. A hot failover solution called HA NameNode is currently under active development (HDFS-1623). This talk will cover the architecture, design and setup. We will also discuss the future direction for HA NameNode.
by Travis Dawson
This session shows how Hadoop enables deep analytics over massive amounts of network data, and how to extract information and value using Hadoop at the core of a complete analytics system. Narus, a division of Boeing, helps customers unlock the value of their networks with dynamic network traffic intelligence and analysis of information on IP traffic and flow data. This session provides details on how real-time traffic capture and analysis integrates with Hadoop to perform extremely complex analytics over vast quantities of data in a demanding environment to produce actionable information. The uses for these analytics range from simple network analysis to providing complex security detection and mitigation analysis. Terabytes of forensic data of network traffic are processed to isolate suspicious patterns of behavior, allowing further analysis to pinpoint malicious traffic and operators to take action.
Zions Banks’ Hadoop based security data warehouse is a massive minable database used to aggregate event data across their entire enterprise; for long term large-scale security, fraud and forensic related analytics. The utility of this system is realized once the data is normalized into a common format and mined by experts with intimate understanding of the data itself. By using the SDW powered by Hadoop the seasoned IT professional can now get creative on new ways to look at their data, there are no more limits to what can be explored in their enterprise, no data source is off limits due to the expense. Historically, doing this type of storage and analysis would be impossible due to cost and resource constraints, those handcuffs are gone now. Tackling big data can be intimidating but the rewards are worth the work. The data deluge is never going to slow down and getting your arms around your enterprise data and how that data can be inspected and analyzed delivers specific answers to many of the common threats we see today. APT’s, Malware, exploited tokens, all leave a trace, now you just need to start asking the questions.
13th–14th June 2012