Sessions at Hadoop Summit 2012 about Namespaces

Thursday 14th June 2012

  • Dynamic Namespace Partitioning with Giraffa File System

    by Plamen Jeliazkov and Konstantin Shvachko

    HDFS is based on an architecture that decouples the namespace from the data. Namespace operations are performed on a designated server, the NameNode, and data is subsequently streamed to and from data servers, the DataNodes. While the data layer of HDFS is highly distributed, the namespace is maintained by a single NameNode, making it a single point of failure (SPOF) and a bottleneck for scalability and availability. HBase is a scalable metadata store that could also hold the objects composing the files directly, but doing so would lose the ability to separate namespace operations from data streaming. Giraffa is an experimental file system that uses HBase to maintain the file system namespace in a distributed way and serves data directly from DataNodes. Giraffa is built from existing HDFS and HBase components and is intended to maintain very large namespaces. HBase automatically partitions large tables into horizontal slices called regions. The partitioning is dynamic: if a region grows too big or becomes too small, the table is automatically repartitioned. The partitioning is based on row ordering. To optimize access to file system objects, Giraffa preserves the locality of objects that are adjacent in the namespace tree. The presentation will explain the Giraffa architecture and the principles behind row key definitions for namespace partitioning, and will address the atomic rename problem.

    At 1:30pm to 2:10pm, Thursday 14th June
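
The abstract's point about row ordering and namespace locality can be illustrated with a small sketch. This is not Giraffa's actual row key definition, which the abstract does not specify. The key layout (`row_key`), the separator byte, and the single split point are assumptions chosen for illustration. The sketch shows that when keys are sorted byte-wise and path components are joined by a low-sorting byte, a directory and its whole subtree occupy one contiguous key range, so a range-partitioned table tends to keep them in the same partition.

```python
from bisect import bisect_right

# Assumption: a separator byte that sorts before any byte in a file name.
SEP = b"\x00"

def row_key(path: str) -> bytes:
    """Hypothetical row key: path components joined by a low-sorting byte."""
    parts = [p for p in path.split("/") if p]
    return b"".join(SEP + p.encode("utf-8") for p in parts)

def region_of(key: bytes, split_keys: list) -> int:
    """Index of the key range holding `key`.

    Range 0 holds keys below split_keys[0]; range i holds keys in
    [split_keys[i-1], split_keys[i]); the last range is unbounded above.
    """
    return bisect_right(split_keys, key)

paths = [
    "/logs/2012/a.log",
    "/logs/2012/b.log",
    "/logs-old/x",
    "/logs",
    "/users/kim/notes.txt",
    "/users",
]

# Sorting by row key keeps /logs next to everything beneath it.
ordered = sorted(paths, key=row_key)

# Sorting the raw path strings does not: '-' sorts before '/',
# so /logs-old/x lands between /logs and its children.
naive = sorted(paths)

# One illustrative split point, giving two key ranges.
splits = [row_key("/users")]
placement = {p: region_of(row_key(p), splits) for p in ordered}
```

With these inputs, `ordered` lists `/logs` followed directly by both files under `/logs/2012`, then `/logs-old/x`, then the `/users` entries, and `placement` puts every `/logs` path in range 0 and every `/users` path in range 1. A real table would choose its split points dynamically as ranges grow or shrink, as the abstract describes.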