
Dynamic Namespace Partitioning with Giraffa File System

A session at Hadoop Summit 2012

  • Plamen Jeliazkov
  • Konstantin Shvachko

Thursday 14th June, 2012

1:30pm to 2:10pm (PST)

HDFS is built on an architecture that decouples the namespace from the data. Namespace operations are performed on a designated server, the NameNode, and data is then streamed to and from data servers, the DataNodes. While the data layer of HDFS is highly distributed, the namespace is maintained by a single NameNode, which makes it a single point of failure (SPOF) and a bottleneck for scalability and availability. HBase is a scalable metadata store. The objects that make up files could be stored directly in it, but that approach would lose the separation of namespace operations from data streaming. Giraffa is an experimental file system that uses HBase to maintain the file system namespace in a distributed way and serves data directly from DataNodes. It is built from existing HDFS and HBase components and is intended to maintain very large namespaces. HBase automatically partitions large tables into horizontal slices called regions. The partitioning is dynamic: if a region grows too big or becomes too small, the table is automatically repartitioned. The partitioning is based on row ordering, so to optimize access to file system objects Giraffa preserves the locality of objects that are adjacent in the namespace tree. The presentation will explain the Giraffa architecture and the principles behind row key definitions for namespace partitioning, and will address the atomic rename problem.
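To make the idea of row-ordered partitioning concrete, here is a minimal Python sketch. It is an illustration only and uses neither HBase nor Giraffa code. The key layout (a two-byte depth prefix followed by the full path), the `ToyTable` class, and the `max_rows` split threshold are all assumptions made for this example, not the row key definition presented in the talk.

```python
import bisect


def depth(path):
    """Number of path components; the root "/" has depth 0."""
    return len([p for p in path.split("/") if p])


def row_key(path):
    # Hypothetical key layout (an assumption for this sketch):
    # 2-byte big-endian depth, then the full path. Entries of the same
    # directory share a prefix, so they are adjacent in sorted order.
    return depth(path).to_bytes(2, "big") + path.encode("utf-8")


def child_prefix(directory):
    """Key prefix shared by all direct children of `directory`."""
    parent = directory.rstrip("/") + "/"
    return (depth(directory) + 1).to_bytes(2, "big") + parent.encode("utf-8")


class ToyTable:
    """Sorted key space cut into regions; a region splits when it grows too big."""

    def __init__(self, max_rows=4):
        self.max_rows = max_rows
        self.regions = [[]]  # each region is a sorted list of row keys

    def put(self, key):
        # Pick the region whose key range contains `key`.
        starts = [r[0] for r in self.regions[1:]]
        i = bisect.bisect_right(starts, key)
        region = self.regions[i]
        bisect.insort(region, key)
        # Dynamic repartitioning: split an oversized region in half.
        if len(region) > self.max_rows:
            mid = len(region) // 2
            self.regions[i:i + 1] = [region[:mid], region[mid:]]

    def scan(self, prefix):
        """Return all keys starting with `prefix`, in row order."""
        return [k for r in self.regions for k in r if k.startswith(prefix)]


if __name__ == "__main__":
    table = ToyTable(max_rows=4)
    for p in ["/", "/a", "/b", "/a/x", "/a/y", "/b/z", "/a/x/1"]:
        table.put(row_key(p))
    # Listing a directory becomes a scan over one contiguous key range.
    print([k[2:].decode() for k in table.scan(child_prefix("/a"))])
```

In this toy layout, listing `/a` touches only the keys that begin with the prefix for its children, even after the table has split into several regions. A real store would seek to the prefix rather than filter every key, as `scan` does here for brevity.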

About the speakers

Plamen Jeliazkov

Student — UC San Diego

Plamen Jeliazkov is a student in the UCSD Computer Science Department, graduating in summer 2012. He was an intern at eBay in summer 2011, working on HDFS and implementing the prototype of the Giraffa File System.

Konstantin Shvachko

Principal Hadoop Architect — eBay

Konstantin is a veteran Hadoop developer. He is a principal Hadoop architect at eBay. Konstantin specializes in efficient data structures and algorithms for large-scale distributed storage systems. Konstantin holds a Ph.D. in computer science from Moscow State University, Russia. He is a member of the Apache Hadoop Project Management Committee.



Short URL

lanyrd.com/stttw

Official event site

hadoopsummit.org
