Sessions at Hadoop Users Group: An Evening with Hadoop with slides

  • Federated HDFS

    by Sanjay Radia

    NameNode scalability has been a key struggle. Because the NameNode keeps the entire namespace and all block locations in memory, the size of its heap limits both the number of files and the number of addressable blocks. This in turn limits the total cluster storage that a single NameNode can support.

    Federated HDFS allows multiple independent namespaces (and NameNodes) to share the physical storage within a cluster. This is enabled by the introduction of block pools, which are analogous to LUNs in a SAN storage system.

    Coverage slide deck

  • Kafka

    by Jakob Homan

    Kafka is a distributed pub-sub system that handles streaming data and provides the ability to load data directly into Apache Hadoop. It combines a high-performance messaging system with a simple, extensible API. Kafka is currently in production at LinkedIn and was recently open-sourced. Learn more at http://sna-projects.com/kafka/

  • Next Generation of Hadoop MapReduce

    by Owen O'Malley

    The Apache Hadoop MapReduce framework has hit a scalability limit around 4,000 machines. We are developing the next generation of Hadoop MapReduce that factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application's execution. High availability, security, and improved multi-tenancy are fundamental to the new architecture. The new architecture also increases innovation, agility and hardware utilization.

    Coverage slide deck
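
The split described in the MapReduce abstract above, a generic resource scheduler plus a per-job component that manages execution, can be illustrated with a toy sketch. This is not Hadoop code: the class names `ResourceScheduler` and `JobManager`, the slot-counting model, and the node names are all hypothetical, chosen only to show the scheduler handing out capacity without knowing anything about the job's tasks.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResourceScheduler:
    """Hypothetical generic scheduler: tracks free slots per node only."""
    free_slots: dict[str, int]

    def allocate(self, wanted: int) -> list[str]:
        """Grant up to `wanted` slots; return one node name per granted slot."""
        granted: list[str] = []
        for node in sorted(self.free_slots):
            while self.free_slots[node] > 0 and len(granted) < wanted:
                self.free_slots[node] -= 1
                granted.append(node)
        return granted

    def release(self, node: str) -> None:
        """Return one slot on `node` to the free pool."""
        self.free_slots[node] += 1


@dataclass
class JobManager:
    """Hypothetical per-job component: owns the task list and its placement."""
    tasks: list[str]
    completed: list[tuple[str, str]] = field(default_factory=list)

    def run(self, scheduler: ResourceScheduler) -> None:
        pending = list(self.tasks)
        while pending:
            nodes = scheduler.allocate(len(pending))
            if not nodes:
                raise RuntimeError("no capacity available")
            for node in nodes:
                # "Running" a task is simulated by recording where it was placed.
                task = pending.pop(0)
                self.completed.append((task, node))
                scheduler.release(node)


scheduler = ResourceScheduler({"node-a": 1, "node-b": 1})
job = JobManager(["map-0", "map-1", "map-2"])
job.run(scheduler)
```

In this sketch the scheduler could serve any kind of job, since all task-specific logic lives in the per-job object; that separation is the only point being illustrated.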
