by Jakob Homan
Kafka is a distributed pub-sub system that handles streaming data and can load data directly into Apache Hadoop. It combines a high-performance messaging system with a simple, extensible API. Kafka is currently in production at LinkedIn and was recently open-sourced. Learn more at http://sna-projects.com/kafka/
The Apache Hadoop MapReduce framework has hit a scalability limit at around 4,000 machines. We are developing the next generation of Hadoop MapReduce, which factors the framework into a generic resource scheduler and a per-job, user-defined component that manages the application's execution. High availability, security, and improved multi-tenancy are fundamental to the new architecture, which also improves agility and hardware utilization and leaves more room for innovation.
23rd March 2011