Kafka is a distributed publish-subscribe messaging system aimed at providing a scalable, high-throughput, low latency solution for log aggregation and activity stream processing for LinkedIn. Built on Apache Zookeeper in Scala, Kafka aims at providing a unified stream for both real-time and offline consumption. We provide a mechanism for parallel data load into Hadoop as well as the ability to partition real-time consumption over a cluster of machines. Kafka combines the benefits of traditional log aggregators and messaging systems and has been used successfully in production for 8 months. It provides API similar to that of a messaging system and allows applications to consume log events in real-time. Written by the SNA team at LinkedIn, Kafka is open sourced under the Apache 2.0 License and preparing to be submitted as an Apache incubator project. In this presentation, we will highlight the core design principles for this system, and how this system fits into LinkedIn's data ecosystem as well as some of the products and monitoring applications it supports in our usage.
Search, Distributed Systems, Stream data processing, LinkedIn bio from Twitter
Sign in to add slides, notes or videos to this session