Tuesday 16th June, 2015
4:30pm to 5:00pm
Apache Spark is a flexible, scalable, and fault-tolerant data processing framework that specializes in processing large amounts of data. Spark Streaming builds on top of the core library to consume data in real time from ingest systems such as Apache Kafka, Apache Flume, and Amazon Kinesis. In this talk, we will cover recent advances in Spark Streaming: the design of several new features that have improved performance and eliminated any possibility of data loss. We will discuss how Salesforce.com uses Spark Streaming to normalize data arriving from a variety of sources in real time, and how this normalized data is then tagged and made available to downstream applications for consumption. We will also discuss the integration of Spark Streaming with Kafka in both directions and why such an integration is important for this use case.