Normalizing, Tagging and Processing Data in Real Time Using Spark Streaming

A session at Spark Summit 2015

Tuesday 16th June, 2015

4:30pm to 5:00pm PST

Apache Spark is a flexible, scalable and fault-tolerant data processing framework that specializes in processing large amounts of data. Spark Streaming builds on the core library to consume data in real time from ingest systems such as Apache Kafka, Apache Flume and Amazon Kinesis. In this talk, we will cover recent advances in Spark Streaming, including the design of several new features that have improved performance and eliminated any possibility of data loss. We will discuss how Salesforce.com uses Spark Streaming to normalize data arriving from a variety of sources in real time, and how this normalized data is then tagged and made available to downstream applications for consumption. We will also discuss the integration of Spark Streaming with Kafka in both directions and why that integration is important for this use case.
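
As a rough illustration of the normalize-then-tag flow the abstract describes, the per-record logic might look something like the sketch below. It is plain Python with no Spark dependency; the source names, field mappings and tagging rules are all hypothetical, since the abstract does not describe the actual schema or rules used at Salesforce.com.

```python
import json
from typing import Any, Dict, List, Tuple

# Hypothetical per-source field mappings (original field -> common field).
# The real sources and schema are not given in the abstract.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "firewall": {"src": "source_ip", "ts": "timestamp", "act": "action"},
    "app_log": {"client_ip": "source_ip", "time": "timestamp", "event": "action"},
}


def normalize(source: str, raw: str) -> Dict[str, Any]:
    """Parse a raw JSON record and rename source-specific fields to a common schema."""
    record = json.loads(raw)
    mapping = FIELD_MAPS[source]
    normalized = {
        common: record[original]
        for original, common in mapping.items()
        if original in record
    }
    normalized["source"] = source
    return normalized


def tag(record: Dict[str, Any]) -> Dict[str, Any]:
    """Attach illustrative tags; these rules are assumptions, not the speakers' rules."""
    tags: List[str] = []
    if record.get("action") == "deny":
        tags.append("blocked")
    if str(record.get("source_ip", "")).startswith("10."):
        tags.append("internal")
    return {**record, "tags": tags}


def process_batch(batch: List[Tuple[str, str]]) -> List[Dict[str, Any]]:
    """Normalize and tag every (source, raw_json) pair in one micro-batch."""
    return [tag(normalize(source, raw)) for source, raw in batch]
```

In a Spark Streaming job, functions like these would typically be applied to each record of the input stream, with the tagged records then written back to a Kafka topic for downstream consumers. That wiring is omitted here.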

About the speakers

Hari Shreedharan
Siddhartha Jain

Information Security Director at Salesforce
