Conquering Big Data with Spark and BDAS

A session at the 11th International Conference on Autonomic Computing (ICAC '14)

Friday 20th June, 2014

9:00am to 10:30am (EST)

Today, big and small organizations alike collect huge amounts of data, and they do so with one goal in mind: extract "value" through sophisticated exploratory analysis, and use it as the basis to make decisions as varied as personalized treatment and ad targeting. Unfortunately, existing data analytics tools are slow in answering queries, as they typically must sift through huge amounts of data stored on disk, and are even less suitable for complex computations, such as machine learning algorithms. These limitations leave the potential of extracting value from big data unfulfilled.

To address this challenge, we are developing the Berkeley Data Analytics Stack (BDAS), an open source data analytics stack that provides interactive response times for complex computations on massive data. To achieve this goal, BDAS supports efficient, large-scale in-memory data processing, and allows users and applications to trade off query accuracy, time, and cost. In this talk, I'll present the architecture, challenges, results, and our experience with developing BDAS, with a focus on Apache Spark, an in-memory cluster computing engine that provides support for a variety of workloads, including batch, streaming, and iterative computations. In a relatively short time, Spark has become the most active big data project in the open source community, and is already being used by over one hundred companies and research institutions.

About the speaker

Ion Stoica
