BlinkDB and G-OLA: Supporting Continuous Answers with Error Bars in SparkSQL

A session at Spark Summit 2015

Monday 15th June, 2015

4:30pm to 5:00pm

Datasets are growing exponentially, but the need for quick, interactive responses to queries remains as important as ever. This talk will give an overview of the various components of BlinkDB and introduce a new generalized online aggregation (G-OLA) paradigm in SparkSQL that incrementally processes massive amounts of data on clusters of tens, hundreds, or thousands of machines while returning approximate answers. More precisely, this new execution model enables SparkSQL to present the user with meaningful approximate results (with error bars) that are continuously refined and updated, at a pace comfortable to the user, while it processes larger and larger fractions of the whole dataset in the background. This not only removes the need to pre-process the data in advance for a wide range of queries, but also lets users observe the progress of a query and control its execution on the fly, enabling a smooth trade-off between time and accuracy.
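To illustrate what "continuously refined answers with error bars" means, here is a minimal sketch of classic single-table online aggregation for an AVG query, not G-OLA itself and not a SparkSQL API. It assumes the rows are scanned in random order, so each prefix behaves like a uniform sample without replacement; after every batch it reports the running mean and a CLT-based 95% confidence half-width (with a finite-population correction, so the error bar shrinks to zero once every row has been seen). The function name `online_avg` and its parameters are hypothetical. G-OLA, as described in the talk, generalizes well beyond this simple case, and this sketch does not attempt to reproduce it.

```python
import math
import random


def online_avg(values, batch_size, z=1.96):
    """Yield (rows_seen, running_mean, half_width) after each batch.

    Hypothetical helper. Assumes `values` is already in random order, so every
    prefix is a uniform random sample (without replacement) of the full data.
    """
    n_total = len(values)
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations (Welford's method)
    for start in range(0, n_total, batch_size):
        for x in values[start:start + batch_size]:
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)
        if count < 2:
            half_width = float("inf")  # not enough rows for a variance estimate
        else:
            sample_var = m2 / (count - 1)
            # Finite-population correction: the error bar reaches 0 when all rows are seen.
            fpc = (n_total - count) / (n_total - 1)
            half_width = z * math.sqrt(sample_var / count * fpc)
        yield count, mean, half_width


# Usage: the estimate tightens as larger fractions of the data are processed.
random.seed(0)
data = [random.gauss(100.0, 15.0) for _ in range(100_000)]
random.shuffle(data)
for seen, estimate, err in online_avg(data, batch_size=20_000):
    print(f"{seen:>7} rows: {estimate:.2f} +/- {err:.2f}")
```

A user watching this output could stop the scan as soon as the error bar is narrow enough, which is the time/accuracy trade-off the abstract describes.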

About the speakers

Sameer Agarwal
Kai Zeng

Postdoc at AMPLab, University of California, Berkeley

Time: 4:30pm to 5:00pm PST