Spark DataFrames: Simple and Fast Analysis of Structured Data

A session at Spark Summit 2015

Monday 15th June, 2015

1:00pm to 1:30pm PST

Since its introduction one year ago, Spark SQL has proven to be a highly effective way to speed up existing SQL workloads by leveraging the power of Spark. Spark SQL's built-in support for reading data from existing Hive warehouses allows HQL users to achieve better performance simply by switching query engines. However, even non-SQL workloads can often benefit from the automatic optimizations that Spark SQL can perform. At the core of Spark SQL is the notion of a DataFrame, which improves on traditional RDDs by giving them knowledge of how best to manipulate the data that they hold. In addition to rich querying, this structure makes it possible to cache and shuffle the data more efficiently during computations. Furthermore, with the addition of the data sources API, Spark SQL makes it easier to compute over structured data sourced from a

About the speaker

Michael Armbrust

Lead developer of Spark SQL @databricks, formerly @ucberkeley. Distributed databases, query languages, scala, other nerdy stuff... (bio from Twitter)
