A Data Frame Abstraction Layer for SparkR

A session at Spark Summit 2015

  • Shivaram Venkataraman

Tuesday 16th June, 2015

1:30pm to 2:00pm

The data frame is a fundamental construct in R programming and is one of the primary reasons why R has become such a popular language for data analysis. In Spark 1.3, SparkSQL received its own implementation of the data frame concept which extends the SchemaRDD with additional methods that enable more intuitive data manipulation. The similarities between R's data frame and its analogue in SparkSQL represent a clear opportunity for development within the SparkR project. A natural, and proven, path forward is to create R bindings to the existing Spark DataFrame API and extend it to work with R's traditional data frame syntax. With DataFrame support, we expose new functionality that allows R programmers to quickly become proficient at working with large, distributed datasets in Spark . In this talk, we describe how this was accomplished and provide a demonstration of the enhanced functionality in SparkR.

About the speakers

This person is speaking at this event.
Shivaram Venkataraman
This person is speaking at this event.
Chris Freeman

Senior Content Engineer at Alteryx, Inc. bio from LinkedIn

Sign in to add slides, notes or videos to this session

Tell your friends!


Time 1:30pm2:00pm PST

Date Tue 16th June 2015

Short URL


Official event site


View the schedule


See something wrong?

Report an issue with this session