Building a Data Warehouse for Business Analytics using Spark SQL

A session at Spark Summit 2015

Monday 15th June, 2015

4:30pm to 5:00pm

Edmunds.com is a car-shopping website that serves nearly 18 million visitors each month, and we rely heavily on data analysis to optimize the experience for each visitor. To that end, the engineering team at Edmunds processes terabytes of data, and our business analysts use rich visualizations of traffic, revenue and car-lead metrics to gain insight into the car shopper's journey. When our team was challenged to speed up the pipeline and make business analysts fully self-sufficient in dataset creation, aggregation and visualization, we decided to use Apache Spark. This talk is about that migration and the bumps along the road. First, it addresses the technical hurdles we had to clear in bringing up Spark, including exposing our data in S3 for production ETL and ad hoc analysis using Spark SQL in combination with libraries that we built in Scala. Then we cover the benefits we achieved: better data refresh intervals, faster query times, and even increased productivity in our development process. Lastly, we cover the rich set of visualization and analysis tools we employ to make all these data marts easily accessible to our business analysts.

About the speaker

Blagoy Kaloferov

Software Engineer at Edmunds.com


Time 4:30pm to 5:00pm PST

Date Mon 15th June 2015
