Making Sense of Spark Performance

A session at Spark Summit 2015

Monday 15th June, 2015

2:00pm to 2:30pm PST

In this talk, I'll take a deep dive into Spark's performance on two benchmarks (TPC-DS and the Big Data Benchmark from UC Berkeley) and one production workload, and demonstrate that many commonly held beliefs about performance bottlenecks do not hold. In particular, I'll demonstrate that CPU (and not I/O) is often the bottleneck, that improving network performance can reduce job completion time by a median of at most 4%, and that the causes of most stragglers can be identified and fixed. After describing the takeaways from the workloads I studied, I'll give a brief demo of how the (open-source) tools that I developed can be used by others to understand why Spark jobs are taking longer than expected. I'll conclude by proposing changes to Spark core that, based on my performance study, could significantly improve performance. This talk is based on a research talk that I'll be giving at NSDI 2015.

About the speaker

Kay Ousterhout

UC Berkeley CS PhD student
