Taming GC Pauses for Humongous Java Heaps in Spark Graph Computing

A session at Spark Summit 2015

Tuesday 16th June, 2015

3:00pm to 3:30pm PST

Spark nodes are shifting from commodity hardware to more powerful systems with larger memory (200GB+). Because Spark is an in-memory computing framework, popular wisdom has it that large Java heaps result in long garbage collection pauses that slow down Spark's overall throughput. Through several case studies using large Java heaps, we will show that it is possible to keep GC pauses low for better application throughput. In this presentation, we introduce the HotSpot G1 collector as the best GC for Spark solutions running in large-memory environments. We first discuss HotSpot G1 internal operations and several tuning flags. These flags can be used to set a desired GC pause target, change adaptive GC thresholds, and adjust GC activity at runtime. We will then present several case studies from a Spark graph computing application running an 80GB+ heap to show how these flags can be tuned to remove unpredictable and protracted GC pauses for better application throughput.
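As a rough illustration of the kind of flags the abstract refers to, the sketch below assembles a few standard HotSpot G1 options into the value Spark reads from spark.executor.extraJavaOptions. The helper names (g1_executor_options, spark_submit_args) and all numeric values are hypothetical and chosen only for illustration; they are not the settings used in the talk's case studies.

```python
import shlex


def g1_executor_options(pause_target_ms=200, ihop_percent=45, conc_gc_threads=None):
    """Return a space-separated JVM option string enabling G1 with basic tuning.

    Hypothetical helper; the defaults are placeholders, not recommended values.
    """
    opts = [
        # Select the G1 collector.
        "-XX:+UseG1GC",
        # Desired pause target in milliseconds (a goal, not a guarantee).
        f"-XX:MaxGCPauseMillis={pause_target_ms}",
        # Heap occupancy (percent) at which a concurrent marking cycle starts.
        f"-XX:InitiatingHeapOccupancyPercent={ihop_percent}",
    ]
    if conc_gc_threads is not None:
        # Number of threads used for concurrent GC work.
        opts.append(f"-XX:ConcGCThreads={conc_gc_threads}")
    return " ".join(opts)


def spark_submit_args(heap="80g", **g1_kwargs):
    """Build --conf arguments for spark-submit as a list of strings.

    The heap size goes through spark.executor.memory, because Spark does not
    accept heap-size settings inside spark.executor.extraJavaOptions.
    """
    return [
        "--conf", f"spark.executor.memory={heap}",
        "--conf", f"spark.executor.extraJavaOptions={g1_executor_options(**g1_kwargs)}",
    ]


if __name__ == "__main__":
    # Print a shell-quoted argument string that could be appended to spark-submit.
    print(shlex.join(spark_submit_args(pause_target_ms=500, ihop_percent=35)))
```

Keeping the flags in one helper makes it easy to vary a single setting, such as the pause target, between runs and compare the resulting GC logs.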

About the speakers

Eric P. Kaczmarek

Hadoop Performance Engineer

Liqi Yi
