Trecul: Data Flow Processing Using LLVM-based JIT Compilation on Top of Hadoop

A session at Strata New York 2012

  • David Blair

Wednesday 24th October, 2012

4:10pm to 4:50pm (EST)

Akamai’s Advertising team needed a data processing infrastructure to support our production workloads, and the existing solutions fell short. Existing frameworks such as Hive and Pig demonstrate excellent scalability, ease of use for flexible query development, and fault tolerance, but are generally recognized to be slow; order-of-magnitude slowdowns relative to parallel databases are commonly documented.

After much evaluation, Akamai implemented Trecul, a system that runs inside Hadoop. It leverages LLVM to perform JIT compilation on top of highly optimized standard data processing operators: no Java in tight loops, no interpreter involved in predicate evaluation, just straight native code executing at line speed, nestled in the scale and fault tolerance we’ve come to know and love from Hadoop’s MapReduce execution model. On our standard workloads it has 10x the throughput of Hive.

Trecul is in production today, handling billions of events an hour and powering Akamai’s Advertising systems, including our attribution engine, machine-learning-based modeling, and large-scale reporting and insights.

Akamai has open-sourced Trecul on GitHub so that it may be used by others who wish to leverage Hadoop for analytical workloads in performance-critical environments.

In this talk, we will walk through the use cases that led us to write our own processing system, review the highlights of the implementation, including why JIT compilation via LLVM inside Hadoop is great for performance, show some performance benchmarks on real-world data at scale, and discuss how others might leverage this system for their own needs.

About the speaker

David Blair

Principal Software Engineer, Akamai Technologies
