Wednesday 24th October, 2012
4:10pm to 4:50pm
Akamai’s Advertising team needed a data processing infrastructure for production workloads that existing solutions could not adequately support. Existing frameworks such as Hive and Pig demonstrate excellent scalability, ease of use for flexible query development, and fault tolerance, but they are generally recognized to be slow; order-of-magnitude slowdowns relative to parallel databases are commonly documented.
After much evaluation, Akamai implemented Trecul, a system that runs inside Hadoop. It leverages LLVM to perform JIT-compilation on top of highly optimized standard data processing operators – no Java in tight loops, no interpreter involved in predicate evaluation, just straight native code executing at line speed nestled in the scale and fault tolerance we’ve come to know and love from Hadoop’s MapReduce execution model. On our standard workloads it has 10x the throughput of Hive.
Trecul is in production today, handling billions of events an hour and powering Akamai’s Advertising systems, including our attribution engine, machine-learning-based modeling, and large-scale reporting and insights.
Akamai has open-sourced Trecul on GitHub so that it may be used by others who wish to leverage Hadoop for analytical workloads in performance-critical environments.
In this talk, we will walk through the use cases that led us to write our own processing system, review the highlights of the implementation, including why JIT-compilation via LLVM inside Hadoop is great for performance, show some performance benchmarks on real-world data at scale, and discuss how others might leverage this system for their own needs.
Principal Software Engineer, Akamai Technologies