Massive growth in the size of business datasets leads many companies to Hadoop, an emerging architecture for parallel data processing. However, the migration path can be challenging, in part because MapReduce analyses use programming languages like Java and Python rather than SQL. Apache Pig is a high-level framework built on top of Hadoop that offers a powerful yet vastly simplified way to analyze data in Hadoop. It allows businesses to leverage the power of Hadoop in a simple language readily learnable by anyone that understands SQL. In this presentation, I will introduce Pig and show how it's been used at Twitter to solve numerous analytics challenges that became intractable with our former MySQL-based architecture.
VP of Product for Revenue at Twitter. Former big data engineer. Digital advertising, ultra-marathons, particle physics, diet mountain dew, hadoop, lolcats. bio from Twitter
Sign in to add slides, notes or videos to this session