Tuesday 23rd October, 2012
1:30pm to 5:00pm
Implementing Map/Reduce applications in languages like Java can be hard, so it is often useful to be able to use Map/Reduce from other languages. In this tutorial, we’ll provide an introduction to RHadoop, an open source Map/Reduce library for R. We will assume that attendees have a broad familiarity with R and Hadoop; however, the exercises do not require attendees to be experts in either platform.
First, we will discuss the basics of Map/Reduce, a framework for writing massively parallel big data analytics, and the nuances of the RHadoop implementation.
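For readers new to the model, the map, shuffle, and reduce phases can be sketched in a few lines of plain Python. This is a single-process illustration only, not RHadoop or Hadoop code; the function names (map_phase, shuffle, reduce_phase, run_word_count) are hypothetical and chosen for this example.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer once per key to the list of values for that key."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for each whitespace-separated word in a line."""
    for word in line.lower().split():
        yield word, 1


def word_count_reducer(word, counts):
    """Sum the per-occurrence counts for one word."""
    return sum(counts)


def run_word_count(lines):
    """Hypothetical driver: chain the three phases on an in-memory list."""
    pairs = map_phase(lines, word_count_mapper)
    return reduce_phase(shuffle(pairs), word_count_reducer)


if __name__ == "__main__":
    print(run_word_count(["big data", "big analytics"]))
```

In a real cluster the mapper and reducer run on many machines and the shuffle moves data across the network; only the two user-supplied functions carry over from a sketch like this.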
Next, we’ll discuss some common techniques in RHadoop including maintaining application state, processing data that has a Zipfian distribution, representing distributed matrices, performing basic operations over distributed matrices, finding outliers, and debugging.
Finally, we’ll walk through an interactive exercise to show attendees how to create a trending topic analysis using LDA (latent Dirichlet allocation) and RHadoop. We’ll begin by showing attendees how to install both Hadoop and the rmr package, which provides Map/Reduce functionality. Then we’ll walk through an interactive coding example that demonstrates how to use RHadoop to create a sliding window analysis of trending topics.
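As a rough idea of what a sliding window analysis involves, the plain-Python sketch below counts topics per overlapping time window and reports the most frequent one. It is not the tutorial's code: it assumes each event has already been assigned a topic label (the step LDA would perform in the session), it runs in a single process rather than on Hadoop, and the names window_counts and trending are hypothetical.

```python
from collections import Counter


def window_counts(events, window_size, step):
    """Count topics in each window [start, start + window_size).

    events: list of (timestamp, topic) pairs with integer timestamps.
    Windows begin at the earliest timestamp and advance by `step`.
    """
    if not events:
        return []
    first = min(t for t, _ in events)
    last = max(t for t, _ in events)
    results = []
    start = first
    while start <= last:
        counts = Counter(
            topic for t, topic in events if start <= t < start + window_size
        )
        results.append((start, counts))
        start += step
    return results


def trending(events, window_size, step, top_n=1):
    """Return the top_n most frequent topics for each window."""
    return [
        (start, [topic for topic, _ in counts.most_common(top_n)])
        for start, counts in window_counts(events, window_size, step)
    ]


if __name__ == "__main__":
    sample = [(0, "a"), (1, "a"), (2, "b"), (3, "b"), (4, "b")]
    for start, topics in trending(sample, window_size=3, step=2):
        print(start, topics)
```

In a Map/Reduce setting, one plausible arrangement is for the mapper to emit each event once per window it falls into, keyed by window start, so that the reducer sees all topics for one window; that mapping is an assumption here, not something the abstract specifies.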
Cloud Analytics Engineer, Booz Allen Hamilton