Crunching Big Data with R and Hadoop

A session at Strata New York 2012

  • Stephanie Beben

Tuesday 23rd October, 2012

1:30pm to 5:00pm (EST)

Implementing Map/Reduce applications with tools like Java can be hard, so it is often useful to be able to use Map/Reduce from other languages. In this tutorial, we’ll provide an introduction to RHadoop, an open source Map/Reduce library for R. We assume that attendees have a broad familiarity with R and Hadoop; however, the exercises do not require attendees to be experts in either platform.

First, we will discuss the basics of Map/Reduce, a framework for writing massively parallel big data analytics, and the nuances of the RHadoop implementation.
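For readers new to the model, the map, shuffle, and reduce phases can be sketched in a few lines. This is a minimal single-process illustration in plain Python, not RHadoop or the rmr API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_job`) and the sample records are assumptions made up for this sketch.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit one (word, 1) pair for every word in an input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework
    would do between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single count."""
    return key, sum(values)


def run_job(records):
    """Run the three phases in sequence on an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


# Hypothetical sample input, one record per line of text.
counts = run_job(["big data with R", "R and Hadoop", "big clusters"])
```

On a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data across the network; here everything runs in one process so the data flow is easy to follow.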

Next, we’ll discuss some common techniques in RHadoop including maintaining application state, processing data that has a Zipfian distribution, representing distributed matrices, performing basic operations over distributed matrices, finding outliers, and debugging.

Finally, we’ll walk through an interactive exercise that shows attendees how to create a trending topic analysis using LDA (latent Dirichlet allocation) and RHadoop. We’ll begin by showing how to install both Hadoop and the rmr package, which provides Map/Reduce functionality, and then work through a coding example that uses RHadoop to build a sliding window analysis of trending topics.
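The sliding window idea can be sketched independently of Hadoop. The example below is a simplified stand-in written in plain Python: it counts raw terms per overlapping time window instead of fitting LDA topics, and it does not use the rmr package. The event data, the window parameters, and the helper names (`windows_for`, `sliding_counts`, `top_term`) are all assumptions for illustration.

```python
from collections import Counter, defaultdict


def windows_for(t, size, step):
    """Return the start times of every window [start, start + size)
    that contains timestamp t, with windows beginning every `step` units."""
    starts = []
    start = (t // step) * step  # latest window starting at or before t
    while start > t - size and start >= 0:
        starts.append(start)
        start -= step
    return starts


def sliding_counts(events, size, step):
    """Count terms per window. Emitting one record per covering window
    plays the role of the map step; summing per (window, term) key
    plays the role of the reduce step."""
    per_window = defaultdict(Counter)
    for t, term in events:
        for start in windows_for(t, size, step):
            per_window[start][term] += 1
    return per_window


def top_term(per_window):
    """Pick the most frequent term in each window, ordered by start time."""
    return {start: counter.most_common(1)[0][0]
            for start, counter in sorted(per_window.items())}


# Hypothetical (timestamp, term) events; windows are 4 units long, 2 apart.
events = [(0, "hadoop"), (1, "hadoop"), (3, "r"),
          (4, "r"), (5, "r"), (5, "lda")]
per_window = sliding_counts(events, size=4, step=2)
trending = top_term(per_window)
```

Because each event is assigned to every window that covers it, the windows can be aggregated independently of one another, which is what makes this kind of analysis a natural fit for Map/Reduce.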

About the speakers

Ed Kohlwey

"I writes teh software." (bio from Twitter)

Stephanie Beben

Cloud Analytics Engineer, Booz Allen Hamilton
