Understanding MapReduce with Hadoop

A session at SPA 2008

With today's applications we generate data faster than we can understand it, so tools for processing, aggregating, and analyzing large volumes of data are vital if we are to reach that understanding. MapReduce and Hadoop are two new tools for this purpose: MapReduce is a parallel programming model devised at Google for efficiently processing large amounts of data, and Hadoop is an Apache open-source framework for running MapReduce programs.

In this session we will look at why processing very large datasets is difficult with current tools and how MapReduce and Hadoop help. The focus of the session is on understanding the constraints that the MapReduce programming model imposes on writing parallel programs, and how those same constraints actually provide a useful way of looking at many data processing problems. To develop this understanding, a few basic worked MapReduce examples will be presented and demonstrated on a running Hadoop system; the group will then be invited to work in pairs to write a MapReduce program that solves a data processing problem.
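The programming model the session explores can be sketched without Hadoop itself. Below is a minimal, single-machine illustration (in Python rather than Hadoop's native Java, purely for brevity) of the canonical word-count example: a map phase that emits (word, 1) pairs, a shuffle step that groups pairs by key as the framework would, and a reduce phase that sums each group. The function names are illustrative, not part of any Hadoop API.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as the framework
    # does between the map and reduce phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

docs = ["the quick brown fox", "the lazy dog"]
counts = reduce_phase(shuffle(map_phase(docs)))
```

The constraint that makes this model parallelizable is visible even in the sketch: each map call sees only one input record, and each reduce call sees only one key's values, so both phases can be distributed across machines without coordination beyond the shuffle.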

About the speaker

Tom White

Hadoop book author, Apache Hadoop committer, recreational maker.


SPA 2008

Bedfordshire, England

16th–19th March 2008


Books by speaker

  • Hadoop: The Definitive Guide
