Sessions at SPA 2008 about Hadoop

Unscheduled

  • Understanding MapReduce with Hadoop

    by Tom White

    Today's applications generate data faster than we can understand it, so tools for processing, aggregating, and analyzing large volumes of data are vital if we are to reach that understanding. MapReduce and Hadoop are two new tools for this purpose. MapReduce is a parallel programming model devised at Google for efficiently processing large amounts of data, and Hadoop is an Apache open-source framework for running MapReduce programs.

    In this session we will look at why processing very large datasets is difficult with current tools and how MapReduce and Hadoop help. The focus of the session is to understand the constraints that the MapReduce programming model imposes on writing parallel programs, and how those same constraints actually provide a useful way to look at many data processing problems. To develop this understanding, a few basic MapReduce worked examples will be given and demonstrated on a running Hadoop system. The group will then be invited to work in pairs to write a MapReduce program to solve a data processing problem.
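
    As a rough illustration of the constraints mentioned above, the sketch below simulates the model in plain Python on a word-count problem. It is not taken from the session and does not use Hadoop's API; the names map_words, reduce_counts, and run_mapreduce are hypothetical, and the whole job runs in a single process. The point is only the shape of the model: the mapper sees one record at a time and emits key/value pairs, and the reducer sees one key with all of its values.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical example): emit (word, 1) for each word in one line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step (hypothetical example): sum all counts seen for one word."""
    return word, sum(counts)


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], Tuple[str, int]],
) -> Dict[str, int]:
    """In-memory stand-in for the framework: map, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each record is processed independently of every other record.
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Shuffle is simulated by the grouping above; reduce once per key.
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


if __name__ == "__main__":
    lines = ["the quick brown fox", "the lazy dog", "The fox"]
    print(run_mapreduce(lines, map_words, reduce_counts))
```

    Because the mapper and reducer share no state, a real framework is free to run many copies of each on different machines; this single-process sketch only mimics that structure.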
