With today's applications we are generating data faster than we can understand it, so tools for processing, aggregating, and analyzing large volumes of data are vital if we are to reach that level of understanding. MapReduce and Hadoop are two such tools. MapReduce is a parallel programming model devised at Google for efficiently processing large amounts of data, and Hadoop is an Apache open-source framework for running MapReduce programs.
In this session we will look at why processing very large datasets is difficult with conventional tools and how MapReduce and Hadoop help. The focus of the session is understanding the constraints that the MapReduce programming model imposes on writing parallel programs, and how those same constraints actually provide a useful way to look at many data processing problems. To develop this understanding, a few basic MapReduce worked examples will be given and demonstrated on a running Hadoop system. The group will then be invited to work in pairs to write a MapReduce program to solve a data processing problem.
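To give a flavour of the model described above, here is a minimal sketch of the canonical word-count example as a plain-Python simulation of the map, shuffle, and reduce phases. The function names (`map_fn`, `shuffle`, `reduce_fn`) are illustrative only, not Hadoop's API; in Hadoop the shuffle step is performed by the framework itself.

```python
from collections import defaultdict

# Map phase: each input record (a line of text) emits (word, 1) pairs.
def map_fn(line):
    for word in line.split():
        yield (word.lower(), 1)

# Shuffle: group intermediate pairs by key.
# In Hadoop this grouping is done by the framework, not the programmer.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: combine all values for a key; here, sum the counts.
def reduce_fn(key, values):
    return (key, sum(values))

lines = ["the quick brown fox", "the lazy dog"]
pairs = [p for line in lines for p in map_fn(line)]
counts = dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())
print(counts["the"])  # 2
```

The constraint is that `map_fn` and `reduce_fn` see only their own inputs and share no state, which is exactly what lets the framework run many copies of each in parallel across a cluster.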