Tuesday, October 20, 2009

Review of Map-Reduce

This writeup gives an overview of the Map-Reduce pattern and its uses. It briefly covers the two phases involved in Map-Reduce: the computation (map) phase and the reduction phase.
The issues that arise when distributing the smaller computations to the worker nodes, as well as the inherent problems involved in reduction, are explained clearly. The MapReduce concept is easy to understand from the logical viewpoint of key/value pairs: the input is taken as pairs in one data domain, and the output is produced as pairs in a different domain. Other aspects of MapReduce, such as how tasks are distributed and how the reliability of results is ensured, are equally important and worth mentioning, even though this writeup does not cover them. Seeing the MapReduce implementation of the PageRank algorithm is especially valuable. In the early days of MapReduce, there was a certain amount of backlash from the database community, because it favored brute-force processing over indexing and did not rely on many of the DBMS tools that people have used over the years.
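
To make the key/value view concrete, here is a minimal single-machine sketch of the two phases, using word counting as the example. It is not taken from the writeup under review: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the sample documents are assumptions made for illustration, and a real framework would run the map and reduce calls on separate worker nodes.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map: take a (doc_id, text) pair and emit (word, 1) pairs."""
    for word in text.lower().split():
        yield (word, 1)


def shuffle(pairs):
    """Group intermediate values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce: collapse all counts for one word into a single (word, total) pair."""
    return (word, sum(counts))


def map_reduce(documents):
    """Run both phases sequentially; hypothetical driver, no distribution."""
    intermediate = []
    for doc_id, text in documents.items():
        intermediate.extend(map_phase(doc_id, text))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(word, counts) for word, counts in grouped.items())


# Hypothetical sample input: pairs in the (document id, text) domain.
documents = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy dog and the fox",
}

# Output: pairs in a different domain, (word, count).
word_counts = map_reduce(documents)
```

The input pairs live in the domain (document id, text) while the output pairs live in the domain (word, count), which is the change of domain described above. Because each `map_phase` call depends only on its own document and each `reduce_phase` call only on its own key, the calls could in principle be handed to different workers.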
