Explain MapReduce and MapReduce Phase or stage.

This is for the Hadoop ecosystem, like HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Sqoop2, Avro, Solr, HCatalog, Impala, Oozie, ZooKeeper, and Hadoop distributions like Cloudera, Hortonworks, etc.

Explain MapReduce and MapReduce Phase or stage.

Postby pintuvirani » Mon Jul 21, 2014 8:25 pm

How does MapReduce work? What are the distinct stages and phases of a MapReduce job? What is the use of each phase in a MapReduce program?


Re: Explain MapReduce and MapReduce Phase or stage.

Postby Guest » Tue Jul 22, 2014 10:59 pm

A MapReduce program is composed of a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name) and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
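The student-queue analogy above can be sketched in plain Python (this is not the Hadoop API, just an illustration of the Map()/Reduce() roles; the names `map_fn` and `reduce_fn` are made up for this sketch):

```python
from collections import defaultdict

def map_fn(student):
    # Map phase: emit an intermediate (key, value) pair -- (first name, 1)
    return (student, 1)

def reduce_fn(name, values):
    # Reduce phase: summary operation -- count students sharing this name
    return (name, sum(values))

students = ["Ann", "Bob", "Ann", "Cho", "Bob", "Ann"]

# Map every record
intermediate = [map_fn(s) for s in students]

# Group by key (the framework's job in real MapReduce)
groups = defaultdict(list)
for name, one in intermediate:
    groups[name].append(one)

# Reduce each group
result = dict(reduce_fn(n, vs) for n, vs in sorted(groups.items()))
print(result)  # {'Ann': 3, 'Bob': 2, 'Cho': 1}
```

In a real Hadoop job the same two functions would be a Mapper and Reducer class in Java, or two scripts under Hadoop Streaming.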

Map -> Partitioner -> Combiner -> Shuffle & Sort -> Reducer
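The partitioner decides which reducer each intermediate key goes to. A minimal sketch of the idea, assuming behavior like Hadoop's default HashPartitioner (which in Java computes `key.hashCode() % numReduceTasks`):

```python
def partition(key, num_reducers):
    # All values for the same key must land on the same reducer,
    # so the decision depends only on the key.
    # Note: Python's hash() of strings varies between runs unless
    # PYTHONHASHSEED is fixed; Java's hashCode() is stable.
    return hash(key) % num_reducers
```

Because the same key always hashes to the same partition, each reducer sees the complete list of values for every key it owns.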

A MapReduce program typically works like this:
1) Input data, such as a long text file, is split into key-value pairs. These key-value pairs are then fed to your mapper. (This is the job of the MapReduce framework.)
2) Your mapper processes each key-value pair individually and outputs one or more intermediate key-value pairs.
Note: You can plug in a custom combiner and partitioner here.
3) All intermediate key-value pairs are collected, sorted, and grouped by key (again, the responsibility of the framework). This step is called Shuffle & Sort.
4) For each unique key, your reducer receives the key with a list of all the values associated with it. The reducer aggregates these values in some way (adding them up, taking averages, finding the maximum, etc.) and outputs one or more output key-value pairs.
5) Output pairs are collected and stored in output files by the framework, which creates one part-00000 ... part-0000n file per reducer in the output folder.
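The five steps above can be simulated end to end in plain Python. This is only a sketch (assumptions: in-memory lists stand in for HDFS splits and part files, and a toy deterministic partitioner stands in for hashCode-based partitioning):

```python
from collections import defaultdict

NUM_REDUCERS = 2

def mapper(line):
    # Step 2: emit one intermediate (word, 1) pair per word
    for word in line.split():
        yield (word.lower(), 1)

def combiner(key, values):
    # Optional map-side pre-aggregation to shrink shuffle traffic
    yield (key, sum(values))

def partitioner(key):
    # Route each key to one reducer; stable per key (toy stand-in for hashCode)
    return sum(map(ord, key)) % NUM_REDUCERS

def reducer(key, values):
    # Step 4: aggregate all values seen for this key
    yield (key, sum(values))

# Step 1: the "input split" -- here just a list of text lines
split = ["the cat sat", "the cat ran"]

# Map, then combine on the map side
map_out = defaultdict(list)
for line in split:
    for k, v in mapper(line):
        map_out[k].append(v)
combined = [kv for k, vs in map_out.items() for kv in combiner(k, vs)]

# Step 3: Shuffle & Sort -- group by key within each reducer partition
partitions = defaultdict(lambda: defaultdict(list))
for k, v in combined:
    partitions[partitioner(k)][k].append(v)

# Steps 4-5: each reducer writes its own sorted "part-n" output
part_files = {f"part-{r}": [kv for k in sorted(keys) for kv in reducer(k, keys[k])]
              for r, keys in partitions.items()}
print(part_files)
```

Each `part-n` entry corresponds to one reducer's output file; with two reducers you get two part files, and the union of their contents is the full word count.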
