map side join and reduce side join in Hadoop


Postby dharama123 » Thu Sep 18, 2014 3:16 am

What are map-side join and reduce-side join in Hadoop?


Re: map side join and reduce side join in Hadoop

Postby Guest » Sat Sep 20, 2014 7:31 pm

Joins are one of the interesting features available in MapReduce.
Map-side join
1) Joins performed by the mapper are called map-side joins.
2) The following kinds of joins can be achieved with map-side techniques:
a) Inner join
b) Outer join
c) Override: a MultiFilter that, for a given key, prefers values from the rightmost source
3) The join runs locally inside each map task, with no shuffle phase, so it is fast.
4) The input data must be partitioned and sorted in a particular way: every dataset sorted by the join key and partitioned the same way.
5) Each input dataset must be divided into the same number of partitions.
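
The map-side requirements above can be illustrated with a small sketch. This is plain Python, not Hadoop API code, and it only simulates an inner join; the function names and the sample data are made up for illustration. It assumes both inputs are already split into the same number of partitions and sorted by key, which is what lets each pair of partitions be joined locally with a single merge pass.

```python
def merge_join(left, right):
    """Inner-join two lists of (key, value) pairs that are both sorted by key."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right-side records sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Pair every left record with that key against the whole run.
            while i < len(left) and left[i][0] == lk:
                for _, rv in right[j:j_end]:
                    out.append((lk, left[i][1], rv))
                i += 1
            j = j_end
    return out


def map_side_join(left_parts, right_parts):
    """Join partition i of the left input with partition i of the right input."""
    if len(left_parts) != len(right_parts):
        raise ValueError("both inputs need the same number of partitions")
    joined = []
    for lp, rp in zip(left_parts, right_parts):
        # Each pair is handled independently, as one map task would do.
        joined.extend(merge_join(lp, rp))
    return joined


# Hypothetical data: two partitions per input, partitioned by key % 2,
# each partition sorted by key.
users = [[(2, "bob"), (4, "dan")], [(1, "ann"), (3, "cat")]]
orders = [[(2, "book"), (2, "pen")], [(3, "lamp"), (5, "desk")]]
result = map_side_join(users, orders)
print(result)
```

If the two inputs were partitioned differently, matching keys could land in partitions with different indexes and the local merge would silently miss them, which is why points 4 and 5 matter.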

Reduce-side join
1) Joins performed by the reducer are called reduce-side joins. Frameworks like Pig, Hive, and Cascading have built-in support for performing joins.
2) Reduce-side joins are simpler than map-side joins because the input datasets do not need to be sorted or partitioned in any particular way.
3) They are less efficient because both datasets have to go through the MapReduce shuffle phase, where the records with the same key are brought together in the reducer. The secondary sort technique can also be used to control the order of the records.
4) The join involves more than one machine, since records move across the network during the shuffle.
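
The reduce-side flow described above can be sketched the same way. Again this is a plain Python simulation rather than Hadoop code, and all names, the "L"/"R" tags, and the sample data are assumptions made for illustration. The map step tags each record with its source, the shuffle step groups records by key (ordering them by tag, as a rough stand-in for secondary sort), and the reduce step pairs left and right records that share a key.

```python
from collections import defaultdict


def map_phase(records, tag):
    """Tag every (key, value) record with the dataset it came from."""
    return [(key, (tag, value)) for key, value in records]


def shuffle(mapped):
    """Group tagged records by key, as the shuffle phase would."""
    groups = defaultdict(list)
    for key, tagged in mapped:
        groups[key].append(tagged)
    # Rough stand-in for secondary sort: within a key, "L" records come
    # before "R" records. sorted() is stable, so input order is kept per tag.
    return {
        key: sorted(vals, key=lambda t: t[0])
        for key, vals in sorted(groups.items())
    }


def reduce_phase(key, tagged_values):
    """Inner join: pair each left value with each right value for one key."""
    lefts = [v for tag, v in tagged_values if tag == "L"]
    rights = [v for tag, v in tagged_values if tag == "R"]
    return [(key, l, r) for l in lefts for r in rights]


def reduce_side_join(left, right):
    mapped = map_phase(left, "L") + map_phase(right, "R")
    out = []
    for key, vals in shuffle(mapped).items():
        out.extend(reduce_phase(key, vals))
    return out


# Hypothetical, unsorted inputs: no pre-partitioning is needed here.
users = [(3, "cat"), (1, "ann"), (2, "bob")]
orders = [(2, "pen"), (5, "desk"), (3, "lamp"), (2, "book")]
joined = reduce_side_join(users, orders)
print(joined)
```

Note that the inputs are unsorted and unpartitioned, which is the simplicity mentioned in point 2; the price is that every record passes through the grouping step, which in a real cluster means network transfer.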
