What is Hadoop Streaming?


Postby dharama123 » Thu Sep 18, 2014 2:35 am

What is Hadoop Streaming? How does Hadoop Streaming work?


Re: What is Hadoop Streaming?

Postby Guest » Sat Sep 20, 2014 10:13 pm

Hadoop Streaming is a generic API that lets you write Mappers and Reducers in any language. The basic concept remains the same: Mappers and Reducers read their input from stdin and write their output to stdout as (key, value) pairs, one pair per line, with the key and value separated by a tab by default.

Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. For example:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
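
To make the stdin/stdout contract more concrete, here is a minimal word-count sketch in Python. It is only an illustration, not something from the Hadoop distribution: the function names, the `map`/`reduce` command-line argument, and the script name `wordcount.py` mentioned afterwards are all made up for this example. It assumes the default tab separator between key and value, and it relies on the reducer receiving its input sorted by key.

```python
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper (hypothetical name): emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(lines):
    """Reducer (hypothetical name): sum the counts for each key.

    Assumes the lines arrive sorted by key, so equal keys are adjacent.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map -> sort -> reduce in a single process, for testing only."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    # Assumed convention for this sketch: pass "map" or "reduce" as the
    # first argument; with no argument the script does nothing.
    steps = {"map": map_lines, "reduce": reduce_lines}
    step = steps.get(sys.argv[1]) if len(sys.argv) > 1 else None
    if step is not None:
        for out in step(sys.stdin):
            print(out)
```

If this were saved as `wordcount.py`, the same script could in principle be passed to the streaming jar as `-mapper "python3 wordcount.py map"` and `-reducer "python3 wordcount.py reduce"`, provided the script is shipped to the cluster nodes along with the job. `run_locally` is just a quick way to check the logic without a cluster.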
