Can I run a C++ executable from a Hadoop Python wrapper

This is for the Hadoop ecosystem — HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Sqoop2, Avro, Solr, HCatalog, Impala, Oozie, ZooKeeper — and Hadoop distributions like Cloudera, Hortonworks, etc.
sandip
Posts: 123
Joined: Tue Aug 26, 2014 5:11 pm
Contact:

Can I run a C++ executable from a Hadoop Python wrapper

Postby sandip » Thu Aug 28, 2014 5:28 pm

Can I run a C++ executable from a Hadoop Python wrapper?

I am trying to run a C++ executable (which takes a local filename as an argument and writes a file to the local file system) from Python code.
My Python code is called as the mapper in Hadoop.
The C++ code works fine on its own; I checked it on the local file system. When I call it from Python, it also works fine locally.
But whenever I try to run the Python code on the Hadoop cluster, it does not work.

Is there any setting I missed?


Guest

Re: Can I run a C++ executable from a Hadoop Python wrapper

Postby Guest » Tue Sep 16, 2014 10:43 pm

Assuming you have verified that your Python code can execute the binary locally, you need to make sure the C++ binary is also deployed to the worker machines so that it is available to the mappers. You can use the -file command-line option for this.

You can specify any executable as the mapper and/or the reducer. The executables do not need to pre-exist on the machines in the cluster; however, if they don't, you will need to use the "-file" option to tell the framework to pack your executable files as part of the job submission.

In addition to executable files, you can also package other auxiliary files (such as dictionaries, configuration files, etc.) that may be used by the mapper and/or the reducer.

For example:

$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
-input myInputDirs \
-output myOutputDir \
-mapper myPythonScript.py \
-reducer /bin/wc \
-file myPythonScript.py \
-file myfile.txt


hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.1.2.jar \
-input /ngrams \
-output /output-streaming \
-mapper mapper.py \
-combiner reducer.py \
-reducer reducer.py \
-jobconf stream.num.map.output.key.fields=3 \
-jobconf stream.num.reduce.output.key.fields=3 \
-jobconf mapred.reduce.tasks=10 \
-file mapper.py \
-file reducer.py
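
As a concrete sketch of the Python side, the streaming mapper can invoke the shipped binary with subprocess. This is a minimal sketch assuming the binary was packaged with -file (so it lands in the task's working directory) and is named my_cpp_tool — the name and its contract (one local filename in, results on stdout) are assumptions, not your actual tool:

```python
#!/usr/bin/env python
# mapper.py -- minimal Hadoop Streaming mapper sketch.
# Assumes the C++ binary was shipped with -file and therefore sits in the
# task's current working directory; "my_cpp_tool" is a hypothetical name.
import os
import subprocess
import sys

def run_tool(binary, input_path):
    """Invoke the binary on a local file and return its stdout."""
    # Files shipped with -file can lose the execute bit; restore it.
    os.chmod(binary, 0o755)
    result = subprocess.run([binary, input_path],
                            capture_output=True, text=True, check=True)
    return result.stdout

if __name__ == "__main__":
    # Each input line is assumed to name a file already on the local disk
    # of the worker (e.g. pulled down earlier in the job).
    for line in sys.stdin:
        local_file = line.strip()
        if local_file:
            # Emit key<TAB>value pairs for the streaming framework.
            print("%s\t%s" % (local_file, run_tool("./my_cpp_tool", local_file)))
```

Note the "./" prefix: -file places the binary in the task's working directory, which is generally not on PATH, so the mapper must call it with an explicit relative path.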


See details:
http://blog.cloudera.com/blog/2013/01/a ... or-hadoop/

