Job scheduling in Hadoop for Heterogeneous cluster

This is for the Hadoop ecosystem, e.g. HDFS, MapReduce, Hive, HBase, Pig, Sqoop, Sqoop2, Avro, Solr, HCatalog, Impala, Oozie, ZooKeeper, and Hadoop distributions such as Cloudera, Hortonworks, etc.
mina1234
Posts: 72
Joined: Tue May 05, 2015 7:08 pm
Contact:

Job scheduling in Hadoop for Heterogeneous cluster

Postby mina1234 » Wed Sep 30, 2015 11:10 pm

Hello,

I am a postgraduate student and I am doing my research on job scheduling in Hadoop for heterogeneous clusters. I have been trying to find a way to predict a job's execution time while it is still waiting in the queue or has just been submitted to the job queue.
From studying Hadoop YARN, the possible approaches I have found for estimating the run time of a MapReduce job are:

1) The MapReduce job's estimated run time is directly proportional to the number of map tasks it executes and the amount of memory/vcores the job requires to execute. But is this appropriate for a heterogeneous cluster?

2) The TaskRuntimeEstimator interface in org.apache.hadoop.mapreduce.v2.app.speculate and its methods such as estimatedRuntime and thresholdRuntime. I am still not clear on how this interface and its methods work.
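To make idea (1) concrete, here is a minimal sketch of what such a proportional estimate could look like. This is not Hadoop code: the class name, the "wave" model (map tasks running in batches limited by available containers), and the per-node speed factor for heterogeneity are all my own assumptions.

```java
// Minimal sketch: estimate job runtime as a linear function of the number
// of map-task "waves", scaled by a per-node speed factor to account for a
// heterogeneous cluster. All names and numbers here are illustrative.
public class NaiveRuntimeEstimator {
    // assumed average time for one map task on a baseline node, in ms
    private final long baselineMapTaskMs;

    public NaiveRuntimeEstimator(long baselineMapTaskMs) {
        this.baselineMapTaskMs = baselineMapTaskMs;
    }

    /**
     * @param numMapTasks     number of map tasks in the job
     * @param waveSize        map tasks that can run concurrently (containers)
     * @param nodeSpeedFactor above 1.0 for slower nodes, below 1.0 for faster
     */
    public long estimateMs(int numMapTasks, int waveSize, double nodeSpeedFactor) {
        // number of sequential "waves" of map tasks (ceiling division)
        int waves = (numMapTasks + waveSize - 1) / waveSize;
        return (long) (waves * baselineMapTaskMs * nodeSpeedFactor);
    }

    public static void main(String[] args) {
        NaiveRuntimeEstimator est = new NaiveRuntimeEstimator(30_000);
        // 100 map tasks, 10 concurrent containers, node 1.5x slower than baseline
        System.out.println(est.estimateMs(100, 10, 1.5)); // 450000
    }
}
```

The speed factor is the part that would need real profiling data per node class; without it the estimate collapses back to the homogeneous case.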

I would be thankful for any help and/or suggestions.


snehalshah
Posts: 61
Joined: Sun Aug 30, 2015 8:02 am
Contact:

Re: Job scheduling in Hadoop for Heterogeneous cluster

Postby snehalshah » Wed Sep 30, 2015 11:36 pm

Use an Oozie workflow. It provides a way to get callbacks when the workflow enters and exits your job's action. You can use a start-stop timer on those callbacks to calculate the run time.
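The start-stop timer part could be as simple as the sketch below: record a timestamp when the "entered" callback arrives for a job, and compute the elapsed time on the "exited" callback. The class and method names are my own; wiring this up to Oozie's HTTP notifications is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a start-stop timer keyed by job id. Callbacks are
// assumed to arrive from Oozie's workflow notifications; the names here
// are illustrative, not an Oozie API.
public class JobTimer {
    private final Map<String, Long> startMs = new ConcurrentHashMap<>();

    /** Call when the workflow reports the job entering the action. */
    public void onJobStarted(String jobId, long nowMs) {
        startMs.put(jobId, nowMs);
    }

    /** Call on the exit callback. Returns elapsed ms, or -1 if no start was recorded. */
    public long onJobFinished(String jobId, long nowMs) {
        Long start = startMs.remove(jobId);
        return start == null ? -1 : nowMs - start;
    }

    public static void main(String[] args) {
        JobTimer timer = new JobTimer();
        timer.onJobStarted("job_001", 1_000L);
        System.out.println(timer.onJobFinished("job_001", 61_000L)); // 60000
    }
}
```

Note this measures actual run time after the fact; for predicting run time while the job is still queued (the original question), you would have to feed these measurements into a model such as the proportional estimate discussed above.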

