hadoop - Execution time of MapReduce with Combiner


I have a MapReduce job that reads a file and collects the words that are 5 characters or fewer and start with an upper-case letter, using the first letter as the key. I ran the job twice, once without a combiner and a second time with a combiner. I compared the execution times and noticed that using the combiner increased the execution time. Does anyone know what causes this increase, and is that always the case when using a combiner?

Thank you.

As the name suggests, combiners should be used when there is something to combine. Generally, a combiner should be applied to functions that are commutative (a.b = b.a) and associative (a.(b.c) = (a.b).c). This is a caution rather than a hard and fast rule: nothing forces the function to be commutative and associative. However, a combiner may operate on only a subset of your keys and values, or may not execute at all, so the job's result must not depend on it running. If there are few duplicate keys in your mapper output, a combiner can backfire and become a useless burden. Use combiners only when there is enough scope for combining.
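To make the commutative and associative point concrete, here is a minimal plain-Python sketch. It is not Hadoop code: `run_reduce` and its `chunk_size` parameter are hypothetical stand-ins for the framework applying a combiner to arbitrary subsets of the values, or not applying it at all. A sum gives the same answer however the values are pre-combined; an average does not.

```python
def average(xs):
    """Plain arithmetic mean; not safe to use as a combiner."""
    return sum(xs) / len(xs)

def run_reduce(values, fn, chunk_size=None):
    """Reduce values with fn, optionally pre-combining chunks with fn first.

    chunk_size=None models the combiner not running at all (assumption:
    this is a toy model, not how Hadoop actually schedules combiners).
    Otherwise fn is applied to each chunk, then again to the partial results.
    """
    if chunk_size is None:
        return fn(values)
    partials = [fn(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return fn(partials)

values = [1, 2, 3, 4, 5]

# Sum is commutative and associative: every grouping yields the same result.
sum_results = {run_reduce(values, sum, c) for c in (None, 2, 3)}

# Average is not associative: the result depends on how values were grouped.
average_results = {run_reduce(values, average, c) for c in (None, 2, 3)}

print(sum_results)
print(average_results)
```

Because the framework decides whether and how often the combiner runs, only functions that behave like `sum` here are safe to plug in unchanged.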

Quoting Chuck Lam's 'Hadoop in Action':

"A combiner doesn't necessarily improve performance. You should monitor the job's behavior to see if the number of records outputted by the combiner is meaningfully less than the number of records going in. The reduction must justify the extra execution time of running a combiner."

Hence, in your case, the proportion of records that can actually be combined is probably low, so the overhead of running the combiner increases the execution time.
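The check the quote recommends can be sketched in plain Python as follows. This is an illustration under assumptions, not the asker's job: `map_words`, `combine`, and `reduction_ratio` are hypothetical helpers, the sample sentences are made up, and the combiner is assumed to simply group words by key within one mapper's output. In a real job, the comparable numbers are the `Combine input records` and `Combine output records` counters reported for the job.

```python
from collections import defaultdict

def map_words(text):
    """Emit (first letter, word) for words of at most 5 characters
    that start with an upper-case letter."""
    for word in text.split():
        if len(word) <= 5 and word[0].isupper():
            yield word[0], word

def combine(pairs):
    """Group one mapper's output by key, emitting one record per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return list(grouped.items())

def reduction_ratio(text):
    """Records leaving the combiner divided by records entering it."""
    records_in = list(map_words(text))
    records_out = combine(records_in)
    return len(records_out) / len(records_in)

# Made-up inputs: one with no duplicate keys, one with a single shared key.
few_duplicates = "Alpha Bravo Delta Echo Golf Hotel India Kilo"
many_duplicates = "Apple Alpha Amber Angle Arrow Atlas Audio Award"

print(reduction_ratio(few_duplicates))   # 8 records in, 8 out
print(reduction_ratio(many_duplicates))  # 8 records in, 1 out
```

A ratio close to 1 means the combiner did its work without shrinking the map output, which is the situation where it only adds execution time.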


