hadoop - Execution time of MapReduce with Combiner
I have a MapReduce job that reads a file and collects words of 5 characters or less that start with an uppercase letter, using the first letter as the key. I ran the job twice, once without a combiner and a second time with a combiner. Comparing the execution times, I noticed that using the combiner increased the execution time. Does anyone know what causes this increase in time when a combiner is used?
Thank you.
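The map step described in the question can be sketched in plain Python, outside Hadoop, to make the key/value shape concrete. This is a hypothetical stand-in for the real Mapper class, not the asker's code: `map_words` is an illustrative name, and splitting on whitespace without stripping punctuation is an assumption.

```python
def map_words(line):
    """Emit (first letter, word) for words of 5 characters or less that start with an uppercase letter."""
    # Assumption: words are whitespace-separated tokens; punctuation is not stripped.
    for word in line.split():
        if len(word) <= 5 and word[0].isupper():
            yield word[0], word


print(list(map_words("Apple banana Cat Elephant Dog")))
# [('A', 'Apple'), ('C', 'Cat'), ('D', 'Dog')]
```

"banana" is dropped because it is lowercase, and "Elephant" because it is longer than 5 characters.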
As the name suggests, combiners should be used when there is a possibility to combine. Generally, they should be applied to functions that are commutative (a.b = b.a) and associative {a.(b.c) = (a.b).c}. A word of caution: there is no hard and fast rule that the function has to be commutative and associative. Combiners may operate on only a subset of the keys and values, or may not execute at all. If there are few duplicate keys in the mapper output, using a combiner can at times backfire and instead become a useless burden. Use combiners when there is enough scope for combining.
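A small plain-Python sketch (not Hadoop code) of why commutativity and associativity matter: a combiner may run on arbitrary subsets of a key's values, and the reducer then merges the partial results. A sum gives the same answer whether or not it is split; a naive mean does not. The two-way split below is an arbitrary assumption standing in for, e.g., two map-side spills.

```python
def combine_sum(values):
    """Sum: commutative and associative, so partial sums can be summed again safely."""
    return sum(values)


def combine_mean(values):
    """Naive mean: averaging partial means is NOT the same as the overall mean."""
    return sum(values) / len(values)


values = [1, 3, 5]

# No combiner: the reducer sees all values at once.
direct_sum = combine_sum(values)
direct_mean = combine_mean(values)

# Combiner ran on two arbitrary subsets; the reducer merges the partial results.
partials = [values[:2], values[2:]]  # [1, 3] and [5]
combined_sum = combine_sum([combine_sum(p) for p in partials])
combined_mean = combine_mean([combine_mean(p) for p in partials])

print(direct_sum, combined_sum)    # 9 9
print(direct_mean, combined_mean)  # 3.0 3.5
```

The same reasoning covers the case where the combiner runs zero times: with a sum, the reducer alone still produces the correct total.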
Quoting Chuck Lam's 'Hadoop in Action':
"A combiner doesn't necessarily improve performance. You should monitor the job's behavior to see if the number of records outputted by the combiner is meaningfully less than the number of records going in. The reduction must justify the execution time of running the combiner."
Hence, in your case, the proportion of records that can actually be combined is probably small, so the overhead of running the combiner outweighs the savings and increases the execution time.
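To follow the book's advice, compare the job's "Combine input records" and "Combine output records" counters. The plain-Python sketch below computes the same input/output ratio on toy mapper output; `run_combiner` and `combine_ratio` are hypothetical helpers, and a summing combiner is assumed. A ratio close to 1.0 means the combiner does work without shrinking the data much.

```python
from collections import Counter


def run_combiner(mapper_output):
    """Sum the values per key, as a word-count style combiner would on one map task's output."""
    totals = Counter()
    for key, value in mapper_output:
        totals[key] += value
    return list(totals.items())


def combine_ratio(mapper_output):
    """Return (input records, output records, output/input ratio)."""
    combined = run_combiner(mapper_output)
    n_in, n_out = len(mapper_output), len(combined)
    return n_in, n_out, n_out / n_in


# Few duplicate keys: the combiner barely shrinks the data.
mostly_unique = [(f"key{i}", 1) for i in range(8)] + [("key0", 1), ("key1", 1)]
# Many duplicate keys: the combiner collapses the data substantially.
highly_repeated = [("A", 1)] * 6 + [("B", 1)] * 4

print(combine_ratio(mostly_unique))    # (10, 8, 0.8)
print(combine_ratio(highly_repeated))  # (10, 2, 0.2)
```

Where to draw the line is a judgment call: the reduction has to save more shuffle and reduce time than the combiner itself costs.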
Read more in the article here.