hadoop - Datatype mismatch causing comparison failure? Python UDF in Pig
I'm having trouble with a Python UDF that I use in my Pig scripts. I believe the problem is that the code assumes the input deltas is in a format it's not actually in, but I'm not sure how to fix it (Python n00b).
Note: I'm on the Cloudera (CDH 4.3) distro of Hadoop v2.0.0, with Pig v0.11.0 and Python 2.4.3.
    import org.apache.pig.impl.logicalLayer.schema.SchemaUtil as SchemaUtil

    @outputSchema("adj:float")
    def cumrelfreqadj(deltas):
        # create bins of increment 0.01
        a = [i*-0.01 for i in range(100)]
        a = a[1:len(a)]
        b = [i*0.01 for i in range(101)]
        a.extend(b)
        a.sort()
        bins = a

        # build cumulative relative frequency distribution
        cumfreq = [0]*200
        for delta in deltas:
            for bin in range(len(bins)):
                if delta <= bins[bin]:
                    cumfreq[bin] += 1

        cumrelfreq = [float(cumfreq[i]) / max(cumfreq) for i in range(len(cumfreq))]
        crf = zip(bins, cumrelfreq)
        for relfreq in crf[:]:
            if relfreq[1] > 0.11:  # 10%ile
                adj = relfreq[0] + 0.05
                break

        return adj
Do I need to convert the input to a list first?
I answered my own question. The input is a Pig bag of tuples. In this case each tuple has one element, e.g.: {(-0.01), (-0.03), (0.00001), (-0.2383), (0.158)}.
So in order to compare the float-typed element against the list of bins, I need to insert something like:
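As a rough illustration outside Pig, the bag can be thought of as a Python list of one-element tuples. The names `bag` and `values` below are hypothetical stand-ins, and the assumption is only that iterating the bag yields tuple-like items rather than bare floats:

```python
# Hypothetical stand-in for the Pig bag: a list of one-element tuples.
bag = [(-0.01,), (-0.03,), (0.00001,), (-0.2383,), (0.158,)]

# Each item is a tuple wrapping the float, not the float itself.
first = bag[0]
print(type(first).__name__)  # tuple

# Pull the single field out of every tuple to get plain floats.
values = [t[0] for t in bag]
print(values)
```

In CPython 2, comparing a tuple with a float does not raise an error, so a mismatch like this can pass silently; in Python 3 the same comparison raises a TypeError.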
delta = list(delta)[0]
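inside the outer loop over deltas, just before the inner loop over the bins (between lines 16 and 17 of my script), to pull the float-typed data element out of the tuple. The comparison delta <= bins[bin] on line 18 then works.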
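Putting the fix in context, here is a minimal sketch of the function with the unpacking step added. It is written as plain Python so it can be tried outside Pig, so the import and the output-schema decorator are left out. The name `cum_rel_freq_adj`, the sample data, and the `None` fallback for an empty distribution are assumptions added for illustration, not part of the original script:

```python
def cum_rel_freq_adj(deltas):
    # Bin edges from -0.99 to 1.00 in steps of 0.01 (200 edges).
    a = [i * -0.01 for i in range(100)][1:]
    b = [i * 0.01 for i in range(101)]
    bins = sorted(a + b)

    # Count, for every bin edge, how many deltas fall at or below it.
    cumfreq = [0] * len(bins)
    for delta in deltas:
        delta = list(delta)[0]  # the fix: unpack the one-element tuple
        for k in range(len(bins)):
            if delta <= bins[k]:
                cumfreq[k] += 1

    total = max(cumfreq)
    if total == 0:
        # Assumption: return None when no delta falls into any bin.
        return None

    # Return the first edge whose cumulative relative frequency exceeds 0.11.
    for edge, count in zip(bins, cumfreq):
        if float(count) / total > 0.11:
            return edge + 0.05
    return None


# Hypothetical stand-in for the bag from the example above.
sample_bag = [(-0.01,), (-0.03,), (0.00001,), (-0.2383,), (0.158,)]
adj = cum_rel_freq_adj(sample_bag)
print(adj)  # approximately -0.18
```

With five values, the smallest one (-0.2383) already gives a relative frequency of 0.2, so the first qualifying edge is -0.23 and the result is that edge plus 0.05.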