hadoop - Datatype mismatch causing comparison failure? Python UDF in Pig

I'm having trouble with a Python UDF that I use in my Pig scripts. I believe the problem is that I assumed the input deltas is in a format it's not in, but I'm not sure how to fix it (Python n00b).

Note: I'm on the Cloudera (CDH4.3) distro of Hadoop 2.0.0, with Pig 0.11.0 and Python 2.4.3.

    import org.apache.pig.impl.logicalLayer.schema.SchemaUtil as SchemaUtil

    @outputSchema("adj:float")
    def cumrelfreqadj(deltas):

        # create bins of increment 0.01
        a = [i*-0.01 for i in range(100)]
        a = a[1:len(a)]
        b = [i*0.01 for i in range(101)]
        a.extend(b)
        a.sort()
        bins = a

        # build cumulative relative frequency distribution
        cumfreq = [0]*200
        for delta in deltas:
            for bin in range(len(bins)):
                if delta <= bins[bin]:
                    cumfreq[bin] += 1

        cumrelfreq = [float(cumfreq[i]) / max(cumfreq) for i in range(len(cumfreq))]

        crf = zip(bins, cumrelfreq)

        for relfreq in crf[:]:
            if relfreq[1] > 0.11:    # 10%ile
                adj = relfreq[0] + 0.05
                break

        return adj
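The failing comparison can be reproduced outside Pig. This is a minimal sketch that assumes each element of deltas arrives as a one-field tuple (the shape described in the answer) and that it runs under standard Python 3, where ordering a tuple against a float raises TypeError. Under CPython 2 the same comparison does not raise: mixed types are ordered so that numbers sort before other objects, so a tuple is never <= a float. Presumably Pig's Jython runtime behaves the same way, in which case every count stays 0 and max(cumfreq) is 0. The sample values and the names compare_raw, compare_unwrapped and edge are hypothetical.

```python
# Hypothetical sample bag, shaped like the one in the answer:
# each element is a one-field tuple, not a bare float.
deltas = [(-0.01,), (-0.03,), (0.00001,), (-0.2383,), (0.158,)]
edge = 0.05  # one hypothetical bin upper bound


def compare_raw(delta, bin_edge):
    """Compare the tuple itself against a float, as the original loop does."""
    try:
        return delta <= bin_edge
    except TypeError as exc:
        # Python 3 refuses to order a tuple against a float.
        return "TypeError: %s" % exc


def compare_unwrapped(delta, bin_edge):
    """Pull the single float out of the tuple first, then compare."""
    return list(delta)[0] <= bin_edge


for d in deltas:
    print(d, compare_raw(d, edge), compare_unwrapped(d, edge))
```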

Do I need to convert the input to a list first?

Answered my own question. The input is a Pig bag of tuples. In this case each tuple has one element, e.g.: {(-0.01), (-0.03), (0.00001), (-0.2383), (0.158)}.
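Assuming Pig hands the bag to the UDF as a list of tuples (the shape shown above), the single field can also be unpacked for the whole bag in one pass before any binning. This is a sketch; unwrap_bag is a hypothetical helper, not part of Pig. The (value,) = tup unpacking is a deliberate design choice: it is stricter than list(delta)[0], because it fails loudly on a tuple with more or fewer than one field instead of silently taking the first one.

```python
def unwrap_bag(bag):
    """Flatten a bag of one-field tuples, e.g. [(-0.01,), (0.158,)], into floats.

    Hypothetical helper; assumes every tuple holds exactly one numeric field.
    """
    values = []
    for tup in bag:
        (value,) = tup  # raises ValueError if a tuple has more or fewer fields
        values.append(float(value))
    return values


bag = [(-0.01,), (-0.03,), (0.00001,), (-0.2383,), (0.158,)]
print(unwrap_bag(bag))
```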

So, in order to compare the float-typed element against the list bins, I need to insert something like:

    delta = list(delta)[0]

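between the lines "for delta in deltas:" and "for bin in range(len(bins)):" in the code above, to pull out the float-typed data element contained in each tuple. The comparison "if delta <= bins[bin]:" then works.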
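Putting the fix into the original function gives something like the following. It is a hedged rewrite, not the poster's final code, and several details are assumptions. The no-op outputSchema stand-in only exists so the file also runs outside Pig; inside Pig's Jython runtime the real decorator is assumed to be supplied when the script is registered with USING jython. The loop variable bin is renamed b to avoid shadowing the built-in. Returning None (assumed to surface in Pig as null) is an added guard, because the original divides by max(cumfreq) and leaves adj unassigned when nothing is counted. The syntax sticks to constructs that should also work on the Python 2.4.3 mentioned above.

```python
# Outside Pig there is no outputSchema decorator, so fall back to a no-op one.
# Assumption: inside Pig's Jython runtime the real decorator is already defined.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def wrap(func):
            return func
        return wrap


@outputSchema("adj:float")
def cumrelfreqadj(deltas):
    # 200 bin upper edges, -0.99 .. -0.01 then 0.00 .. 1.00, built like the original a/b lists
    bins = [i * -0.01 for i in range(1, 100)] + [i * 0.01 for i in range(101)]
    bins.sort()

    # cumulative frequency: each delta counts toward every bin edge >= it
    cumfreq = [0] * len(bins)
    for delta in deltas:
        delta = list(delta)[0]  # the fix: unwrap the single float held by each tuple
        for b in range(len(bins)):
            if delta <= bins[b]:
                cumfreq[b] += 1

    top = max(cumfreq)
    if top == 0:
        # empty bag, or every delta above the last edge: nothing to normalise
        return None
    cumrelfreq = [float(c) / top for c in cumfreq]

    for edge, relfreq in zip(bins, cumrelfreq):
        if relfreq > 0.11:  # the original's "10%ile" threshold
            return edge + 0.05

    # Not reached when top > 0 (the last relative frequency is 1.0); kept as a guard.
    return None
```

For example, the sample bag from the answer, [(-0.01,), (-0.03,), (0.00001,), (-0.2383,), (0.158,)], should give roughly -0.18. All five values fall at or below 1.00, so a single delta already exceeds 11% of the total, and the first edge holding one delta is -0.23.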
