python - How to resample a time series producing its geometric mean?
I'm new to Python and came across a tricky question when using pandas to resample data.
When you want to resample time series data, it is straightforward to apply the arithmetic mean function.
For example:
Suppose ts is a time series at minute frequency (in pandas, it is encapsulated in a pandas.Series object with a DatetimeIndex).
To take the arithmetic mean over each group of 5-minute periods, simply:
ts.resample('5min', how='mean')
However, how do I compute the geometric mean in the same way? Is there a simple solution like the one above, for example:
ts.resample('5min', how='gmean')
You can pass a callable object (in this case a function) to how, as long as it returns a scalar:
In [31]: from scipy.stats.mstats import gmean
In [32]: import pandas.util.testing as tm
In [33]: ts = tm.makeTimeSeries()[:10]
In [34]: ts
Out[34]:
2000-01-03    0.605
2000-01-04   -0.167
2000-01-05    0.365
2000-01-06   -0.206
2000-01-07   -1.156
2000-01-10   -0.219
2000-01-11    1.704
2000-01-12   -0.148
2000-01-13    1.169
2000-01-14    0.823
Freq: B, dtype: float64
In [35]: ts.resample('2D', how=lambda x: gmean(x).item())
Out[35]:
2000-01-03    0.605
2000-01-05    0.365
2000-01-07    0.000
2000-01-09    0.000
2000-01-11    1.704
2000-01-13    0.981
dtype: float64
Note that you have to call the item method here to get a scalar result (because, depending on the values, the result may be a MaskedConstant). pandas doesn't consider a single-element Series to be a scalar.
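The how= keyword shown above was deprecated in later pandas releases, and recent versions no longer accept it. As a rough sketch of the same idea in the method-chaining style, the callable can be passed to .apply() on the resampler instead. The helper name geometric_mean and the sample values are made up for illustration, and the helper assumes every value is strictly positive:

```python
import numpy as np
import pandas as pd


def geometric_mean(values):
    """Geometric mean via exp(mean(log(x))); assumes all values are > 0."""
    arr = np.asarray(values, dtype=float)
    return float(np.exp(np.log(arr).mean()))


# Hypothetical minute-frequency data (not from the original question).
index = pd.date_range("2000-01-03 09:00", periods=10, freq="min")
ts = pd.Series([1.0, 2.0, 4.0, 8.0, 16.0, 3.0, 3.0, 3.0, 3.0, 3.0], index=index)

# Group into 5-minute bins and reduce each bin to one scalar.
result = ts.resample("5min").apply(geometric_mean)
print(result)
```

Because the helper returns a plain Python float, there is no need for an .item() call in this variant.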
Also, be careful with the results of calculations on bins that contain NaNs, or values for which the geometric mean would be complex (e.g., the 4th root of a negative number; these return NaN in numpy).
gmean will turn these computations into 0 when you call the item method.
For example, this is why there are zeros at 2000-01-07 and 2000-01-09.
At 2000-01-07, pandas fills in NaN for the 2nd day (remember we're doing 2D here), so the geometric mean is computed as ma.exp(ma.mean(ma.log([-1.156, nan]))). The 2 values are not "valid" input to ma.log (and are thus masked), so ma.mean() returns a MaskedConstant whose _data attribute is 0, and the item method therefore returns 0.
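The masking described above can be reproduced directly with numpy.ma, without pandas or scipy. The sketch below first evaluates the same expression and checks that the result is masked. It then shows one possible way to avoid the silent zeros: a hypothetical wrapper, safe_gmean (not part of scipy or pandas), that returns NaN whenever a bin has no real geometric mean.

```python
import numpy as np
import numpy.ma as ma

# Both inputs are invalid for a real logarithm, so both get masked,
# and the mean of a fully masked array is itself masked.
masked_result = ma.exp(ma.mean(ma.log([-1.156, np.nan])))
print(ma.is_masked(masked_result))


def safe_gmean(values):
    """Geometric mean that returns NaN instead of a masked value.

    Hypothetical helper: empty bins, NaNs, and non-positive values
    all yield NaN rather than being silently converted to 0.
    """
    arr = np.asarray(values, dtype=float)
    if arr.size == 0 or np.isnan(arr).any() or (arr <= 0).any():
        return float("nan")
    return float(np.exp(np.log(arr).mean()))


print(safe_gmean([-1.156, np.nan]))
print(safe_gmean([1.0, 4.0]))
```

Whether NaN or 0 is the right placeholder depends on what the resampled series feeds into; NaN at least keeps undefined bins distinguishable from bins whose geometric mean really is 0.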