python - SQL query for merging timestamps -
i have data in postgres table in following form:
col1 col2 col3 col4 id1 b c id2 id1 timebegin 1###-##-## id2 id1 timeend 22##-##-## id3 id4 id5 id6 id6 id3 timebegin 2##-##-## id7 id3 timeend 200-3-## id13 id8 id14 id15 id8 id9 timebegin -2-1-1 id10 id11 id12 id13
here 1###-##-## imply uncertainty in time (1000-01-01 1999-12-31)
and 22##-##-## imply uncertainty in time (2200-01-01 2200-12-31)
and 2##-##-## imply uncertainty in time (200-01-01 200-12-31)
and 200-3-## imply uncertainity in time (200-3-01 200-3-31)
and 20-3-## imply uncertainty in time (20-3-01 20-3-31)
and 200-3-## imply uncertainty in time (200-3-01 200-3-31)
and -200-3-## imply uncertainty in time (-200-3-31 -200-3-01)
now want merge 3 rows col1==col2 1 of following form:
col1 col2 col3 col4 timebegin timeend id1 b c 1000-01-01 2200-12-31 id3 id4 id5 id6 200-01-02 200-3-31 id10 id11 id12 id13 null null id13 id8 id14 id15 2-1-1 bc 9999-12-12
if timeend col1==col2 not given 9999-12-12 presumed timeend
if timebegin col1==col2 not given 01-01-01 presumed timebegin
i.e. want take minimum of timebegin , maximum of timeend while merging.
is possible join operation in postgres. i.e. can write sql join query?
if can achieve desired using programming language python (in efficient manner) also: great.
i'm not convinced best way, here pair of postgres functions difficult bit of converting patterns min , max dates:
create function preprocesspattern(pat varchar(11), out cpat varchar(10), out neg boolean) $$ declare y varchar(4); m varchar(2); d varchar(2); int; begin neg = false; if left(pat, 1) = '-' neg = true; pat = right(pat, -1); end if; = position('-' in pat); y = right('000' || left(pat, - 1), 4); pat = right(pat, -i); = position('-' in pat); m = right('0' || left(pat, - 1), 2); pat = right(pat, -i); d = right('0' || pat, 2); cpat = y || '-' || m || '-' || d; end; $$ language plpgsql; create function datefromfmt(fmt varchar(10), neg boolean) returns date $$ begin if neg return to_date(fmt || ' bc', 'yyyy-mm-dd bc'); else return to_date(fmt, 'yyyy-mm-dd'); end if; end; $$ language plpgsql; create function mindatefrompattern(pat varchar(11)) returns date $$ declare int; neg boolean; n varchar(10); begin select * pat, neg preprocesspattern(pat); = position('#' in pat); if = 0 return datefromfmt(pat, neg); else n = left(pat, - 1) || right('0000-00-00', 0 - position('#' in pat) + 1); n = replace(n, '-00', '-01'); return datefromfmt(n, neg); end if; end; $$ language plpgsql; create function maxdatefrompattern(pat varchar(11)) returns date $$ declare int; y int; m int; d int; x varchar(10); neg boolean; res date; begin select * pat, neg preprocesspattern(pat); = position('#' in pat); if = 0 return datefromfmt(pat, neg); elsif = 1 return date '9999-12-31'; -- here down, pick next highest mask, convert min date subtract 1 day elsif <= 6 -- add 1 year if = 6 = 5; end if; -- skip - char x = cast(cast(left(pat, - 1) int) + 1 varchar) || right(pat, 0 - + 1); else y = cast(left(pat, 4) int); if = 7 m = cast(substr(pat, 6, 1) int) + 1; if m = 2 m = 0; y = y + 1; end if; x = left(to_char(y, 'fm0000'), 4) || '-' || to_char(m, 'fm0') || '#-##'; elsif = 9 m = cast(substr(pat, 6, 2) int) + 1; if m > 12 m = 1; y = y + 1; end if; x = left(to_char(y, 'fm0000'), 4) || '-' || to_char(m, 'fm00') || '-##'; elseif = 10 m = cast(substr(pat, 6, 2) int); d = cast(substr(pat, 9, 1) int) + 1; if (m = 2 , d = 3) or d = 4 m = m + 1; d = 0; if m > 12 m = 1; y = y + 1; end if; end if; x = left(to_char(y, 'fm0000'), 4) || '-' || to_char(m, 'fm00') || '-' || to_char(d, 'fm0') || '#'; end if; end if; -- original logic looks little silly we're preprocessing twice res = mindatefrompattern(x) - interval '1 day'; if neg return datefromfmt(to_char(res, 'yyyy-mm-dd'), neg); else return res; end if; end; $$ language plpgsql;
this has gone through enough iterations use refactoring
Comments
Post a Comment