regex - Single line delimited text


I have a single line of text containing an invoice's rows, extracted from a PDF.

I need to parse it and get each row as an output token.

Here's a snippet:

1 nr 0pr 18ov dho1o154 occhiale acetato donna vista 1 nr 0pr 18ov nag1o152 occhiale acetato donna vista 1 nr 0pr 61qv 7ax1o156 occhiale metallo uomo vista descrizione causale vendita 2 nr 0an4007 41 / 87 66 occhiale nylon uomo sole descrizione causale vendita 1 nr 0ea4001 50638g56 valeria occhiale nylon uomo sole descrizione causale vendita - pag 1 di 3 - segue - 1 nr 0po3042s 972 / m351 sofia occhiale acetato uomo sole descrizione causale vendita 1 nr 0an3048 502 / 8g30 valeria occhiale metallo uomo sole descrizione causale vendita 6 nr 0dg4204 27648764 occhiale acetato uomo sole descrizione causale vendita 1 nr 0ox3123 31230453 valeria occhiale acciaio uomo vista

I want to get each row as a token. For example, the first one would be:

1 nr 0pr 18ov dho1o154 occhiale acetato donna vista

To explain, a token should be:

  • starting with an integer + space + nr + space
  • containing whatever comes after the start: strings, numbers, anything...
  • ending before the next "x+nr" starting token, a fixed string (such as "descrizione causale vendita"), or the end of the file.

Using the regex (\b\d+\b nr) I can match the x+nr starting tokens. How can I select the part that follows, up to the next x+nr token?

Note the title! Everything is on one single line, so there are no newline separators!

Thank you.

Building on the regex you have so far, you can use a positive lookahead:

(?:\b\d+\b nr).*?(?=\b\d+\b nr|$) 

regex101 demo

Each colour indicates a different match.

(?= ... ) is a positive lookahead, and what it matches is not included in the match itself. Therefore, the lazy .*? matches everything up to, but not including, the next \b\d+\b nr or the end of the string $.
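As a rough illustration, the lookahead approach could be tried out in Python along these lines. This is only a sketch: the helper name `split_rows` and its `stop_strings` parameter are hypothetical, the sample is a shortened piece of the snippet from the question, and the optional stop strings are an assumed extension of the regex above to cover the "fixed strings" requirement, which the regex as written does not handle.

```python
import re

def split_rows(text, stop_strings=()):
    """Split a single-line invoice dump into row tokens.

    Hypothetical helper: each token starts at "<integer> nr" and runs
    lazily until the lookahead sees the next "<integer> nr", one of the
    optional fixed stop strings, or the end of the string.
    """
    start = r"\b\d+\b nr"
    # Alternatives for the lookahead: next row start, stop strings, end of text.
    alternatives = [start] + [re.escape(s) for s in stop_strings] + ["$"]
    pattern = re.compile(start + r".*?(?=" + "|".join(alternatives) + ")")
    # The lazy part keeps the trailing space before the next token; strip it.
    return [m.group().strip() for m in pattern.finditer(text)]

# Shortened sample taken from the question's snippet.
sample = (
    "1 nr 0pr 18ov dho1o154 occhiale acetato donna vista "
    "1 nr 0pr 18ov nag1o152 occhiale acetato donna vista "
    "1 nr 0pr 61qv 7ax1o156 occhiale metallo uomo vista "
    "descrizione causale vendita "
    "2 nr 0an4007 41 / 87 66 occhiale nylon uomo sole "
    "descrizione causale vendita"
)

# Same behaviour as the regex above: rows end only at the next "x nr" or at $.
rows = split_rows(sample)

# Assumed extension: also end a row before a fixed string.
strict_rows = split_rows(sample, stop_strings=["descrizione causale vendita"])

for row in strict_rows:
    print(row)
```

Without stop strings, the third and fourth rows still carry the trailing "descrizione causale vendita" text, because the lookahead only knows about the next x+nr token and the end of the string. Passing the fixed string as a stop string cuts each row off before it.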

