regex - Is special treatment required for <tag2> within <tag1> in Python's re.sub?


For example:

This has the desired effect.

It replaces the following with a blank:

<tag condition="mycondition">text</tag> 

via:

string = re.sub('<tag condition=\"mycondition\">.+</tag>', '', string) 

But consider the following:

<tag2 condition="mycondition2"> <tag>text</tag> , <tag>text</tag> here. </tag2> 

and I want to replace tag2 and its contents with a blank, e.g.:

string = re.sub('<tag2 condition=\"mycondition2\">.+</tag2>', '', string) 

It is not removing tag2 and its contents, and I think that might be because there are <tag> elements within tag2.

How can I replace tag2 and its contents with a blank?
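A note on the likely cause, offered as an assumption because the question does not show the real input: nested tags alone do not stop `.+` from matching, but a line break does, since `.` does not match a newline unless `re.DOTALL` is passed. The sketch below uses a hypothetical multi-line version of the markup to illustrate this.

```python
import re

pattern = r'<tag2 condition="mycondition2">.+</tag2>'

# Single-line input, exactly as in the question: the nested <tag> elements are no obstacle.
single = '<tag2 condition="mycondition2"> <tag>text</tag> , <tag>text</tag> here. </tag2>'
single_result = re.sub(pattern, '', single)

# Hypothetical input: the same markup, but spread over several lines.
text = '''before
<tag2 condition="mycondition2">
    <tag>text</tag> , <tag>text</tag> here.
</tag2>
after'''

# By default "." does not match a newline, so the pattern cannot span lines.
unchanged = re.sub(pattern, '', text)

# With re.DOTALL "." also matches newlines; ".+?" keeps the match non-greedy.
removed = re.sub(r'<tag2 condition="mycondition2">.+?</tag2>', '', text, flags=re.DOTALL)
```

This only covers flat cases like the one shown; a tag2 nested inside another tag2 would still defeat the pattern.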

Once you get past simple cases, regex becomes your enemy. Parse the XML with a proper XML parser, modify the parsed tree, and print it out:

import lxml.etree

xml = '''
    <?xml version="1.0" encoding="utf-8" ?>
    <root>
        <tag condition="mycondition">text</tag>
        <tag3>don't touch me</tag3>
        <tag2 condition="mycondition2">
            <tag>text</tag> , <tag>text</tag> here.
        </tag2>
    </root>
'''

# Parse from bytes: lxml rejects a str that carries an encoding declaration.
tree = lxml.etree.fromstring(xml.strip().encode('utf-8'))

# Select both kinds of unwanted elements and detach each from its parent.
for element in tree.xpath('//tag[@condition="mycondition"] | //tag2[@condition="mycondition2"]'):
    element.getparent().remove(element)

print(lxml.etree.tostring(tree, pretty_print=True).decode('utf-8'))
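If lxml is not available, roughly the same approach can be sketched with the standard library's `xml.etree.ElementTree`. This is a variant, not part of the original answer: ElementTree elements have no `getparent()`, so the sketch builds a child-to-parent map (the name `parent_of` is just illustrative), and the XML declaration is left out of the sample string for simplicity.

```python
import xml.etree.ElementTree as ET

xml = '''
<root>
    <tag condition="mycondition">text</tag>
    <tag3>don't touch me</tag3>
    <tag2 condition="mycondition2">
        <tag>text</tag> , <tag>text</tag> here.
    </tag2>
</root>
'''

root = ET.fromstring(xml.strip())

# ElementTree has no getparent(), so map every child to its parent up front.
parent_of = {child: parent for parent in root.iter() for child in parent}

# ElementTree's limited XPath has no "|" union, so run two searches and join them.
targets = (root.findall(".//tag[@condition='mycondition']")
           + root.findall(".//tag2[@condition='mycondition2']"))

# Detach each matched element, together with everything inside it.
for element in targets:
    parent_of[element].remove(element)

result = ET.tostring(root, encoding="unicode")
print(result)
```

As with lxml, removing an element also drops any text that directly followed its closing tag.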
