regex - Is special treatment required for <tag2> within <tag1> in Python's re.sub?
Example: this has the desired effect. To replace the following with a blank:

    <tag condition="mycondition">text</tag>

I use:

    string = re.sub('<tag condition=\"mycondition\">.+</tag>', '', string)

But consider the following:

    <tag2 condition="mycondition2"> <tag>text</tag> , <tag>text</tag> here. </tag2>

I want to replace tag2 and its contents with a blank, e.g.:

    string = re.sub('<tag2 condition=\"mycondition2\">.+</tag2>', '', string)

It is not removing tag2 and its contents, and I think it might be because there are <tag> elements within tag2.

How do I replace tag2 and its contents with a blank?
Once you are past the simple cases, regex becomes your enemy. Parse the XML with a proper XML parser, modify the parsed tree, and print it out:
    import lxml.etree

    # Bytes literal: on Python 3, lxml rejects a str that carries an
    # encoding declaration, so the document is passed as bytes.
    xml = b'''
    <?xml version="1.0" encoding="utf-8" ?>
    <root>
    <tag condition="mycondition">text</tag>
    <tag3>don't touch me</tag3>
    <tag2 condition="mycondition2"> <tag>text</tag> , <tag>text</tag> here. </tag2>
    </root>
    '''

    tree = lxml.etree.fromstring(xml.strip())

    # Select both conditional elements and detach each one from its parent.
    for element in tree.xpath('//tag[@condition="mycondition"] | //tag2[@condition="mycondition2"]'):
        element.getparent().remove(element)

    print(lxml.etree.tostring(tree, pretty_print=True).decode())
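If lxml is not installed, roughly the same thing can be done with the standard library's xml.etree.ElementTree. This is only a sketch under a few assumptions: the function name remove_conditional_tags and the constant XML are made up for illustration, the XML declaration is left out of the sample so it can be passed as a plain str, and because ElementTree has no getparent(), a child-to-parent lookup is built by hand.

```python
import xml.etree.ElementTree as ET

# Same sample document as in the answer, minus the XML declaration
# (an assumption made so it can be passed as a plain str).
XML = """
<root>
  <tag condition="mycondition">text</tag>
  <tag3>don't touch me</tag3>
  <tag2 condition="mycondition2"> <tag>text</tag> , <tag>text</tag> here. </tag2>
</root>
"""


def remove_conditional_tags(xml_text):
    """Hypothetical helper: drop the two conditional elements and their contents."""
    root = ET.fromstring(xml_text.strip())

    # ElementTree elements do not know their parent, so build a lookup first.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    # ElementTree's limited XPath has no "|" union, so run two searches.
    targets = (root.findall(".//tag[@condition='mycondition']")
               + root.findall(".//tag2[@condition='mycondition2']"))

    for element in targets:
        # Removing an element also discards the tail text that follows it.
        parent_of[element].remove(element)

    return ET.tostring(root, encoding="unicode")


cleaned = remove_conditional_tags(XML)
print(cleaned)
```

Both conditional elements disappear together with everything nested inside them, while tag3 is left alone. For anything beyond simple path and attribute tests, lxml's full XPath support is probably the more comfortable choice.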