Remove XML comments using Regex in bash
I want to remove XML comments in bash using regex (awk, sed, grep...). I have looked at other questions, but I am missing something. Here's the XML code:

<table>
    <!-- removed bla bla bla bla bla bl............
         removeee
         removeddddd -->
    <row>
        <column name="example" value="1" ></column>
    </row>
</table>

So I'm comparing 2 XML files, and I don't want the comparison to take the comments into account. I do this:

diff file1.xml file2.xml | sed '/<!--/,/-->/d'

But that only removes the line that starts with <!-- and the last line. It does not remove the lines in between.
In the end, you're going to have to recommend to your client/friend/instructor that they need to install some kind of XML processor. xmlstarlet is a command line tool, and there are a number (or at least a number greater than 2) of implementations of XSLT which can be compiled for any standard Unix, and in most cases for Windows as well. You really cannot do XML processing with regex-based tools, and whatever you do will be hard to read, harder to maintain, and likely to fail on corner cases, sometimes with disastrous consequences.

I haven't spent a lot of time polishing or reviewing the following little awk program. I think it will remove comments from compliant XML documents. Note that the following comment is not compliant:

<!-- XML comments cannot include -- so this comment is illegal -->

and it will not be treated correctly by this script.

The following is also illegal, but since I've seen it in the wild and it wasn't hard to deal with, I did so:

<!-------------- This comment is ill-formed but... -------------->

Here it is. No guarantees. I know it's hard to read, and I wouldn't want to maintain it. It may well fail on arbitrary corner cases.
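As a rough illustration of the XML-processor route, here is a minimal sketch that assumes xmlstarlet is installed and on the PATH (some distributions install the binary as `xml` instead). The function name `strip_comments_xmlstarlet` is made up for this example. It uses xmlstarlet's edit mode to delete every node matched by the XPath expression `//comment()`.

```shell
# Sketch only: assumes the xmlstarlet binary is available under that name.
# strip_comments_xmlstarlet is a hypothetical helper, not part of xmlstarlet.
strip_comments_xmlstarlet() {
  # ed = edit mode; -d deletes all nodes selected by the XPath expression.
  xmlstarlet ed -d '//comment()' "$1"
}

# Possible usage (writes the comment-free document to stdout):
#   strip_comments_xmlstarlet file1.xml > file1.nocomments.xml
```

Note that xmlstarlet parses the document and writes it out again, so the output may differ from the input in details such as the XML declaration, not just in the missing comments.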
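awk '
  # a comment opened on an earlier line closes here
  in_comment && /-->/ { sub(/([^-]|-[^-])*--+>/, ""); in_comment = 0 }
  # still inside a comment: drop the line
  in_comment { next }
  # strip complete comments, then any unclosed one
  { gsub(/<!--+([^-]|-[^-])*--+>/, ""); in_comment = sub(/<!--+.*/, ""); print }
'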
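To connect this back to the original goal of comparing two files, one possible way to wire it up is sketched below. The function names `strip_comments` and `xml_diff_ignoring_comments` are invented for this example, and the awk body is the same program as above. Because the awk program prints an empty (or whitespace-only) line wherever a line contained nothing but a comment, the sketch also drops blank lines before diffing. That is an added assumption, and it will hide differences that consist only of blank lines.

```shell
# Hypothetical helper: run the awk comment stripper on a file (or stdin).
strip_comments() {
  awk 'in_comment && /-->/ { sub(/([^-]|-[^-])*--+>/, ""); in_comment = 0 }
       in_comment { next }
       { gsub(/<!--+([^-]|-[^-])*--+>/, "")
         in_comment = sub(/<!--+.*/, "")
         print }' "$@"
}

# Hypothetical helper: diff two XML files, ignoring comments and blank lines.
xml_diff_ignoring_comments() {
  tmp_a=$(mktemp) && tmp_b=$(mktemp) || return 2
  # Strip comments first, then drop the blank lines the stripper leaves behind.
  strip_comments "$1" | grep -v '^[[:space:]]*$' > "$tmp_a"
  strip_comments "$2" | grep -v '^[[:space:]]*$' > "$tmp_b"
  diff "$tmp_a" "$tmp_b"
  status=$?
  rm -f "$tmp_a" "$tmp_b"
  return "$status"
}

# Possible usage:
#   xml_diff_ignoring_comments file1.xml file2.xml
```

The point of the design is to strip comments from each file before running diff, rather than filtering the diff output, since diff output only contains the lines that differ and may show just part of a multi-line comment.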