regex - Bash script: parse files for multiple occurrences of a string between patterns

I am doing a little text processing to find video content in HTML files uploaded by users. I have defined a tag called "video", and users are supposed to put video files in it like this:

<video> abcd.mp4 </video>

Presently I am using awk to extract the line that has the video tag:

str=$(awk '/<video>/{flag=1;} /<\/video>/{print ;flag=0} flag { print }' file.html)

The output contains the tag too, so I do a prefix and suffix removal to get the video file name. I have done this:

prefix="<video>"
suffix="</video>"
foo=${str#$prefix}
foo=${foo%$suffix}

But this works only for files that use the video tag once. For files that use the tag several times, the string returned by awk starts at the first occurrence of <video> and runs until the last occurrence of </video>.

So my question is: how should I write a script that, at the end, gives me an array of the strings between the <video> and </video> tags? Also, how can I change

<video> abcd.mp4 </video>

to

<media> abcd.mp4 </media>?

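To get each tag by itself (-E enables extended regular expressions, and [^<]+ stops each match at the next tag, so several tags on one line are matched separately):

grep -Eo "<video>[^<]+</video>" myfile.html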

To get the text within the tags:

grep -Eo "<video>[^<]+</video>" myfile.html | sed -E "s|</?video>||g"

If the opening and closing tags are on different lines:

tr "\n" " " < myfile.html | grep -Eo "<video>[^<]+</video>" | sed -E "s|</?video>||g"

Example input:

this <video> video1.mp4 </video>  file <other> <random> </tags> <media> media1.mp4 </media>  <video> video2.mp4 </video>  <media>     media 2 spaces  , on  multiple lines.mp4 </media>

Example output:

video1.mp4  video2.mp4

To get both video and media tags (please specify this in the original question):

tr "\n" " " < vid.html | grep -Eo "<(video|media)>[^<]+</(video|media)>" | sed -E "s#</?(video|media)>##g"

Output:

 video1.mp4   media1.mp4   video2.mp4   media 2 spaces      , on      multiple lines.mp4
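The question asks for an array, so here is a rough sketch of one way to collect the matches into a Bash array. It assumes Bash 4 or later (for mapfile) and a grep and sed that accept -E. The helper name extract_tag and the temporary sample file are made up for illustration and are not part of the commands above; the sketch also trims the whitespace around each file name and assumes the tag content never contains a "<" character.

```shell
#!/usr/bin/env bash
# Hypothetical sample file standing in for an uploaded HTML page.
sample=$(mktemp)
printf '%s\n' \
  'this <video> video1.mp4 </video> file' \
  '<media> media1.mp4 </media> <video>' \
  'video2.mp4 </video>' > "$sample"

# extract_tag TAG FILE: print the text inside every <TAG>...</TAG> pair,
# one entry per line. TAG may be an ERE alternation such as '(video|media)'.
# (Illustrative helper name; it assumes the content contains no '<'.)
extract_tag() {
  local tag=$1 file=$2
  tr '\n' ' ' < "$file" |                # join lines so tags may span lines
    grep -Eo "<${tag}>[^<]*</${tag}>" |  # one match per tag pair
    sed -E -e "s#</?${tag}>##g" \
           -e 's/^[[:space:]]+//' -e 's/[[:space:]]+$//'  # drop tags, trim
}

# mapfile -t reads one array element per output line.
mapfile -t videos < <(extract_tag video "$sample")
mapfile -t both < <(extract_tag '(video|media)' "$sample")

printf 'found %d: %s\n' "${#videos[@]}" "${videos[*]}"
# prints: found 2: video1.mp4 video2.mp4

rm -f "$sample"
```

Reading line by line with mapfile keeps file names that contain spaces as single array elements, which a plain word-split assignment such as arr=( $(...) ) would not.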

For the second question, run the whole file through this command:

sed -E "s|(</?)video>|\1media>|g" vid.html
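As a quick check of the tag rename, the sketch below runs the substitution on a throwaway file and writes the result to a second file instead of editing in place. The temporary file names come from mktemp and are only for illustration; it assumes a sed that accepts -E.

```shell
#!/usr/bin/env bash
# Throwaway input and output files (illustrative, created with mktemp).
src=$(mktemp)
dst=$(mktemp)
printf '%s\n' '<video> abcd.mp4 </video>' > "$src"

# (</?) captures "<" or "</", so opening and closing tags are both renamed.
sed -E 's|(</?)video>|\1media>|g' "$src" > "$dst"

result=$(cat "$dst")
printf '%s\n' "$result"
# prints: <media> abcd.mp4 </media>

rm -f "$src" "$dst"
```

Writing to a separate file leaves the original upload untouched until the output has been inspected.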
