javascript - How best to handle relative URLs in scraped content?


What methods are there to make relative URLs absolute in scraped content, so that the scraped HTML appears like the original and the CSS is not broken?

I found out that the <base> tag may help. How can I find out what the original base URL is?

I don't care about interactions with the links; I just want them to appear correct.

Assume the page 'example.com/blog/new/i.html' that I scrape has two resources:

  1. <link src="/style/style.css">
  2. <link src="newstyle.css">

Now, if I set the base to 'example.com/blog/new/i.html', won't the first one break?

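Keep track of the URL of each page you scrape. One way is to save the full URL as the filename. Then you can resolve relative URLs per the HTML spec. The first link will not break: a path that starts with "/" is resolved against the site root rather than the page's directory, so using the page URL as the base handles both cases.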
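
As a rough illustration of that resolution step, here is a minimal sketch using the standard `URL` constructor available in modern browsers and Node.js. The `http://` scheme is an assumption (the question does not give one), and `resolveUrl` is a hypothetical helper name, not something from the original answer.

```javascript
// Hypothetical helper: resolve a possibly relative reference against the
// URL of the page it was scraped from, using the standard URL constructor.
function resolveUrl(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

// Assumed page URL; the scheme is added here because the question omits it.
const pageUrl = "http://example.com/blog/new/i.html";

// A path starting with "/" resolves against the site root.
const rootRelative = resolveUrl("/style/style.css", pageUrl);

// A bare filename resolves against the directory of the page.
const docRelative = resolveUrl("newstyle.css", pageUrl);

console.log(rootRelative); // http://example.com/style/style.css
console.log(docRelative);  // http://example.com/blog/new/newstyle.css
```

The same page URL could instead be written into a <base href="..."> tag in the saved HTML, which should let the browser perform the equivalent resolution without rewriting each link.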

