javascript - How best to handle relative URLs in scraped content?

What methods are there to make relative URLs absolute in scraped content, so that the scraped HTML looks like the original and the CSS is not broken?

I found out that the <base> tag may help. How can I find out what the original base of the URL is?

I don't care about interactions with the links; I just want them to appear correct.

Assume the page 'example.com/blog/new/i.html' that I scrape has 2 resources:

<link src="/style/style.css">
<link src="newstyle.css">

Now, if I set the base to 'example.com/blog/new/i.html', won't the first one break?

Keep track of the URL of each page you scrape. One way is to save the full URL as the filename. Then you can resolve relative URLs against it, per the HTML spec.
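As a rough sketch of that resolution step, the standard `URL` constructor (available in browsers and Node.js) can resolve each reference against the saved page URL. The helper name `resolveUrl` and the `http://` scheme are assumptions for illustration; the question does not give a scheme. Note that the root-relative path resolves against the site root rather than the page's directory, so the first resource from the question should not break:

```javascript
// Hypothetical helper: resolve a possibly-relative reference against
// the URL of the page it was scraped from.
function resolveUrl(reference, pageUrl) {
  // The URL constructor applies the standard relative-resolution rules.
  return new URL(reference, pageUrl).href;
}

// Scheme is assumed here; the question only gives 'example.com/blog/new/i.html'.
const pageUrl = "http://example.com/blog/new/i.html";

// Root-relative: resolved against the site root.
console.log(resolveUrl("/style/style.css", pageUrl));
// → http://example.com/style/style.css

// Document-relative: resolved against the page's directory.
console.log(resolveUrl("newstyle.css", pageUrl));
// → http://example.com/blog/new/newstyle.css
```

The same rules apply when the browser resolves references against a <base> tag whose href is set to the original page URL, so either rewriting the attributes or inserting such a tag should give the same result for these two resources.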
