javascript - How best to handle relative URLs in scraped content?
What methods are there to make relative URLs absolute in scraped content, so that the scraped HTML appears like the original and the CSS is not broken?
I found out that the <base> tag may help. How can I find out what the original base of the URL is?
I don't care about interactions or links; I just want the pages to appear correct.
Assume the page 'example.com/blog/new/i.html' that I scrape has 2 resources:
- <link src="/style/style.css">
- <link src="newstyle.css">
Now, if I set the base to 'example.com/blog/new/i.html', won't the first one break?
Keep track of the URL of each page you scrape. One way is to save the full URL as the filename. Then you can resolve the relative URLs against it, as the HTML spec describes.
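
A minimal sketch of that approach, assuming a runtime with the standard `URL` class (modern browsers or Node.js). The `http://` scheme and the helper names `urlToFilename`, `filenameToUrl` and `resolveUrl` are assumptions made for illustration; they are not from the question.

```javascript
// Hypothetical helper: turn a page URL into a string that is safe to use
// as a filename by percent-encoding "/", ":" and similar characters.
function urlToFilename(pageUrl) {
  return encodeURIComponent(pageUrl);
}

// Hypothetical helper: recover the original page URL from the filename.
function filenameToUrl(filename) {
  return decodeURIComponent(filename);
}

// Hypothetical helper: resolve a (possibly relative) reference against
// the URL of the page it was scraped from.
function resolveUrl(reference, pageUrl) {
  return new URL(reference, pageUrl).href;
}

// The scheme is assumed; the question only gives "example.com/blog/new/i.html".
const pageUrl = "http://example.com/blog/new/i.html";

const filename = urlToFilename(pageUrl);
const recovered = filenameToUrl(filename);

// Root-relative path: resolved against the site root.
const rootRelative = resolveUrl("/style/style.css", recovered);
// Document-relative path: resolved against the page's directory.
const docRelative = resolveUrl("newstyle.css", recovered);

console.log(rootRelative); // http://example.com/style/style.css
console.log(docRelative);  // http://example.com/blog/new/newstyle.css
```

Note that the first resource does not break: a path starting with "/" is resolved against the site root, whatever directory the base URL points into. A <base> tag whose href is the full page URL is resolved by the browser in the same way. Very long URLs may exceed filesystem filename limits, so a real scraper might store the URL in a separate index instead.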