python - BeautifulSoup4 - All links within one div on multiple pages


For a school project I need to scrape a 'job-finding' website, store the data in a database, and later match these profiles with companies that are searching for people.

On this particular site, the URLs of the pages I need to scrape are all inside one div (with 10 links per page); the div is called 'primaryResults' and has 10 links in it.

With BeautifulSoup I wish to first scrape those links into an array, looping through the page number in the URL until a 404 or something similar pops up.

Then I would go through each of these pages, store the information I need from each page in an array, and lastly send it all to my database.

Right now I'm getting stuck at the part where I collect the 10 links from the id='primaryResults' div.

How would I go about this, and what do I put in Python to store the 10 URLs in an array? So far I have tried this:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    opener = urllib2.build_opener()
    opener.addheaders = [("User-agent", "Mozilla/5.0")]

    url = ("http://jobsearch.monsterboard.nl/browse/")

    content = opener.open(url).read()
    soup = BeautifulSoup(content)

    soup.find(id="primaryResults")
    print soup.find_all('a')

But this gives an error:

    Traceback (most recent call last):
        print soup.find_all('a')
    TypeError: 'NoneType' object is not callable

Could someone please help me out? :)

Here is an answer for the links on the URL you have mentioned:

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://jobsearch.monsterboard.nl/browse/"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # the job links on the page are marked with the 'slJobTitle' class
    jobs = soup.findAll('a', {'class': 'slJobTitle'})

    for eachjob in jobs:
        print eachjob['href']
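
As an aside, the TypeError in the question comes from mixing the old BeautifulSoup 3 import with the bs4-style find_all name; on BeautifulSoup 3 that attribute lookup falls back to a tag search, returns None, and the None is then called. If you would rather limit the search to the div with id 'primaryResults' instead of matching by link class, a minimal sketch with bs4 could look like this (the id and the URL are taken from the question; the site's actual markup may differ):

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://jobsearch.monsterboard.nl/browse/"
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    # find() returns None when no matching div exists, so check the result
    # before calling find_all() on it.
    results_div = soup.find(id="primaryResults")

    links = []
    if results_div is not None:
        for a in results_div.find_all('a', href=True):
            links.append(a['href'])

    print links

This collects the hrefs into a plain list, which is the 'array' the question asks for.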

Hope this is clear and helpful.
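
For the paging part of the question (increasing the page number in the URL until a 404 shows up), a rough sketch could be the following. The '?page=N' query parameter and the 'slJobTitle' class are assumptions for illustration only; check how the site really builds its page URLs and marks its links before using this:

    from bs4 import BeautifulSoup
    import urllib2

    all_links = []
    page_number = 1

    while True:
        # NOTE: the '?page=N' query string is an assumption for illustration;
        # the real site may encode the page number differently.
        url = "http://jobsearch.monsterboard.nl/browse/?page=%d" % page_number
        try:
            content = urllib2.urlopen(url).read()
        except urllib2.HTTPError:
            # stop when the site answers with a 404 (or another HTTP error)
            break

        soup = BeautifulSoup(content)
        jobs = soup.findAll('a', {'class': 'slJobTitle'})
        if not jobs:
            # some sites serve an empty results page instead of a 404
            break

        all_links.extend(job['href'] for job in jobs)
        page_number += 1

    print all_links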

