python - BeautifulSoup4 - All links within one div on multiple pages


For a school project I need to scrape a 'job-finding' website, store the data in a database, and later match these profiles with companies that are searching for people.

On this particular site, the URLs of the pages I need to scrape are all inside one div (with 10 links per page); the div is called 'primaryResults' and has 10 links in it.

With BeautifulSoup I wish to first scrape those links into an array, looping through the page number in the URL until a 404 or something similar pops up.

Then I would go through each of these pages, store the information I need from each page in an array, and lastly send it all to my database.

Right now I'm getting stuck at the part where I collect the 10 links from the id='primaryResults' div.

How would I go about this, and what do I put in Python to store the 10 URLs in an array? So far I have tried this:

    import urllib2
    from BeautifulSoup import BeautifulSoup

    opener = urllib2.build_opener()
    opener.addheaders = [("User-agent", "Mozilla/5.0")]

    url = ("http://jobsearch.monsterboard.nl/browse/")

    content = opener.open(url).read()
    soup = BeautifulSoup(content)

    soup.find(id="primaryResults")
    print soup.find_all('a')

But this gives an error:

    Traceback (most recent call last):
        print soup.find_all('a')
    TypeError: 'NoneType' object is not callable

Could someone please help me out? :)

Here is an answer for the links on the URL you have mentioned:

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://jobsearch.monsterboard.nl/browse/"
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # the job links on the page are marked with the 'slJobTitle' class
    jobs = soup.findAll('a', {'class': 'slJobTitle'})

    for eachjob in jobs:
        print eachjob['href']
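
As an aside, the TypeError in the question comes from mixing the old BeautifulSoup 3 import with the bs4-style find_all name; on BeautifulSoup 3 that attribute lookup falls back to a tag search, returns None, and the None is then called. If you would rather limit the search to the div with id 'primaryResults' instead of matching by link class, a minimal sketch with bs4 could look like this (the id and the URL are taken from the question; the site's actual markup may differ):

    from bs4 import BeautifulSoup
    import urllib2

    url = "http://jobsearch.monsterboard.nl/browse/"
    soup = BeautifulSoup(urllib2.urlopen(url).read())

    # find() returns None when no matching div exists, so check the result
    # before calling find_all() on it.
    results_div = soup.find(id="primaryResults")

    links = []
    if results_div is not None:
        for a in results_div.find_all('a', href=True):
            links.append(a['href'])

    print links

This collects the hrefs into a plain list, which is the 'array' the question asks for.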

Hope this is clear and helpful.
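
For the paging part of the question (increasing the page number in the URL until a 404 shows up), a rough sketch could be the following. The '?page=N' query parameter and the 'slJobTitle' class are assumptions for illustration only; check how the site really builds its page URLs and marks its links before using this:

    from bs4 import BeautifulSoup
    import urllib2

    all_links = []
    page_number = 1

    while True:
        # NOTE: the '?page=N' query string is an assumption for illustration;
        # the real site may encode the page number differently.
        url = "http://jobsearch.monsterboard.nl/browse/?page=%d" % page_number
        try:
            content = urllib2.urlopen(url).read()
        except urllib2.HTTPError:
            # stop when the site answers with a 404 (or another HTTP error)
            break

        soup = BeautifulSoup(content)
        jobs = soup.findAll('a', {'class': 'slJobTitle'})
        if not jobs:
            # some sites serve an empty results page instead of a 404
            break

        all_links.extend(job['href'] for job in jobs)
        page_number += 1

    print all_links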

