python - HTML page vastly different when using a headless WebKit implementation using PyQt
I was under the impression that a headless browser implementation of WebKit using PyQt would automatically give me the full HTML for each URL, even for pages with heavy JS code. But I am only seeing part of it. I am comparing against the page I get when I save it from a Firefox window.
I am using the following code -
    class JabbaWebkit(QWebPage):
        # 'html' is a class variable
        def __init__(self, url, wait, app, parent=None):
            super(JabbaWebkit, self).__init__(parent)
            JabbaWebkit.html = ''
            if wait:
                QTimer.singleShot(wait * SEC, app.quit)
            else:
                self.loadFinished.connect(app.quit)
            self.mainFrame().load(QUrl(url))

        def save(self):
            JabbaWebkit.html = self.mainFrame().toHtml()

        def userAgentForUrl(self, url):
            return USER_AGENT

    def get_page(url, wait=None):
        # here is the trick for calling it several times
        app = QApplication.instance()  # checks if a QApplication already exists
        if not app:  # create a QApplication if it doesn't exist
            app = QApplication(sys.argv)
        #
        form = JabbaWebkit(url, wait, app)
        app.aboutToQuit.connect(form.save)
        app.exec_()
        return JabbaWebkit.html
Can anyone see anything wrong with the code?
After running the code through a few URLs, here is one I found that shows the problem quite clearly - http://www.chilis.com/en/pages/menu.aspx
Thanks for any pointers.
The page has AJAX code. When it finishes loading, it still needs some time to update the page through AJAX, but your code quits as soon as the load finishes.
You should add some code that waits for a while and processes events in WebKit:
    for i in range(200):  # wait 2 seconds
        app.processEvents()
        time.sleep(0.01)
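The same idea can be wrapped in a small helper. This is only a sketch: `pump_events` and its parameters (`duration`, `interval`, `is_done`) are hypothetical names, not part of PyQt. It takes the event-processing function as an argument, so it assumes you pass in something like `app.processEvents`. It uses a wall-clock deadline rather than a fixed iteration count, since each iteration takes longer than the sleep alone.

```python
import time


def pump_events(process_events, duration=2.0, interval=0.01, is_done=None):
    """Call process_events() repeatedly for up to `duration` seconds.

    process_events: zero-argument callable (assumed: app.processEvents)
    is_done: optional zero-argument callable; returning True stops early
    Returns the number of times process_events was called.
    """
    deadline = time.monotonic() + duration
    calls = 0
    while time.monotonic() < deadline:
        process_events()      # let pending work (network replies, JS timers) run
        calls += 1
        if is_done is not None and is_done():
            break             # the condition we were waiting for is met
        time.sleep(interval)  # avoid spinning the CPU between polls
    return calls
```

With the code from the question, this might be called as `pump_events(app.processEvents, duration=2.0)` once the load has finished and before reading `mainFrame().toHtml()`. How long to wait depends on the page; two seconds is just the value used above.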