java - Most efficient way to extract links to embedded resources in jsoup


I have a question about the best way to parse a document and extract resource links.

In the cookbook, you show the following method, which builds a DOM representation:

    Document doc = Jsoup.connect(url).get();
    Elements links = doc.select("a[href]");

So I'm not sure this is the most efficient approach.

  • Is it better to iterate over doc.getAllElements()?
  • Or is there some kind of SAX-parser equivalent?

I asked this question on the jsoup Google Group on 3 March, but I'm not sure my mail got through the filter.

The most efficient way is to use NodeTraversor, jsoup's implementation of the visitor pattern. It scans the whole tree, just as the other two options do, but:

  • it doesn't need to parse a CSS query and match elements against it "dynamically"; a custom static filter is easier for the JIT to optimize than a generic query
  • it doesn't store a list of all elements, as getAllElements() does
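The two advantages above can be seen in a small, JDK-only model. SimpleNode, Visitor and TraversalDemo are hypothetical illustration classes, not jsoup types, and the walk is a plain recursive depth-first traversal rather than jsoup's actual NodeTraversor code. What it is meant to show is the shape of the technique: head() fires when a node is entered, tail() when it is left, and the href filter runs inline in head(), so only the matching values are ever stored.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, minimal stand-in for a DOM node (not a jsoup class).
class SimpleNode {
    final String tag;
    final Map<String, String> attrs = new LinkedHashMap<>();
    final List<SimpleNode> children = new ArrayList<>();

    SimpleNode(String tag) { this.tag = tag; }

    SimpleNode attr(String key, String value) { attrs.put(key, value); return this; }

    SimpleNode add(SimpleNode child) { children.add(child); return this; }
}

// Hypothetical callback interface with the same head/tail shape as a node visitor.
interface Visitor {
    void head(SimpleNode node, int depth);
    void tail(SimpleNode node, int depth);
}

class TraversalDemo {
    // Depth-first walk: head() when a node is entered, tail() once its children are done.
    static void traverse(SimpleNode node, int depth, Visitor visitor) {
        visitor.head(node, depth);
        for (SimpleNode child : node.children) {
            traverse(child, depth + 1, visitor);
        }
        visitor.tail(node, depth);
    }

    // Collects href values of <a href> elements. The filter runs inside head(),
    // so no list of all elements is ever built, only the matches.
    static List<String> collectHrefs(SimpleNode root) {
        final List<String> hrefs = new ArrayList<>();
        traverse(root, 0, new Visitor() {
            @Override
            public void head(SimpleNode node, int depth) {
                if (node.tag.equalsIgnoreCase("a") && node.attrs.containsKey("href")) {
                    hrefs.add(node.attrs.get("href"));
                }
            }

            @Override
            public void tail(SimpleNode node, int depth) {
                // Nothing to do on the way back up.
            }
        });
        return hrefs;
    }

    static SimpleNode sampleTree() {
        return new SimpleNode("html").add(
            new SimpleNode("body")
                .add(new SimpleNode("a").attr("href", "/first"))
                .add(new SimpleNode("p")
                    .add(new SimpleNode("A").attr("href", "/second"))
                    .add(new SimpleNode("a")))            // no href: skipped
                .add(new SimpleNode("img").attr("src", "/logo.png")));
    }

    public static void main(String[] args) {
        System.out.println(collectHrefs(sampleTree()));   // [/first, /second]
    }
}
```

The recursive walk keeps the sketch short; for very deep documents an iterative traversal avoids growing the call stack.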

The SAX parsing model is not supported: jsoup builds a DOM tree anyway, so there is no need for the less capable SAX model. The same goes for HtmlCleaner.
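For contrast, here is roughly what the SAX (event/streaming) model looks like using only the JDK's javax.xml.parsers API; SaxLinkDemo is a hypothetical name. It rests on a strong assumption: the input must be well-formed XML/XHTML. The JDK parser is an XML parser and rejects typical real-world HTML (an unclosed <br>, for example), which is the kind of markup jsoup is built to tolerate.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class SaxLinkDemo {
    // Streams through well-formed markup and records href values of <a> start tags.
    // No tree is built: each element is seen once, as an event, and then forgotten.
    static List<String> extractHrefs(String xhtml) {
        final List<String> hrefs = new ArrayList<>();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xhtml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        // Factory is not namespace-aware by default, so qName holds the tag name.
                        String href = attrs.getValue("href");
                        if (qName.equalsIgnoreCase("a") && href != null) {
                            hrefs.add(href);
                        }
                    }
                });
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // A SAXException here usually means the input is not well-formed XML.
            throw new IllegalArgumentException("Could not parse input as XML", e);
        }
        return hrefs;
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><a href=\"/first\">one</a>"
            + "<p><a href=\"/second\">two</a><a>no link</a></p>"
            + "<img src=\"/logo.png\"/></body></html>";
        System.out.println(extractHrefs(xhtml)); // [/first, /second]
    }
}
```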

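    // Instance-style NodeTraversor API from older jsoup releases; newer releases
    // also provide the static NodeTraversor.traverse(visitor, root).
    final List<Element> elements = new ArrayList<Element>();
    new NodeTraversor(new NodeVisitor() {
        public void head(Node node, int depth) {
            // Called when a node is entered: keep only <a> elements that have an href.
            if (node instanceof Element) {
                Element element = (Element) node;
                if (element.tagName().equalsIgnoreCase("a") && element.hasAttr("href")) {
                    elements.add(element);
                }
            }
        }

        public void tail(Node node, int depth) {
            // Nothing to do when leaving a node.
        }
    }).traverse(doc);

    return elements;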
