java - Most performant way to extract links to embedded resources in jsoup

I have a question about the best way to parse a document and extract resource links.

You show this in the cookbook:

http://jsoup.org/cookbook/extracting-data/example-list-links

The following method builds a DOM representation:

    Document doc = Jsoup.connect(url).get();
    Elements links = doc.select("a[href]");

So I am not sure it is the most performant way.

Is it better to iterate over doc.getAllElements()?
Or is there some kind of SAX parser equivalent?

I asked this question on the jsoup Google Group on 3 March, but I am not sure the mail passed the filter.

The most performant way is to use NodeTraversor, jsoup's implementation of the visitor pattern. It scans the whole tree, just as the other two options do, but:

- It does not need to parse a CSS query and then match it 'dynamically'; a custom query is harder for the JIT to optimize than a static filter.
- It does not store all elements in a list, as getAllElements() does.

A SAX parsing model is not supported: jsoup builds a DOM tree anyway, so there is no need for the less capable SAX. The same applies to, for example, HtmlCleaner.

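    // Needs java.util.ArrayList, java.util.List, org.jsoup.nodes.Element,
    // org.jsoup.nodes.Node, org.jsoup.select.NodeTraversor and
    // org.jsoup.select.NodeVisitor; doc is the parsed Document.
    final List<Element> elements = new ArrayList<Element>();

    new NodeTraversor(new NodeVisitor() {
        public void head(Node node, int depth) {
            // Called when a node is first reached; keep only <a> elements with an href.
            if (node instanceof Element) {
                Element element = (Element) node;
                if (element.tagName().equalsIgnoreCase("a") && element.hasAttr("href")) {
                    elements.add(element);
                }
            }
        }

        public void tail(Node node, int depth) {
            // Nothing to do when leaving a node.
        }
    }).traverse(doc);

    return elements;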
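To make the idea behind the traversal easier to see without jsoup on the classpath, here is a minimal stand-alone sketch of the same visitor approach. SketchNode, SketchVisitor and LinkVisitorSketch are hypothetical names made up for this illustration; they are not jsoup classes, and jsoup's own traversal may well be implemented differently (this one is a simple recursive walk). The point it tries to show is that the filter is plain compiled Java and the only list created is the result list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a DOM node; NOT a jsoup class.
class SketchNode {
    final String tag;                                   // null marks a text node
    final Map<String, String> attrs = new HashMap<>();
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String tag) { this.tag = tag; }

    SketchNode attr(String key, String value) { attrs.put(key, value); return this; }

    SketchNode add(SketchNode child) { children.add(child); return this; }
}

// Hypothetical visitor with the same two callbacks as in the answer's code.
interface SketchVisitor {
    void head(SketchNode node, int depth);   // called when a node is entered
    void tail(SketchNode node, int depth);   // called after its children were visited
}

class LinkVisitorSketch {
    // Depth-first walk; no intermediate list of all nodes is built.
    static void traverse(SketchVisitor visitor, SketchNode node, int depth) {
        visitor.head(node, depth);
        for (SketchNode child : node.children) {
            traverse(visitor, child, depth + 1);
        }
        visitor.tail(node, depth);
    }

    // Collects the href value of every <a href=...> node in document order.
    static List<String> collectLinks(SketchNode root) {
        final List<String> hrefs = new ArrayList<>();
        traverse(new SketchVisitor() {
            public void head(SketchNode node, int depth) {
                // Static filter: no query string to parse or interpret at run time.
                if ("a".equalsIgnoreCase(node.tag) && node.attrs.containsKey("href")) {
                    hrefs.add(node.attrs.get("href"));
                }
            }

            public void tail(SketchNode node, int depth) { }
        }, root, 0);
        return hrefs;
    }
}
```

Calling LinkVisitorSketch.collectLinks(root) on a hand-built tree returns the href values in the order the nodes are visited. To pick up other embedded resources, the same head() check could be extended to further tag and attribute pairs such as img with src; that extension is an assumption of this sketch rather than something the answer spells out.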
