java - Most performant way to extract links to embedded resources in jsoup
I have a question about the best way to parse a document and extract its resource links.
The cookbook shows the following approach, which builds a DOM representation:
    Document doc = Jsoup.connect(url).get();
    Elements links = doc.select("a[href]");
I'm not sure this is the most performant way.
- Is it better to iterate over doc.getAllElements()?
- Or is there some kind of SAX parser equivalent?
I asked this question on the jsoup Google Group on 3 March, but I'm not sure my mail made it past the filter.
The most performant way is to use NodeTraversor, an implementation of the visitor pattern. It scans the whole tree, just like the two other options, but:
- it does not need to parse a CSS query and match it 'dynamically' (a custom query is less likely to be JIT-optimized than a static filter);
- it does not store all the elements in a list, as getAllElements() does.
The SAX parsing model is not supported: jsoup creates a DOM tree anyway, so there is no need for the less capable SAX. Other HTML parsing libraries exist as alternatives, for example HtmlCleaner.
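    // Collect every <a> element that carries an href attribute in a single pass.
    // The constructor form below matches older jsoup releases; more recent ones
    // appear to expose this as the static NodeTraversor.traverse(visitor, root).
    final List<Element> elements = new ArrayList<Element>();
    new NodeTraversor(new NodeVisitor() {
        public void head(Node node, int depth) {
            // Called when a node is first reached, before its children.
            if (node instanceof Element) {
                Element element = (Element) node;
                if (element.tagName().equalsIgnoreCase("a") && element.hasAttr("href")) {
                    elements.add(element);
                }
            }
        }

        public void tail(Node node, int depth) {
            // Called after the node's children; nothing to do here.
        }
    }).traverse(doc);
    return elements;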
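To make the head/tail callbacks above less of a black box, here is a minimal, stdlib-only sketch of the same visitor idea. It does not use jsoup at all: MiniNode, MiniVisitor, traverse, collectHrefs and sampleTree are hypothetical names invented for this illustration, and the walk is written recursively for brevity, which is not necessarily how jsoup implements its traversal. It only shows why a single pass with a static filter needs no query parsing and no intermediate list of all elements.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class VisitorSketch {

    /** Hypothetical stand-in for a DOM element: tag name, attributes, children. */
    public static final class MiniNode {
        final String tag;
        final Map<String, String> attrs = new LinkedHashMap<>();
        final List<MiniNode> children = new ArrayList<>();

        MiniNode(String tag) { this.tag = tag; }

        MiniNode attr(String key, String value) { attrs.put(key, value); return this; }

        MiniNode child(MiniNode c) { children.add(c); return this; }
    }

    /** Hypothetical visitor with the same head/tail shape as in the answer. */
    public interface MiniVisitor {
        void head(MiniNode node, int depth);
        void tail(MiniNode node, int depth);
    }

    /** Depth-first walk: head() before a node's children, tail() after them. */
    public static void traverse(MiniVisitor visitor, MiniNode node, int depth) {
        visitor.head(node, depth);
        for (MiniNode c : node.children) {
            traverse(visitor, c, depth + 1);
        }
        visitor.tail(node, depth);
    }

    /** Collects href values of <a> nodes in one pass, keeping only the matches. */
    public static List<String> collectHrefs(MiniNode root) {
        final List<String> hrefs = new ArrayList<>();
        traverse(new MiniVisitor() {
            @Override public void head(MiniNode node, int depth) {
                // Static filter: no selector string to parse or match at runtime.
                if (node.tag.equalsIgnoreCase("a") && node.attrs.containsKey("href")) {
                    hrefs.add(node.attrs.get("href"));
                }
            }
            @Override public void tail(MiniNode node, int depth) { }
        }, root, 0);
        return hrefs;
    }

    /** Small made-up tree: two links with href, one without, one image. */
    public static MiniNode sampleTree() {
        MiniNode paragraph = new MiniNode("p")
                .child(new MiniNode("A").attr("href", "/two"))
                .child(new MiniNode("a"));
        MiniNode body = new MiniNode("body")
                .child(new MiniNode("a").attr("href", "/one"))
                .child(paragraph)
                .child(new MiniNode("img").attr("src", "x.png"));
        return new MiniNode("html").child(body);
    }

    public static void main(String[] args) {
        System.out.println(collectHrefs(sampleTree()));
    }
}
```

The only list that grows is the one holding matches, which is the point made in the answer about not storing every element the way getAllElements() does. Whether this actually beats doc.select("a[href]") on real pages is something to measure rather than assume.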