PHP HTML DOM, XPath - weird characters?
Assume $html_dom contains a page that has HTML entities. In the output below, those entities come out as weird characters.
    $html_dom = new DOMDocument();
    @$html_dom->loadHTML($html_doc);
    $xpath = new DOMXPath($html_dom);
    $query = '//div[@class="foo"]/div/p';
    $my_foos = $xpath->query($query);
    foreach ($my_foos as $my_foo) {
        echo html_entity_decode($my_foo->nodeValue);
        die;
    }

How do I handle this so that I don't get weird characters? I tried the following, with no success:
    $html_doc = mb_convert_encoding($html_doc, 'HTML-ENTITIES', 'UTF-8');
    $html_dom = new DOMDocument();
    $html_dom->resolveExternals = true;
    @$html_dom->loadHTML($html_doc);
    $xpath = new DOMXPath($html_dom);
    $query = '//div[@class="foo"]/div/p';
    $my_foos = $xpath->query($query);
    foreach ($my_foos as $my_foo) {
        echo html_entity_decode($my_foo->nodeValue);
        die;
    }
mb_convert_encoding was a good idea, but it does not work as expected here because DOMDocument seems to be a little bit buggy when it comes to encoding.
Moving mb_convert_encoding to the actual node output did the trick.
    $html_dom = new DOMDocument();
    $html_dom->resolveExternals = true;
    @$html_dom->loadHTML($html_doc);
    $xpath = new DOMXPath($html_dom);
    $query = '//div[@class="foo"]/div/p';
    $my_foos = $xpath->query($query);
    foreach ($my_foos as $my_foo) {
        // Convert at output time: nodeValue is UTF-8, so re-encode it as HTML entities here.
        echo mb_convert_encoding($my_foo->nodeValue, 'HTML-ENTITIES', 'UTF-8');
        die;
    }
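
For anyone who wants to try the "convert at node output" idea in isolation, here is a minimal, self-contained sketch. It is not the code from the post: the sample markup and the helper name foo_paragraphs_as_entities are made up for illustration, and libxml_use_internal_errors() stands in for the @ operator. It also swaps mb_convert_encoding for htmlentities(), because as far as I know the 'HTML-ENTITIES' pseudo-encoding is deprecated in mbstring since PHP 8.2. Note that htmlentities() also escapes <, > and &, which is usually what you want when echoing into a page.

```php
<?php
// Hypothetical sample input; in the post, $html_doc is the scraped page.
$html_doc = '<div class="foo"><div><p>caf&eacute;&nbsp;bar</p></div></div>';

// Hypothetical helper (not from the original post): returns the text of each
// matching <p>, with non-ASCII characters re-encoded as HTML entities.
function foo_paragraphs_as_entities(string $html): array
{
    // Collect libxml parse warnings quietly instead of suppressing with @.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($xpath->query('//div[@class="foo"]/div/p') as $node) {
        // nodeValue comes back as UTF-8, so do the entity conversion here,
        // at output time, rather than on the raw input before parsing.
        $result[] = htmlentities($node->nodeValue, ENT_QUOTES, 'UTF-8');
    }
    return $result;
}

foreach (foo_paragraphs_as_entities($html_doc) as $text) {
    echo $text, PHP_EOL;
}
```

With the sample input above, the non-breaking space and the accented letter should be printed as entities again rather than as raw multi-byte characters, so they no longer depend on the charset the browser assumes for the page.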