Building a page parser with PHP, and I want to use some jQuery/AJAX
All right guys! I have been searching, but I'm having trouble finding a solution to my problem. And in advance, sorry for my bad English.
I'm building a small parser for news articles from one specific news site. I also want the code to be prepared for adding other news pages as well, which is why it is structured the way it is.
I want the page to reload its content without refreshing the whole page. I know it takes a while to retrieve the content from the selected URL, which is why I want to add a progress bar with jQuery UI (I know it's a lot to ask for). The progress bar is optional.
I'm using Simple HTML DOM Parser. Here is my code:
<?php
// page load time
$starttime = explode(' ', microtime());
$starttime = $starttime[1] + $starttime[0];
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
    <meta http-equiv="content-type" content="text/html;charset=utf-8"/>
    <title>svd parser</title>
    <link rel="shortcut icon" href="favicon.ico" type="image/x-icon"/>
    <link rel="stylesheet" type="text/css" href="style.css"/>
    <script type="text/javascript" src="jquery-1.10.2.min.js"></script>
</head>
<body>
<div class="container">
    <div id="head">
        <h1>svd parser</h1>
        <hr>
        <form action="index.php" method="post">
            <input type="text" name="s" placeholder="enter url to start svd parser" style="width: 495px;">
            <input type="submit" value="svd parser it">
        </form>
        <?php
        if (isset($_POST["s"]) && trim($_POST["s"]) != "") {
            // what domain? (accept both http:// and https://)
            preg_match('@^(?:https?://)?([^/]+)@i', $_POST["s"], $matches);
            $host = $matches[1];
            // last 2 segments of host name
            preg_match('/[^.]+\.[^.]+$/', $host, $matches);
            echo "<b>domain name is: {$matches[0]}.</b><br>\n";

            // returns the selectors to use for a known domain, or null
            function checkdomaingetrightvalues($domain) {
                if ($domain == "svd.se") {
                    $h1 = "h1";
                    $page = "p[class=preamble], div[class=articletext]";
                    return array('h1' => $h1, 'searchparse' => $page);
                } else {
                    return null;
                }
            }

            include('simple_html_dom.php');
            $html = new simple_html_dom();
            $ids = checkdomaingetrightvalues($matches[0]);
            // get page
            $html = file_get_html($_POST['s']);
            // find h1
            $ret = $html->find($ids['h1']);
            // strip the h1 of html tags (a href) and add h1 tags
            echo "<h1>" . strip_tags($ret[0]) . "</h1>";
            // find the actual article and forget everything else
            // function for extracting the right parse lines
            //$values = checkdomaingetrightvalues($matches[0]);
            $ret = $html->find($ids['searchparse']);
            // print the article without html tags, except <p> so it can be read
            // print the first part of the article as a hint
            echo "<p><b>" . strip_tags($ret[0]) . "</b></p>";
            // here is the actual article
            $a = html_entity_decode($ret[1]);
            echo strip_tags($a, '<p>');
            $html->clear();
            unset($html);
        } else {
            echo "you need to write the whole article url<br>";
        }
        // page load time
        $mtime = explode(' ', microtime());
        $totaltime = $mtime[0] + $mtime[1] - $starttime;
        printf('page loaded in %.3f seconds.', $totaltime);
        ?>
    </div>
    <div id="sidebar">
        <b>svd</b>
    </div>
</div>
</body>
</html>
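To make the goal concrete, here is a rough sketch of the jQuery side, not a finished solution. It rests on several assumptions that are not in the code above: a hypothetical `parse.php` that echoes only the parsed article fragment (not the whole page), hypothetical `#result` and `#progressbar` elements added to the markup, jQuery UI being loaded in addition to jQuery, and a made-up helper `isUsableUrl`. Because the browser cannot tell how far the server-side fetch has progressed, the sketch uses jQuery UI's indeterminate progress bar (`value: false`) rather than a real percentage.

```javascript
// Hypothetical helper: only send the request when the input looks like a full URL.
function isUsableUrl(input) {
  return typeof input === 'string' && /^https?:\/\/[^\/\s]+/i.test(input.trim());
}

// Wire up the form only when jQuery is present (i.e. in the browser).
if (typeof jQuery !== 'undefined') {
  jQuery(function ($) {
    var $result = $('#result');   // assumed container for the parsed article
    var $bar = $('#progressbar'); // assumed element for the jQuery UI progress bar

    $('form').on('submit', function (event) {
      event.preventDefault(); // stop the normal full-page form submit

      var url = $(this).find('input[name="s"]').val();
      if (!isUsableUrl(url)) {
        $result.text('You need to write the whole article URL.');
        return;
      }

      // value: false gives an indeterminate (animated) bar while waiting
      $bar.progressbar({ value: false }).show();

      // parse.php is an assumption: it should output only the article HTML
      $.post('parse.php', { s: url })
        .done(function (html) { $result.html(html); })
        .fail(function () { $result.text('Could not load the article.'); })
        .always(function () { $bar.hide(); });
    });
  });
}
```

The idea is to move the parsing part of the PHP into its own script so that the response is just the article, which the page then drops into the container without reloading.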
I would appreciate it if you could at least point me in the right direction!