This script is a close relative of the Google SERP script.
The only differences are the enclosing HTML tag we use to find the site on the
results page, and the fact that we query 100 hits at a time, because Yahoo
does not "nest" results from the same site the way Google does.
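At 100 hits per page, covering 500 results takes five requests. Each request starts at a different record number, passed to Yahoo as `b=`. The sketch below reproduces that loop arithmetic on its own. `yahoo_start_records()` is a hypothetical helper written for illustration and is not part of the original script.

```php
<?php
// Hypothetical helper: list the "b" (starting record) values the script
// sends to Yahoo, one per page of results.
function yahoo_start_records(int $total_to_search, int $hits_per_page): array
{
    $starts = [];
    for ($i = 0; $i < $total_to_search; $i += $hits_per_page) {
        // Yahoo numbers records from 1: page 1 starts at 1, page 2 at 101, ...
        $starts[] = $i + 1;
    }
    return $starts;
}

echo implode(", ", yahoo_start_records(500, 100)), "\n"; // prints "1, 101, 201, 301, 401"
```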
We have still left in the code that checks for "nested" results, even though
it isn't really going to be used.
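The nested-result check keeps two counters. `$position` counts every hit. `$real_position` skips a hit whose domain is the same as the one immediately before it. The following is a minimal sketch of that counting, pulled out of the network code. `rank_positions()` is a hypothetical name, and the domain list is made-up sample data.

```php
<?php
// Hypothetical helper mirroring the script's counting: $position counts
// every hit, while $real_position ignores a hit whose domain equals the
// immediately preceding one (a "nested" result).
function rank_positions(array $domains, string $target): ?array
{
    $position = 0;
    $real_position = 0;
    $lastURL = null;
    foreach ($domains as $url) {
        $position++;
        if ($lastURL !== $url) {
            $real_position++;
        }
        $lastURL = $url;
        if ($url === $target) {
            return [$position, $real_position];
        }
    }
    return null; // target not found in the list
}

// Sample data: the second a.com hit is treated as nested.
var_dump(rank_positions(['a.com', 'a.com', 'b.com', 'www.web-max.ca'], 'www.web-max.ca'));
```

Here the site is the 4th hit overall but only the 3rd distinct result.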
Again, remember that Yahoo can change its search URL (or the markup of its
results page) at any time, which will break this script, so you will need to
keep checking these things!
<?php
// $searchquery is the value to search for.
// urlencode() turns spaces into "+" and ampersands into "%26",
// which are the values Yahoo is expecting.
// $searchurl is the URL to find, i.e. www.web-max.ca
// Do not pass http:// - you don't need it.
if (!empty($searchquery) && !empty($searchurl)) {
    $query = urlencode($searchquery);

    // How many results to search through.
    $total_to_search = 500;
    // The number of hits per page.
    $hits_per_page = 100;
    // The total number of pages / queries is $total_to_search / $hits_per_page.

    // This will be our rank.
    $position = 0;
    // This is the rank minus the duplicates.
    $real_position = 0;

    $found = NULL;
    $lastURL = NULL;

    for ($i = 0; $i < $total_to_search && empty($found); $i += $hits_per_page) {
        // With Yahoo, to get to the next page we pass the starting record,
        // i.e. page 1 starts at 1, page 2 at 101.
        $page_var = $i + 1;

        // Open the search page, filling in $query, $hits_per_page and $page_var.
        $filename = "http://search.yahoo.com/search?_adv_prop=web&x=op&ei=UTF-8" .
            "&prev_vm=p&va=$query&va_vt=any&vp=&vp_vt=any&vo=&vo_vt=any" .
            "&ve=&ve_vt=any&vd=all&vst=0&vs=&vf=all&vm=p" .
            "&vc=&fl=0&n={$hits_per_page}&b=$page_var";

        // Opening a URL with fopen() requires allow_url_fopen to be enabled.
        $file = fopen($filename, "r");
        if (!$file) {
            echo "<p>Unable to open remote file $filename.\n";
        } else {
            // Now load the file into a variable one line at a time.
            while (!feof($file)) {
                $var = fgets($file, 1024);

                // Yahoo uses an EM tag with a class of yschurl to show the site.
                // preg_match() replaces eregi(), which was removed in PHP 7.
                if (preg_match('#<em class=yschurl>(.*?)</em>#i', $var, $out)) {
                    // Take out any <B> </B> tags - Yahoo highlights search
                    // terms within URLs.
                    $out[1] = strtolower(strip_tags($out[1]));

                    // Get the domain name by looking for the first /
                    // (or use the whole string if there is no slash).
                    $x = strpos($out[1], "/");
                    $url = ($x === false) ? $out[1] : substr($out[1], 0, $x);

                    $position++;

                    // If you want to see the hits, set $trace to something.
                    if (!empty($trace)) print($url . "<br>");

                    // If the last result processed is the same as this one, it
                    // is a nested or internal domain result, so don't count it
                    // in $real_position.
                    if ($lastURL !== $url) $real_position++;
                    $lastURL = $url;

                    // If the sites match, we have found it!
                    if (strtolower($searchurl) === $url) {
                        $found = $position;
                        // Quit out; we don't need to go any further.
                        break;
                    }
                }
            }
            fclose($file);
        }
    }

    // Escape user input before echoing it back into the page.
    $safe_url = htmlspecialchars($searchurl);
    $safe_query = htmlspecialchars($searchquery);
    if ($found) {
        $result = "The site $safe_url is at position $found " .
            "( $real_position ) for the term <b>$safe_query</b>";
    } else {
        $result = "The site $safe_url is not in the top $total_to_search " .
            "for the term <b>$safe_query</b>";
    }
}
?>
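The matching step is the part most likely to break when Yahoo changes its markup. It can be isolated and tested against a saved line of HTML without touching the network. The following is a minimal sketch that assumes the `<em class=yschurl>` markup the script targets. `extract_domain()` is a hypothetical helper, and the sample HTML line is made up for illustration.

```php
<?php
// Hypothetical helper: pull the displayed site out of one line of results
// HTML, assuming the <em class=yschurl> markup the script looks for.
// preg_match() is used because eregi() was removed in PHP 7.0.
function extract_domain(string $line): ?string
{
    if (!preg_match('#<em class=yschurl>(.*?)</em>#i', $line, $out)) {
        return null; // no result URL on this line
    }
    // Drop highlighting tags such as <b>...</b> and normalise the case.
    $text = strtolower(strip_tags($out[1]));
    // Keep everything before the first "/", or the whole string if none.
    $slash = strpos($text, '/');
    return $slash === false ? $text : substr($text, 0, $slash);
}

var_dump(extract_domain('<em class=yschurl>www.<b>Web-Max</b>.ca/index.html</em>'));
```

Checking for `false` from `strpos()` keeps a bare domain such as `example.com` intact. Without that check, `substr()` would return an empty string.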