How to reference Yahoo and Bing cached pages?

Status
Not open for further replies.

FNBIT

IS-IT--Management
Oct 27, 2006
74
US
Hello,

I am writing some code that will extract content from cached web pages. I have the code working and I can access Google's cached pages, but I am not sure how to access Bing's and Yahoo's.

For instance if I want the cached page from Google I would use:
How would I do this for Bing and Yahoo?
 
I am using PHP code to reference these. For example, to pull the title section I would use:
Code:
// Fetch the page and capture the text between the <title> tags.
preg_match('~<title>(.*?)</title>~i', file_get_contents('http://www.mywebsite.com'), $TheTitle);
echo "The Title is $TheTitle[1]";

This returns what I currently have in the title tag, which is fine, but I want to find out what was in the title tag when the page was last cached by the various search engines.
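For what it's worth, the one-liner above will throw warnings if the fetch fails or the page has no title tag. Here is a minimal sketch of a more defensive version. The function names extractTitle and fetchTitle are made up for illustration, and it assumes PHP 7.1 or later with allow_url_fopen enabled for the remote fetch.

```php
<?php
// Hypothetical helper (not from the original post): returns the <title>
// text from an HTML string, or null when no title tag is found.
function extractTitle(string $html): ?string
{
    // "s" lets "." match newlines, so a title spanning several lines is
    // still captured; "i" keeps the match case-insensitive.
    if (preg_match('~<title[^>]*>(.*?)</title>~is', $html, $matches) === 1) {
        return trim(html_entity_decode($matches[1]));
    }
    return null;
}

// Hypothetical helper: fetches a URL and extracts its title.
// file_get_contents() returns false on failure, so check before matching.
function fetchTitle(string $url): ?string
{
    $html = @file_get_contents($url);
    return $html === false ? null : extractTitle($html);
}

// Usage with an inline HTML string, so no network access is needed.
$sample = "<html><head><TITLE>\n  My Cached Page\n</TITLE></head><body></body></html>";
echo "The Title is " . extractTitle($sample) . "\n";
```

The same extractTitle() call should work on the HTML of a cached copy once you have its URL, since it only looks at the string it is given.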

It's not directly a PHP question, but I am using the answer in my PHP code and I am not sure where else to ask it.
 
Well, I think what I will do is extract the cache URL directly from the search results page through PHP. As long as the page I am looking for is the first result, it should work. I will search for the exact domain name, which should make it always come up first. (I hope ;-) )
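A rough sketch of that idea, using PHP's DOM extension instead of a regex. Everything about the markup here is an assumption: the link text "Cached", the sample HTML, and the cache.example.com address are made up, and the function name firstCachedLink is hypothetical. A real results page will look different and can change at any time.

```php
<?php
// Hypothetical helper: returns the href of the first link whose visible
// text is "Cached" (case-insensitive), or null if there is none.
// The link text is an assumption, not any search engine's real layout.
function firstCachedLink(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $anchor) {
        if (strcasecmp(trim($anchor->textContent), 'Cached') === 0) {
            $href = $anchor->getAttribute('href');
            return $href !== '' ? $href : null;
        }
    }
    return null;
}

// Made-up results markup, for illustration only.
$results = '<div><a href="http://www.mywebsite.com">My Site</a> '
         . '<a href="http://cache.example.com/?u=www.mywebsite.com">Cached</a></div>';
echo firstCachedLink($results), "\n";
```

Taking the first match only works if the page being looked for really is the first result, which is the same caveat as above.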
 
For Yahoo, use the exposed API rather than screen scraping.
Likewise, Bing offers an API that is accessible from PHP.

Try googling '[bing/yahoo] search api' for more information.
 
Is there a reason I should not use screen scraping? I am not sure I want to learn the API, as I am not that good at PHP. Using the API may be over my head.
 
Are you sure that screen scraping these sites is legal?

But the general answer to your question is that it is better to use the API, because it returns structured results, whereas a screen scrape relies on the structure of the page not changing. Since you are not in control of that structure, your application can break at any time.
 
Good point. I didn't think of that. I think once it is done I will go back and change that part later. Thanks for the advice!
 