Problem with file_get_contents

Status
Not open for further replies.

FNBIT

IS-IT--Management
Oct 27, 2006
Hello,

I am having a problem that is starting to drive me up the wall. I want to fetch a certain page using file_get_contents, but I cannot get it to work the way I want. Here is the code:
Code:
$Domain = "somedomain.com";
$BingSearch = file_get_contents('http://www.bing.com/search?q='.$Domain);

preg_match('~<a href="http://cc.(.*?)>Cached page</a>~i', $BingSearch, $BingCache);
if (preg_match('~bing(.*?)" onmousedown~i', $BingCache[1], $BingArray)) {
    $BingCacheURL = "http://cc.bing".$BingArray[1];
} else {
    $BingCacheURL = "Not Available";
}

echo "The Bing Cached URL is: $BingCacheURL<br>";

$BingCachedSite = file_get_contents($BingCacheURL);
echo $BingCachedSite;

The echo of $BingCacheURL looks fine, but the line that sets $BingCachedSite fails. If I copy the URL that is echoed from $BingCacheURL and set it manually, it all works fine. For instance, if I change the above to:
Code:
$BingCacheURL = "http://cc.bingj.com/cache.aspx?q=%22somedomain+com%22&d=4649864789426651&mkt=en-US&setlang=en-US&w=d7e1d8f4,cedfa1f9";

echo "The Bing Cached URL is: $BingCacheURL<br>";

$BingCachedSite = file_get_contents($BingCacheURL);
echo $BingCachedSite;

it works fine. Please note I replaced my domain name with "somedomain" for this example. I got the above URL by copying it from the output of the first example. It is probably something simple, but I can't figure it out. I even did this to check whether I could see the difference:
Code:
$Domain = "somedomain.com";
$BingSearch = file_get_contents('http://www.bing.com/search?q='.$Domain);

preg_match('~<a href="http://cc.(.*?)>Cached page</a>~i', $BingSearch, $BingCache);
if (preg_match('~bing(.*?)" onmousedown~i', $BingCache[1], $BingArray)) {
    $BingCacheURL = "http://cc.bing".$BingArray[1];
} else {
    $BingCacheURL = "Not Available";
}

$BingCacheURL2 = "http://cc.bingj.com/cache.aspx?q=%22somedomain+com%22&d=4649864789426651&mkt=en-US&setlang=en-US&w=d7e1d8f4,cedfa1f9";

echo "The Bing Cached 1 URL is: $BingCacheURL<br>";
echo "The Bing Cached 2 URL is: $BingCacheURL2<br>";

$BingCachedSite = file_get_contents($BingCacheURL);
echo $BingCachedSite;

They look the same to me. Not sure what is going on. I would appreciate any help.

Thanks,
Chris
 
Is there a way to compare and see the differences between the two strings? I thought there may be some line breaks but I am not sure how to see them. I copied and pasted into Word and did a show all codes but I still don't see anything different between the two variables. However there must be some difference since one works and the other does not.
 
Is there a way to see hidden characters? I ran the function "similar_text" using my domain in the code above. It came back and said the difference was at the very end, which makes me think there is a line return or something. However, I did try some of the string substitution commands that remove \n, \r, etc., with no effect. It also comes back and tells me that they are 93.700787401575% alike.

The length of the string that I created by screen scraping is 135.
The length of the other one that I copied and pasted is 119.
Therefore there are an additional 16 characters that I cannot see which I would like to remove. The trim function does not do anything either. Any PHP pros run into anything like this before?
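
One way to make an invisible difference visible is to walk both strings byte by byte and print the region around the first mismatch as hex. The following is only a minimal sketch: the helper name firstDifference and the example.com URLs are made up for illustration, and this is not the hexdump function referred to elsewhere in this thread.

```php
<?php
// Hypothetical helper: returns the byte offset of the first difference
// between two strings, or -1 if they are identical.
function firstDifference(string $a, string $b): int
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // No mismatch in the shared part: either equal, or one is a prefix.
    return strlen($a) === strlen($b) ? -1 : $limit;
}

// Illustrative values only; the real strings would come from the script.
$scraped = 'http://example.com/cache.aspx?q=test&amp;d=1';
$pasted  = 'http://example.com/cache.aspx?q=test&d=1';

$pos = firstDifference($scraped, $pasted);
echo "Lengths: ", strlen($scraped), " vs ", strlen($pasted), "\n";

if ($pos >= 0) {
    // Show a few bytes around the mismatch, as text and as hex.
    $start = max(0, $pos - 2);
    foreach (['scraped' => $scraped, 'pasted' => $pasted] as $name => $s) {
        $chunk = substr($s, $start, 10);
        echo $name, ": ", $chunk, "  [", bin2hex($chunk), "]\n";
    }
}
```

If the output is viewed in a browser rather than on the command line, it is probably worth wrapping the echoed text in htmlspecialchars() or using "view source", since the browser interprets the output as HTML and may hide exactly the characters being hunted for.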
 
I figured it out. So for those that may run into this in the future I used this hexdump function:
I found that in some places where there should be a "&" I had "&amp;" instead. I used a string replace to change each "&amp;" to just "&". Now the strings match in length. I am not positive why this happened, but I can now fetch the cached page with the new value.
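
A likely explanation, offered as an assumption rather than something confirmed in this thread: the URL was taken from the raw HTML source of the search page, where a "&" inside an href attribute is normally written as the entity "&amp;". A browser renders that entity as a plain "&", so the echoed string looks identical to the pasted one even though it is longer. The numbers fit, too: the example URL contains four ampersands, and four extra "amp;" suffixes account for the 16 additional characters. Instead of a manual string replace, PHP's html_entity_decode() can turn such entities back into literal characters. A minimal sketch with a made-up URL:

```php
<?php
// Made-up example of a URL as it might appear inside scraped HTML source.
$scrapedUrl = 'http://example.com/cache.aspx?q=test&amp;d=1&amp;mkt=en-US';

// Decode HTML entities such as &amp; back to their literal characters.
$cleanUrl = html_entity_decode($scrapedUrl, ENT_QUOTES, 'UTF-8');

// Compare lengths before and after decoding.
echo strlen($scrapedUrl), " -> ", strlen($cleanUrl), "\n";
echo $cleanUrl, "\n";

// The decoded URL is the one to request, for example:
// $page = file_get_contents($cleanUrl);
```

Decoding every entity, rather than replacing only "&amp;", should also cover cases such as "&quot;" if they ever show up in a scraped attribute value.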
 