Google's IP address

Hondy

Technical User
Mar 3, 2003
864
GB
Hi

I'm seeing addresses in my server logs that whois says belong to Google. The pages that "Google" is trying to access are no longer on the server and haven't been for some time.

If it were someone using Google to search for content on the site, the IP address would be that of the machine doing the searching, not Google itself.

Why would I be seeing Google's IP in my logs trying to access pages that are no longer there? They are appearing as bad page requests in my stats software. I could block the IP, but that's probably a bad thing, right?

Thanks
 
Probably people surfing Google Images. This causes the server to pass some weird params through to your site when they choose to view a larger image (at least I see this in my logs), and it effectively requests "bad" URLs.

Are you seeing lots of mentions of "translate" in those bad URLs as well? If so, this could be the same thing I experienced. I haven't bothered to prevent Google from accessing my images, though that would probably cure my situation.

Cheers,
Jeff

[tt]Jeff's Page @ Code Couch[/tt]
 
BabyJeffy - no, I don't think it's images; it's requesting the same things every day, like a screen scraper might. There's no translate in there either. Something just doesn't seem right.

OK, well, thanks for the response. I'll look further into it.

cheers
 
My understanding is that Google crawls your site from time to time to build up a list of URLs. There's then a second process of requesting those URLs and indexing their content.

So if the old URLs are in Google's index, it can continue to request them even if there are no longer any live links to those addresses. Presumably, after a few tries and a few 404s they'll age out of the index and you'll stop seeing the requests.

You're using a minuscule amount of bandwidth returning 404s for these page requests, so I wouldn't worry about it.
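
If you want to be sure the requests really are from Googlebot and not a scraper spoofing its user agent, a reverse DNS lookup is a quick check (a sketch, assuming shell access and the host utility; the IP is just a placeholder for one taken from your logs):
Code:
host 66.249.66.1
# a genuine Googlebot address resolves to a crawl-*.googlebot.com name,
# and resolving that name should give you back the same IP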

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Chris is right on the mark, although it can take many months of 404s to have non-existent URIs removed from the search engines' indexes. I have HTTP requests from Google, MSN and Yahoo for pages that were removed from sites 18 months previously.

A quicker way is to 301 the missing URIs to another page with the same or similar content.
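
For example, with Apache's mod_alias (just a sketch; the old and new paths are hypothetical):
Code:
Redirect 301 /old/page.html /new/similar-page.html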

Chris.

Indifference will be the downfall of mankind, but who cares?
Woo Hoo! The cobbler's kids get new shoes.
People Counting Systems

So long, and thanks for all the fish.
 
Hi

I would try to tell the bot that there is no way to find that file, by sending status 410 Gone instead of 404 Not Found. For example, with Apache:
Code:
Redirect gone /the/requested/file
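
If a whole directory has gone, a pattern match saves listing every file (again a sketch, assuming mod_alias; the path is hypothetical):
Code:
RedirectMatch gone ^/old-section/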

Feherke.
 
Yahoo was hammering one of my sites that had been modified and had lots of 404 page requests for an area that was no longer there. After a few months of near-continual requests getting a 404, I used robots.txt and banned Yahoo from accessing those pages. Most were in a common folder, so that made it fairly easy.
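
Something along these lines in robots.txt does it (a sketch; the folder name is hypothetical, and Slurp is Yahoo's crawler):
Code:
User-agent: Slurp
Disallow: /removed-folder/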

xtendscott
Home Improvement Watch | Cryosurgery | Walla Walla Portal
 