Google's IP address

Hondy

Technical User
Mar 3, 2003
864
GB
Hi

I'm seeing addresses in my server logs that whois says belong to Google. The pages that "Google" is trying to access are no longer on the server and haven't been for some time.

If it were someone using Google to search for content on the site, the IP address would be that of the machine doing the searching, not Google itself.

Why would I be seeing Google's IP in my logs trying to access pages that are no longer there? They are appearing as bad page requests in my stats software. I could block the IP, but that's probably a bad thing, right?

Thanks
 
Probably people surfing Google Images. This causes the server to pass some weird params through to your site when they choose to view a larger image (at least I see this in my logs), and it effectively requests "bad" URLs.

Are you seeing lots of mentions of "translate" in those bad URLs as well? If so, this could be the same thing I experienced. I haven't bothered to prevent Google from accessing my images, though that would probably cure my situation.

Cheers,
Jeff

[tt]Jeff's Page @ Code Couch[/tt]
 
BabyJeffy - no, I don't think it's images; it's requesting the same things every day, like a screen scraper might. There's no translate in there either. Something just doesn't seem right.

OK, well, thanks for the response. I'll look further into it.

cheers
 
My understanding is that Google crawls your site from time to time to build up a list of URLs. There's then a second process of requesting those URLs and indexing their content.

So if the old URLs are in Google's index, it can continue to request them even if there are no longer any live links to those addresses. Presumably, after a few tries and a few 404s they'll age out of the index and you'll stop seeing the requests.

You're using a minuscule amount of bandwidth returning 404s for these page requests, so I wouldn't worry about it.
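
If you want to be sure the requests really are from Googlebot and not a scraper spoofing its user agent, a reverse DNS lookup is a quick check (a sketch, assuming shell access and the host utility; the IP is just a placeholder for one taken from your logs):
Code:
host 66.249.66.1
# a genuine Googlebot address resolves to a crawl-*.googlebot.com name,
# and resolving that name should give you back the same IP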

-- Chris Hunt
Webmaster & Tragedian
Extra Connections Ltd
 
Chris is right on the mark, although it can take many months of 404s to have non-existent URIs removed from the search engines' indexes. I have HTTP requests from Google, MSN and Yahoo for pages that were removed from sites 18 months previously.

A quicker way is to 301 the missing URIs to another page with the same or similar content.
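
For example, with Apache's mod_alias (just a sketch; the old and new paths are hypothetical):
Code:
Redirect 301 /old/page.html /new/similar-page.html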

Chris.

Indifference will be the downfall of mankind, but who cares?
Woo Hoo! The cobbler's kids get new shoes.
People Counting Systems

So long, and thanks for all the fish.
 
Hi

I would try to tell the bot that there is no way to find that file, by sending status 410 Gone instead of 404 Not Found. For example, with Apache:
Code:
Redirect gone /the/requested/file
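
If a whole directory has gone, a pattern match saves listing every file (again a sketch, assuming mod_alias; the path is hypothetical):
Code:
RedirectMatch gone ^/old-section/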

Feherke.
 
Yahoo was hammering one of my sites that had been modified and had lots of 404 page requests for an area that was no longer there. After a few months of near-continual requests getting a 404, I used robots.txt and banned Yahoo from accessing those pages. Most were in a common folder, so that made it fairly easy.
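
Something along these lines in robots.txt does it (a sketch; the folder name is hypothetical, and Slurp is Yahoo's crawler):
Code:
User-agent: Slurp
Disallow: /removed-folder/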

xtendscott
Home Improvement Watch | Cryosurgery | Walla Walla Portal
 