Blocking Google 2

Status
Not open for further replies.

jimoblak

Instructor
Oct 23, 2001
3,620
US
There is an application server at an address like This domain is used by a small group for company operations. It is not intended for the public but it is not a problem if the public stumbles on to it. It is a problem if Google has indexed this page and generates more traffic to the page than the few that might have just stumbled on to it.

While you need authentication to get past this index page, the company would prefer not to have their secure portal displayed on Google. Can anyone suggest options to lessen the accessibility of this index page on search engines?

I have a PHP script that will detect if 'google' is found in the referer. If so, a blank page appears instead. But I'm sure we could do better.

Using httpd.conf or .htaccess, is there a way to detect if the referring link is from Google and then deny access or display some other error so that Google does not care to continue to index this page?
 
It may be a little late for this, but you can create a robots.txt file in your site's root directory. A robots.txt file tells bots and crawlers what they may look at and index. The following lines tell all bots to forget about it. You don't want any pages or directories indexed:

Code:
User-agent: *
Disallow: /

Some search engines will consider these dead links the next time they update their databases. In the meantime, you can rename your index page to something else, then use the Indexes directive in httpd.conf to look for it. You can use anything you want, such as private.html. Apache will look for index pages in the order you have them on the line of the Indexes directive, so I would put the new index name first. Nobody will need to type more than the regular URL to access the index page. However, if you create a new index.html, it will be displayed if someone or a search engine points to it directly. You can simply have it display a message such as "This is a private site" or "Go away, there is nothing for you here". If you do not keep an index.html file, it will be assumed to be a dead link and removed from the search engines.

 
Sorry, that directive should have been DirectoryIndex, not Indexes.
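
A minimal sketch of that DirectoryIndex setup, assuming the renamed page is called private.html (the filename is only the example used above, and index.html here is the optional placeholder page with the "private site" message):

```apache
# httpd.conf, inside the relevant server or <Directory> context.
# Apache tries these names left to right when a directory URL is requested,
# so the bare URL serves private.html, while a direct request for
# /index.html still gets the placeholder page.
DirectoryIndex private.html index.html
```

If the placeholder index.html is not kept, only private.html needs to be listed.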
 
Hi

RhythmAce gave you the solution, I would like to mention only one thing.
jimoblak said:
I have a PHP script that will detect if 'google' is found in the referer. If so, a blank page appears instead.
The correct way to communicate to the robot that it should not ask for that page anymore, is to... tell it.
rfc2616 said:
10.4.11 410 Gone
The requested resource is no longer available at the server and no
forwarding address is known. This condition is expected to be
considered permanent.
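In PHP, that means:
Code:
// Send the status line explicitly; a bare "Gone" is not a valid header string.
header("HTTP/1.1 410 Gone", true, 410);
This would be the search-engine-friendly way. To get them to respect our requests, we also have to respect them.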

Feherke.
 
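You could do this in httpd.conf using mod_rewrite:

RewriteEngine On
# Match any request whose Referer header contains "google", case-insensitively
RewriteCond %{HTTP_REFERER} google [NC]
# Answer with 410 Gone and leave the URL unchanged
RewriteRule .* - [G,L]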

jeb
 