
hello all, In order to block my si 1

Status
Not open for further replies.

mlmll

Technical User
Dec 14, 2000
15
FR
Hello all,

I want to block access to my site if
- the referrer is not in an allowed list
or
- the user-agent is not in an allowed list, or is in a forbidden list.

For this, I made the following .htaccess file:

[tt]
SetEnvIfNoCase Referer ^$ allowed_referer
SetEnvIfNoCase Referer ^ allowed_referer
SetEnvIfNoCase Referer ^ allowed_referer

#block spiders
BrowserMatch ^Mozilla allowed_agent
BrowserMatch ^Opera allowed_agent
#and for "Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) TrueRobot; 1.4" (voila), squash 'em
BrowserMatch TrueRobot forbidden_agent
BrowserMatch Echo2 forbidden_agent


Order Allow,Deny
Allow from env=allowed_referer
Allow from env=allowed_agent
Deny from env=forbidden_agent
[/tt]

Yet when I come to my.site.com through a link from google.com, access is allowed! How come?

Thanks *a lot* for any hint.

Cheers,
 
I'm still new to Apache but this could be worth a try. Put a javascript in your page that checks for google as the referer.

Just a thought, I never tried it.
 
Nope, I want this to work whether the client has JavaScript turned on or not.

And I know it *should* work ! Something's wrong, but what ?
 
Hi,

If you have Allow,Deny then the default is deny, so you only really need to explicitly allow the traffic you want to let through. I'm not too sure of the effect of the first line, however.

SetEnvIfNoCase Referer ^$ allowed_referer

Does this mean a null/blank referer and, if so, why would you want to allow it?

Anyway, the way the rules read to me is that anything from Mozilla / Opera would be allowed through, and the referer restriction would only apply to other browsers. Was this what you intended?

Regards
 
Yes,
[tt]SetEnvIfNoCase Referer ^$ allowed_referer[/tt]
means an empty referrer: the cases when the URL is typed or selected from bookmarks / favorites. I do want to allow access in this case.

No,
what I want is not what you describe.

In addition to the null referrer case described above, I also want to allow access if the client came from friendly.site.com.

If not, deny access.
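
As a rough sketch, such an allow-list could look like the lines below. friendly.site.com is only the placeholder host named above, and the quoting style is an assumption rather than the original file:

[tt]
# empty Referer: URL typed by hand or picked from bookmarks
SetEnvIfNoCase Referer "^$" allowed_referer
# visitors arriving from the friendly site (placeholder host name)
SetEnvIfNoCase Referer "^http://friendly\.site\.com" allowed_referer
[/tt]

Each matching line sets the same variable, so the list of allowed referrers acts as an OR.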

But still, I want to block all kinds of spiders, whatever the referrer. The trick I've found so far is to let only ^Mozilla and ^Opera (100% of my visitors) through, unless the user-agent contains TrueRobot or Echo2, which are spiders whose user-agent strings look like this:

[tt]Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) TrueRobot; 1.4[/tt]

So I'd be glad to know where the error in my htaccess file is. :)

TIA
 
Hi,

OK - that makes obvious sense on the blank referer - I obviously hadn't read my RFCs closely enough! Anyway, the code to set the environment variables seems OK to me. That therefore makes me wonder if it's to do with how Apache treats multiple allows. If you have something like:

Order Allow,Deny
Allow from apache.org
Deny from foo.apache.org

Apache clearly must run both rules, i.e. foo.apache.org would pass the first line but then be stopped by the deny.

I therefore think your rules would allow traffic with either 'allowed_referer' or 'allowed_agent' set and would then test the deny and only stop the 'forbidden_agent' stuff, i.e. the bots.

So the key question may be: are the two 'allow' rules effectively combined (logical AND) or do they act as an OR? You could test this in the google scenario by dropping the allowed_agent rule.
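
A minimal sketch of that test, assuming the variable names from the original file, with only the allowed_agent line removed:

[tt]
Order Allow,Deny
# only the referrer check can grant access now
Allow from env=allowed_referer
Deny from env=forbidden_agent
[/tt]

If a visit arriving from a google.com link is refused with this version but allowed when both Allow lines are present, then the two Allow lines behave as an OR.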

Regards
 
I think ifincham is right. What you have to do is put whatever you want to be the default first, then list what you will allow or deny, but not both. Also, if you are using .htaccess, I think you have to set your AllowOverride directive accordingly.
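
For reference, a sketch of the matching server configuration. The directory path is a made-up example. As far as I know, SetEnvIf and BrowserMatch in .htaccess need the FileInfo override, and Order/Allow/Deny need Limit:

[tt]
# httpd.conf (the path below is hypothetical)
<Directory "/var/www/html">
    AllowOverride FileInfo Limit
</Directory>
[/tt]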
 
Thanks for your hints ifincham and RythmAce

ifincham wrote:
So the key question may be: are the two 'allow' rules effectively combined (logical AND) or do they act as an OR? You could test this in the google scenario by dropping the allowed_agent rule.

Tested it: works like you expected (logical OR). So
Code:
Order Allow,Deny
Allow from env=allowed_referer
Allow from env=allowed_agent
Deny from env=forbidden_agent

means:

[tt]allow only if (
( allowed_referer
or
allowed_agent
)
and
NOT forbidden_agent
)[/tt]

(yes, I tested that the and is actually an AND, not an OR)

Duh. In the Apache docs I read:
Allow,Deny
The Allow directives are evaluated before the Deny directives. Access is denied by default. Any client which does not match an Allow directive or does match a Deny directive will be denied access to the server.

...which sounds like a logical AND, doesn't it? Is this an error in the docs?

Anyway, does anybody know how to achieve:

[tt]
allow only if (
allowed_referer
and
allowed_agent
and
NOT forbidden_agent
)[/tt]

?

TIA
 
Hi,

I believe you could get round this with another 'SetEnvIf' statement that tests the settings of allowed_referer and allowed_agent and sets another environment variable accordingly. You would then code your allow for the new variable. To quote from the Apache docs for mod_setenvif:

"If the attribute name doesn't match any of the special keywords, nor any of the request's header field names, it is tested as the name of an environment variable in the list of those associated with the request. This allows SetEnvIf directives to test against the result of prior matches."

What I'm not too sure of (& I'm not in a position to test right now) is the exact syntax. See --> .
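
A small sketch of the idea, assuming an unset variable is tested as an empty string (the names ref_ok and ref_missing are made up for illustration):

[tt]
# first pass: flag an empty Referer
SetEnvIfNoCase Referer "^$" ref_ok
# second pass: test the variable itself; if ref_ok was never set, flag the request
SetEnvIf ref_ok "^$" ref_missing
[/tt]

Directive order matters here, since the second line only sees what earlier lines have set.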

Hope this helps
 
Yessss !!! Right, I managed to do it.

FWIW, here's the final answer:

Code:
#referer & agent filter

SetEnvIfNoCase	Referer ^$	allowed_referer=y
SetEnvIfNoCase	Referer ^http://famille\.berne\.free\.fr	allowed_referer=y
SetEnvIfNoCase	Referer ^http://bldanchald\.free\.fr	allowed_referer=y
SetEnvIfNoCase	Referer ^http://pmchatelard\.free\.fr	allowed_referer=y

#block spiders
BrowserMatch	^Mozilla	allowed_agent=y
BrowserMatch	^Opera	allowed_agent=y
#and for "Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) TrueRobot; 1.4" (voila), squash 'em
BrowserMatch	TrueRobot	allowed_agent=n
BrowserMatch	Echo2	allowed_agent=n



SetEnvIf	allowed_referer y	allow_access
SetEnvIf	allowed_agent y	allow_access
SetEnvIf	allowed_referer ^$	deny_access
SetEnvIf	allowed_agent ^$	deny_access
SetEnvIf	allowed_agent n	deny_access

Order Allow,Deny
Allow from env=allow_access
Deny from env=deny_access

Thanks a lot !!!
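
For anyone reading this on Apache 2.4 or later, where Order/Allow/Deny are only kept for compatibility, the same AND can probably be written directly with Require. This is an untested sketch that assumes the flag-style variables from the first post (allowed_referer, allowed_agent, forbidden_agent, each set without a value) and AllowOverride AuthConfig for .htaccess use:

[tt]
<RequireAll>
    # every line must hold for access to be granted
    Require env allowed_referer
    Require env allowed_agent
    Require not env forbidden_agent
</RequireAll>
[/tt]

Note that Require env only checks that the variable exists, so it would not distinguish allowed_agent=y from allowed_agent=n the way the final answer above does.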
 