
Limit Requests per day by IP


Geronantimo
Technical User
Apr 3, 2006
Is it possible to limit the number of requests that are made to a webpage in the course of a day based on IP address and some other information?

I have the following basic tracking script:
Code:
<?php
$agent = $_SERVER['HTTP_USER_AGENT'];
$uri = $_SERVER['REQUEST_URI'];
$ip = $_SERVER['REMOTE_ADDR'];
$ref = $_SERVER['HTTP_REFERER'];
$visitTime = date("r"); //Example: Thu, 21 Dec 2000 16:01:07 +0200
$logLine = "$visitTime - IP: $ip || User Agent: $agent || Page: $uri || Referrer: $ref\n";

$fp = fopen("visitorLog.txt", "a+");
fputs($fp, $logLine);
fclose($fp);
?>
The tracking information ("visitTime" and "ip") is recorded in the text file "visitorLog.txt", and the script above is called in the webpage using
Code:
<?php include "visitorLog.php"; ?>
How can I modify the script to prevent access to the webpage if the visitor has already visited 20 times in the previous 24 hours?

Thanks in advance and I look forward to hearing from you.
 
i would advocate using a database for this kind of logging.
here is some code that should do the necessary, together with an sql create statement for the required table. it would be easy to add filters into the code to select only certain data, and to output links rather than text so that the filters could be selected.

if you absolutely have to use a flat file i can probably rewrite the class for you, but the performance will be truly terrible.

Code:
<?php

//this assumes that you have a pre-existing connection to a database

$log = new logger(); //this creates the class and logs the current visit

if ($log->numRecentVisits() > 20){
	die ('Too many visits today'); //this kills the script if there have been too many visits
}

//to output the log
$log = new logger();
$log->outputLog();

class logger{
	
	/*
	 * sql statement for creating the logging table
	 * create table visitorlog
	(	visitID int(10) auto_increment primary key,
		visitIP varchar(15),
		visitURI varchar(255),
		visitReferer varchar(255),
		visitDate int(20),
		visitAgent varchar(255)
	)
	 * 
	 */
	public function __construct(){
		$this->agent = $_SERVER['HTTP_USER_AGENT'];
		$this->uri = $_SERVER['REQUEST_URI'];
		$this->ip = $_SERVER['REMOTE_ADDR'];
		$this->ref = empty($_SERVER['HTTP_REFERER']) ? null : $_SERVER['HTTP_REFERER'];
		$this->visitDate = time();
		$this->logVisit();
	}
	
	private function logVisit(){
		$sql = sprintf("	Insert into visitorlog 
							(visitID, visitIP, visitURI, visitReferer, visitDate, visitAgent) 
							values 
							(null, '%s', '%s', '%s', %d, '%s')", $this->ip, $this->uri, $this->ref, $this->visitDate, $this->visitAgent);
		@mysql_query($sql);
	}
	
	public function numRecentVisits(){
		$dayago = strtotime('-1 day');
		$sql = "Select count(*) as c from visitorlog where visitorIP='$ip' and visitTime>'$dayago'";
		$result = @mysql_query($sql);
		$row = mysql_fetch_assoc($result);
		return $row['c'];
	}
	
	public function outputLog(){
		$sql = "Select date_format('%Y-%m-%d %T', visitDate), visitIP, visitURI, visitReferer, visitAgent from visitorlog order by visitDate asc";
		$result = mysql_query($sql);
		echo <<<HTML
<table border="1">
	<tr>
		<th>Time</th><th>IP Addr</th><th>URI</th><th>Referer</th><th>User Agent</th>
	</tr>
HTML;
		while ($row = mysql_fetch_assoc($result)){
			echo "\r\n\t<tr>\r\n\t\t<td>" . implode('</td>\r\n\t\t<td>', $row) . "</td>\r\n\t</tr>";
		}
		echo "\r\n</table>";
	}
}
?>
 
How can I modify the script to prevent access to the webpage if the visitor has already visited 20 times in the previous 24 hours?

If you are recording the number of visits they have made then you can run something like the following...

Code:
if ($user_Count >= 20) {
	header("Location: "); // the URL to send them to goes here
	exit;
} else {
	// ... your code ...
}

As jpadie points out, your best bet would be a (My)SQL storage solution. Put the captured info into a DB. You can save inserts by checking to see if the user has already been added to the DB in the first place.
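
A rough sketch of that check-before-insert idea (assuming an open mysql connection and a variant of jpadie's visitorlog table with an extra visitCount column, which is not in his original create statement):

Code:
<?php
$ip = mysql_real_escape_string($_SERVER['REMOTE_ADDR']);
$today = strtotime('today');

$result = mysql_query("select visitID from visitorlog
                        where visitIP='$ip' and visitDate>=$today");

if ($result && mysql_num_rows($result) > 0) {
	// the visitor is already in the table for today: bump the counter
	mysql_query("update visitorlog set visitCount=visitCount+1
	              where visitIP='$ip' and visitDate>=$today");
} else {
	// first visit today: insert a fresh row
	mysql_query("insert into visitorlog (visitIP, visitDate, visitCount)
	             values ('$ip', " . time() . ", 1)");
}
?>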

I assume you'll be looking to make allowances for search engines and the like?

Simon Clements-Hawes
 
Are you sure you want to do it on IP? For example, if I worked for IBM and used your site I would probably present to you the same IP address as the guy next to me, or even a guy in New York.
Could you try something around a cookie?
 
"Are you sure you want to do it on IP? For example, if I worked for IBM and used your site I would probably present to you the same IP address as the guy next to me, or even a guy in New York.
Could you try something around a cookie?"

I was thinking along those lines too. Cookies might not be the best method due to the ability to modify/remove them.

I'm not sure as to why you'd want to exclude people by IP but hey, I'm not the one asking the question ;)

I'd be more inclined to go down the session route myself to achieve something like this.

Simon Clements-Hawes
 
i would not want to use cookie- or session-based identification as they would not be honoured by spiders and robots (who are the worst offenders in leeching, imo).
 
Thank you all for the insightful replies.

I shall take some time to quietly go through what you have written and then post back.
 
This is an age-old question in the web world; I wonder how voting sites get around it, if in fact they do. The virus boys seem to be able to do it via a web page!
I agree cookies are not the best solution as they can be deleted, but you can encrypt the data to stop it being tampered with.
The session thing would be interesting, as once the user has logged out (or the session has expired etc.) the session goes as well, so it offers no protection at all. I'd be interested to understand where your thinking is on this one.
jpadie, can you explain why spiders would be a bad thing here? I can't think that they would hammer the site or intrude on the application.
I'd be interested to see what you come up with, Geronantimo!
 
ingresman: I was perhaps wrongly grouping spiders and bots in the same category.

i find that spiders grossly distort my web stats on at least one of my sites. On the average day I get 200-300 spider hits and less than 100 real visitors. I _could_ manually filter them, i could block them completely. I choose to do neither and this means that my stats are very much exaggerated. not terrible, but annoying.

one of the sites i administer receives more than 5000 hits per day from robots trying to spam its forums. this client does not want to implement a captcha system as it raises the entry barrier too high. we have a whole raft of protections in place to stop robot spam, one of which is to block repeated IP hits in a given timescale. This currently works as the robots being employed are not distributed - it's a single origin in each case (changing each day - but always in china, russia or taiwan). this is only one level of protection - there are other more subtle variants that have reduced spam from hundreds per day down to a few per year. Regrettably we lose some genuine comments too, but these help us improve the system too.

But we have not really been given enough information by the OP to determine whether a multi-hit limitation is the right approach to resolving whatever issue he faces. I assumed that it was some sort of wrongdoing we were protecting against, and thus cookie based solutions wouldn't wash. There are no grounds for this assumption and it may well be that a simple session counter would work equally well. the approach taken by my code could be easily reused for a session based solution.
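
to illustrate, here is a minimal sketch of what that session counter might look like (untested; the 20-hit threshold is simply the figure from the original question):

Code:
<?php
session_start();

$now = time();
$dayAgo = $now - 86400;

// keep only the hits recorded within the last 24 hours
$hits = isset($_SESSION['hits']) ? $_SESSION['hits'] : array();
$recent = array();
foreach ($hits as $t) {
	if ($t > $dayAgo) {
		$recent[] = $t;
	}
}
$recent[] = $now; // record the current hit
$_SESSION['hits'] = $recent;

if (count($recent) > 20) {
	die('Too many visits today');
}
?>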

i've also rewritten the database based code to work on a flat file. i have a feeling that the performance would be terrible though (for anything over a few users). although, actually, i've just thought of a better way to do this... i'll post back if the OP is genuinely after a flat-file based solution...
 
interesting stuff, I must admit I wasn't thinking of spamming bot things, just stopping more than one access like a voting site.
To be honest one of the things I like about this site is the number of good thinkers that come up with widely differing solutions to what sometimes looks a very simple task.
 
absolutely - great communities should mirror a good dinner party table, imo... conversations playing off each other to build and retain interest.

wish the virtual wine had the same effect as the real stuff (or do i mean the other way round?)
 
<<<
one of which is to block repeated IP hits in a given timescale
>>>
This is a good idea. In the same spirit I made a flood script that calls exit() if there is more than one hit per second;
it also works against refreshing "like a mad man"!
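
Such a flood check might look something like this (a sketch only; it assumes sessions are available and simply remembers the previous hit's timestamp):

Code:
<?php
session_start();

// exit if this visitor's previous hit was less than a second ago
if (isset($_SESSION['lastHit']) && (time() - $_SESSION['lastHit']) < 1) {
	exit('Too many requests - slow down!');
}
$_SESSION['lastHit'] = time();
?>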
 
Hi all,

I have enjoyed reading the responses in this thread to my initial question.

jpadie - Thanks for the code you posted earlier. I tested it with a database and I agree that this is far better than using a flat file. When I ran the script, I received one error:

Warning: mysql_fetch_assoc(): supplied argument is not a valid MySQL result resource in /home/user/public_html/logging/logging5.php on line 63

Line 63 is:
Code:
        $row = mysql_fetch_assoc($result);
Also, the table columns "Time" and "User Agent" were empty in the page output. The "visitDate" column contains entries in the database, but the "visitAgent" column has no entries.

Reloading the script many times didn't give the message "Too many visits today". I only received that message when I changed
Code:
if ($log->numRecentVisits() > 20){
to
Code:
if ($log->numRecentVisits() >= 0){

jpadie
But we have not really been given enough information by the OP to determine whether multi-hit limitations is the right approach to resolving whatever issue he faces.

I am building a website containing a great deal of proprietary information (It is a type of directory). I would like to give visitors access to this information but I need to protect the information from people who may try to download too much or even the entire site using WGET or some other spidering program. The actual number of pages that they may access before being blocked can be decided at a later stage and the restriction would not be applied to every page on the website - only to the "valuable" pages. Perhaps I am not taking the right approach, but I am exploring the possibilities.

psymonj
I assume you'll be looking to make allowances for search engines and the like?

I do wish to allow search engines to spider the site but this is not the most important consideration.

I have been re-considering my initial concept and I feel that I may use a system that requires visitors to register and log in before they have access to the detailed information. The basic information would then still be available to the search engines.

I have been looking today at a system called SOBI for Joomla that seems as though it will suit my purposes for allowing access to the detailed information only to logged in users.

I now need to look into incorporating a user-logging script that will prevent logged in users from accessing too many pages on the site within a 24 hour period.

Joomla contains a small piece of code for checking that a user is logged in:

Code:
	$user = &JFactory::getUser();
	if ($user->get('gid')) echo 'logged in';


so now I shall try to incorporate a logging script at the top of the page.
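
As a first attempt, I imagine something like this could sit at the top of the page (an untested sketch, assuming jpadie's logger class has been included):

Code:
<?php
// Joomla login check followed by the visit limit
$user = &JFactory::getUser();
if (!$user->get('gid')) {
	die('Please log in to view this page');
}

$log = new logger(); // logs this visit
if ($log->numRecentVisits() > 20) {
	die('Too many visits today');
}
?>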


jpadie
... and it may well be that a simple session counter would work equally well.

ingresman
Are you sure you want to do it on IP?

I would prefer to make the check based on IP address and user ID so that many people could access the detailed information from the same IP address provided that they have logged in separately. The logging system would need to record the userID so that it survives after the visitor has logged out and the session has ended.
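
Perhaps the check would then become something like this (just a sketch; it assumes a hypothetical visitUserID column added to the visitorlog table and an open database connection):

Code:
<?php
$user   = &JFactory::getUser();
$userID = (int) $user->get('id');
$ip     = mysql_real_escape_string($_SERVER['REMOTE_ADDR']);
$dayAgo = strtotime('-1 day');

// count this user's hits from this IP over the last 24 hours
$sql = "select count(*) as c from visitorlog
         where visitUserID=$userID and visitIP='$ip' and visitDate>$dayAgo";
$row = mysql_fetch_assoc(mysql_query($sql));
if ($row['c'] > 20) {
	die('Too many visits today');
}
?>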

I presume that the logging table in the database can be cleared out each day using cron?

jpadie
i've also rewritten the database based code to work on a flat file. i have a feeling that the performance would be terrible though (for anything over a few users). although, actually, i've just thought of a better way to do this... i'll post back if the OP is genuinely after a flat-file based solution...

I am using this and other projects to improve my understanding of PHP - the flat-file solution was the first thing that came to mind after the simple tracking script that I had. I'm not specifically after the flat-file solution, but I would be interested to see it if you have already prepared it.
 
i've probably got the date_format wrong. replace the method with this instead

Code:
public function outputLog(){
        $sql = "Select visitDate, visitIP, visitURI, visitReferer, visitAgent from visitorlog order by visitDate asc";
        $result = mysql_query($sql);
        echo <<<HTML
<table border="1">
    <tr>
        <th>Time</th><th>IP Addr</th><th>URI</th><th>Referer</th><th>User Agent</th>
    </tr>
HTML;
        while ($row = mysql_fetch_assoc($result)){
        	$row['visitDate'] = date("Y-m-d H:i:s", $row['visitDate']);
            echo "\r\n\t<tr>\r\n\t\t<td>" . implode("</td>\r\n\t\t<td>", $row) . "</td>\r\n\t</tr>";
        }
        echo "\r\n</table>";
    }
 
sorry, there were a bunch of bugs. try this code instead

Code:
<?php

//this assumes that you have a pre-existing connection to a database

$log = new logger(); //this creates the class and logs the current visit

if ($log->numRecentVisits() > 20){
    die ('Too many visits today'); //this kills the script if there have been too many visits
}

//to output the log
$log = new logger();
$log->outputLog();

class logger{
    
    /*
     * sql statement for creating the logging table
     * create table visitorlog
    (    visitID int(10) auto_increment primary key,
        visitIP varchar(15),
        visitURI varchar(255),
        visitReferer varchar(255),
        visitDate int(20),
        visitAgent varchar(255)
    )
     *
     */
    public function __construct(){
        $this->agent = $_SERVER['HTTP_USER_AGENT'];
        $this->uri = $_SERVER['REQUEST_URI'];
        $this->ip = $_SERVER['REMOTE_ADDR'];
        $this->ref = empty($_SERVER['HTTP_REFERER']) ? null : $_SERVER['HTTP_REFERER'];
        $this->visitDate = time();
        $this->logVisit();
    }
    
    private function logVisit(){
        $sql = sprintf("    Insert into visitorlog
                            (visitID, visitIP, visitURI, visitReferer, visitDate, visitAgent)
                            values
                            (null, '%s', '%s', '%s', %d, '%s')", $this->ip, mysql_real_escape_string($this->uri), mysql_real_escape_string($this->ref), $this->visitDate, mysql_real_escape_string($this->agent));
        @mysql_query($sql);
    }
    
    public function numRecentVisits(){
        $dayago = strtotime('-1 day');
        $sql = "Select count(*) as c from visitorlog where visitIP='$ip' and visitDate>'$dayago'";
        $result = @mysql_query($sql);
        $row = mysql_fetch_assoc($result);
        return $row['c'];
    }
    
    public function outputLog(){
        $sql = "Select visitDate, visitIP, visitURI, visitReferer, visitAgent from visitorlog order by visitDate asc";
        $result = mysql_query($sql);
        echo <<<HTML
<table border="1">
    <tr>
        <th>Time</th><th>IP Addr</th><th>URI</th><th>Referer</th><th>User Agent</th>
    </tr>
HTML;
        while ($row = mysql_fetch_assoc($result)){
        	$row['visitDate'] = date("Y-m-d H:i:s", $row['visitDate']);
            echo "\r\n\t<tr>\r\n\t\t<td>" . implode("</td>\r\n\t\t<td>", $row) . "</td>\r\n\t</tr>";
        }
        echo "\r\n</table>";
    }
}
?>
 
I presume that the logging table in the database can be cleared out each day using cron?

yes - a query like this would be fine

Code:
$cutOffDate = strtotime("-2 days");
mysql_query("delete from visitorlog where visitDate<$cutOffDate");
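
for example, wrap the query in a small script (the file name here is hypothetical) and have a crontab entry such as 0 3 * * * /usr/bin/php /path/to/clearLog.php run it nightly:

Code:
<?php
// clearLog.php - run daily from cron; assumes the same database
// connection details as the logging script
mysql_connect('localhost', 'dbuser', 'dbpass') or die(mysql_error());
mysql_select_db('dbname');

$cutOffDate = strtotime("-2 days");
mysql_query("delete from visitorlog where visitDate<$cutOffDate");
?>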

 
I have a guestbook PHP script and instead of a database I just use a simple data file. This code might help:

Code:
// Visitor can sign the Guestbook only once a day: check the user's IP and
// today's date against the entries already in the file ($GBfile is assumed
// to hold the path to Guestbook.csv; each row stores the date and IP).
$ip = $_SERVER['REMOTE_ADDR'];
$file = fopen($GBfile, "r") or exit("Unable to open Guestbook.csv file!");
while (!feof($file))
	{
		$data = fgetcsv($file, 1000, ",");
		if ($data === false) continue; // skip blank lines / EOF
		if (($data[3] == date("Y-m-d")) && ($data[5] == $ip))
			{
			echo "	ERROR - enough!";
			fclose($file);
			sleep(2);
			echo '<HTML><head>';
			echo '<META HTTP-EQUIV="Refresh" Content="0; URL=http://website.com/GoAway.htm">';
			echo '</head><body></body></HTML>';
			exit();
			}
	}
fclose($file);
 