Log File Question (Apache web server)


Neo81

Technical User
Aug 16, 2001
I'm the one who has to look after a small web hosting/ISP operation now, since the fella who built it has left the company. The problem is that we run an Apache web server to serve our web pages, and the big boss asked me whether we can tell if anyone has accessed a site a lot of times in one day, or has used something like NetZip to download the whole site. Is that possible?

Thanks,

Later
NEo81 >:):O>
 
I have a Perl script at home that could be somewhat useful to you. I whipped it up when the ol' Nimda virus was running amok, to satisfy my curiosity about the number of hits from hosts outside my home LAN [I use dialup at home, so any hit to my home web server from outside my LAN was guaranteed to be a virus hit...].

It will only tell you how many times a particular IP hit, though ("x.x.x.x hit here y times" is what the output looks like currently), which isn't necessarily what you're looking for (I think any users going through a proxy would all show the same IP, for instance, so a large number of hits from a particular IP could be legitimate).
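It's not the exact script I have at home, but a rough sketch along the same lines would look something like this (the default log path is only a guess, so point it at wherever your Apache access log actually lives; it also assumes the common/combined log format, where the client IP is the first field):

#!/usr/bin/perl
# Rough sketch: count hits per client IP in an Apache access log.
# Assumes the common/combined log format (client IP is the first field);
# the default log path below is only a guess, so adjust it.
use strict;
use warnings;

my $log = shift || '/var/log/apache/access_log';
open my $fh, '<', $log or die "Can't open $log: $!";

my %hits;
while (<$fh>) {
    my ($ip) = split /\s+/;                  # first field is the client IP
    $hits{$ip}++ if defined $ip;
}
close $fh;

# busiest IPs first
for my $ip (sort { $hits{$b} <=> $hits{$a} } keys %hits) {
    print "$ip hit here $hits{$ip} times\n";
}

Run it as "perl count_hits.pl /path/to/access_log" (count_hits.pl being whatever you decide to call it) and pipe the output through head if the list gets long.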

Knowing the IPs of hosts with excessive hits, you could then grep the logs for each of those IPs and see whether the timestamps on the page requests look reasonable for a human user or whether software was automatically downloading everything. There may even be a way to tell from the browser info (the User-Agent string) whether it was a bot doing the downloading - not sure about that.
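Again, just a sketch, but something like this would pull every request from one suspect IP and print the timestamp, request line and User-Agent, so you can eyeball whether the hits look human-paced or automated (it assumes the combined log format; the IP and log file are whatever you pass on the command line):

#!/usr/bin/perl
# Rough sketch: show timestamp, request line and User-Agent for every
# hit from a single IP, so the pacing and browser string can be checked.
# Assumes the combined log format; skips lines that don't match it.
use strict;
use warnings;

my ($ip, $log) = @ARGV;
die "usage: $0 <ip> <access_log>\n" unless $ip && $log;

open my $fh, '<', $log or die "Can't open $log: $!";
while (<$fh>) {
    next unless /^\Q$ip\E\s/;                  # only lines from that IP
    my ($stamp)   = /\[([^\]]+)\]/;            # e.g. 10/Oct/2001:13:55:36 +1000
    my ($request) = /"([A-Z]+ [^"]*)"/;        # e.g. GET /index.html HTTP/1.0
    my ($agent)   = /"([^"]*)"\s*$/;           # last quoted field = User-Agent
    next unless defined $stamp && defined $request && defined $agent;
    print "$stamp  $request  [$agent]\n";
}
close $fh;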

Doing all that with your own eyeballs is pointless though; a shell script or Perl script could probably give you a good idea of what the big boss is looking for automagically. You could even schedule it as a cron job and email the big boss the results if you wanted to.
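For instance, a crontab entry along these lines would mail him a nightly report (the script name, log path and address are all made up, and the mail command may be mailx on your system):

# run every night at 23:55; the paths and address are only examples
55 23 * * * /usr/local/bin/count_hits.pl /var/log/apache/access_log | mail -s "daily hit report" bigboss@example.com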

Email me if you want to see the Perl script I mentioned.

[Or maybe someone else knows of a quick, easy, download-and-go utility to do the same thing!]
Matt
matt@paperlove.org
If I can help, I will.
 
Thanks for that info, it was handy. (I am fairly new to Linux, but I like the way it works and am really interested in getting into it.) The reason I am asking this question is that we host a web page for a client, and at the moment the client is getting someone else to make a web page and host it for free. Anyway, we just want to know how much of the web page we host here has been reused for this so-called "new web page".

Later
NEo81 >:):O>
 
For that, I might just go download the "new" web page myself and then use diff to see just how different it is from the page you host. (That's reasonable for plain HTML, anyway; if server-side elements are involved (CGI or PHP), it may not help.)
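Something along these lines from the command line would do it; the URL and directory names are just placeholders:

# mirror the "new" site into ./newsite, then compare it against your copy
wget --mirror --no-parent -P newsite http://www.example.com/
diff -r newsite/www.example.com /path/to/the/site/you/host

diff -r will list files that exist on only one side as well as line-by-line differences in the ones that match up.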

Not sure what recourse you'd have though, even if the pages were identical. Did your company design the original page for the customer? If so, the contract involved in the original design may make all the difference...
Matt
matt@paperlove.org
If I can help, I will.
 