Tek-Tips is the largest IT community on the Internet today!

How to stop malicious web robots

Status
Not open for further replies.

XgrinderX

Programmer
Mar 27, 2001
225
US
Hello!

Our company has several web sites that let users search for and view documents. Looking at our stats recently, it appears that at least one robot is going through our site, incrementing document numbers and grabbing the images.

We'd like to stop this from happening. Can anyone help me with this? I am a lowly ASP programmer and have limited knowledge of server administration and have never dealt with this kind of issue before.

Thanks,

-Greg
 
It is my understanding that the robots.txt file only works if the robot actually reads that file to see whether it is allowed. Is that correct? If so, why would a malicious image-stealing bot check whether it is allowed?
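To illustrate why robots.txt is purely advisory, here is a minimal sketch of what a well-behaved crawler does before each request, using Python's standard `urllib.robotparser`. The rules, the `/documents/` path, and the `ExampleBot` name are made up for illustration and are not taken from the site in question. The check runs entirely on the client side, so a scraper that never performs it is not restricted in any way.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the /documents/ path is an assumption.
ROBOTS_TXT = """\
User-agent: *
Disallow: /documents/
"""


def is_allowed(user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite bot asks first and skips anything that is disallowed.
# A malicious bot simply never calls anything like this.
if __name__ == "__main__":
    for path in ("/index.asp", "/documents/view.asp"):
        print(path, "allowed" if is_allowed("ExampleBot", path) else "disallowed")
```

Nothing on the server enforces these rules, which is why the file helps only against robots that choose to honour it.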

I guess an IP ban is possible, but I don't think it's all that difficult to obtain a different IP.

Are we just SOL here?
 
Yes, the bot may not honour the robots protocol, but there are sometimes other ways to stop it. The genuine image bots do follow the protocol, though.
If it's a scraper run from someone's home machine there isn't a lot you can do, but a genuine bot usually runs from a fixed IP address, so firewall blocking is still an option.

Post the UA (user-agent string) of the bot(s) hitting the site.
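Once logging is on, something like the following could help pull the UA out of the logs. It is only a sketch, and it assumes the server writes W3C extended format logs (as IIS can) with the `c-ip` and `cs(User-Agent)` fields enabled. The sample lines are invented. It tallies requests per client IP and user agent so that an unusually busy client stands out.

```python
from collections import Counter


def tally_clients(log_lines):
    """Count requests per (client IP, user agent) in a W3C extended log."""
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The directive names the columns used by the lines that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        key = (row.get("c-ip", "-"), row.get("cs(User-Agent)", "-"))
        counts[key] += 1
    return counts


# Invented sample data; real logs will have more fields.
SAMPLE_LOG = [
    "#Fields: date time c-ip cs-uri-stem cs(User-Agent)",
    "2005-01-01 10:00:00 10.0.0.5 /view.asp Mozilla/4.0+(compatible)",
    "2005-01-01 10:00:01 10.0.0.9 /view.asp ExampleGrabber/1.0",
    "2005-01-01 10:00:02 10.0.0.9 /view.asp ExampleGrabber/1.0",
]

if __name__ == "__main__":
    for (ip, agent), hits in tally_clients(SAMPLE_LOG).most_common():
        print(hits, ip, agent)
```

Bear in mind that the user-agent string is supplied by the client, so a custom scraper can send whatever it likes. The IP and request counts are usually the more telling columns.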



Chris.

Indifference will be the downfall of mankind, but who cares?
Woo Hoo! the cobblers kids get new shoes.
Nightclub counting systems

So long, and thanks for all the fish.
 
When I get the UA I will post it here. They only enabled the web log yesterday, and we have not seen any robot-like activity since then.

Also for clarity, let me explain what we think the robot is doing:

It logs in to the site, does a document number search, and views the image (a TIFF image displayed in the Acordex Java TIFF viewer), presumably screen-capturing it. It then goes back, increments the document number, and repeats the process.

So I am pretty sure that this is not any kind of "genuine imagebot" - it is something somebody wrote to specifically get automated copies of these documents.
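Since the robot has to log in, one option might be to watch for the pattern itself on the server: a single account requesting consecutive document numbers one after another. The sketch below shows the idea in Python rather than ASP. The class name, the threshold, and the account names are all hypothetical, and a real version would also need to decide what to do once an account is flagged (throttle, require re-login, alert someone).

```python
from collections import defaultdict


class SequentialAccessDetector:
    """Flag accounts that view a long run of consecutive document numbers."""

    def __init__(self, max_run=5):
        self.max_run = max_run        # longest run tolerated before flagging
        self._last = {}               # account -> last document number viewed
        self._run = defaultdict(int)  # account -> length of current run

    def record(self, account, doc_number):
        """Record one view; return True if the account now looks automated."""
        previous = self._last.get(account)
        if previous is not None and doc_number == previous + 1:
            self._run[account] += 1
        else:
            # First view, or a jump to an unrelated number, starts a new run.
            self._run[account] = 1
        self._last[account] = doc_number
        return self._run[account] > self.max_run


if __name__ == "__main__":
    detector = SequentialAccessDetector(max_run=3)
    for number in range(1000, 1006):
        flagged = detector.record("some_account", number)
        print(number, "FLAGGED" if flagged else "ok")
```

A determined author could randomize the order or slow the requests down, so this raises the cost of scraping rather than making it impossible.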

My boss is really determined to keep this from happening and I cannot find anything anywhere that indicates that we have a chance at all of preventing it.

Thanks for your help.

-Greg
 