
how to stop web bots from stealing your data and bandwidth?

Status
Not open for further replies.

wvdba

IS-IT--Management
Jun 3, 2008
465
US
Hi.
i have successfully installed a captcha routine on our web page so that people, not bots, can access the data. however, i have come across a site that attacks our website with a bot/program, grabs the html/asp code and sends the captcha interface to a turing farm for unlocking. the farm provides the captcha answer within 3-5 seconds and returns it to his program. then he executes the rest of the program on the page and gets what he wants. i looked at a turing farm site. they charge $0.0019 per captcha answer. i would like to find a routine to distinguish between this bot and a human, based on something a human does that the bot is not able to do, like hovering the mouse over the answer field, clicking the submit button, or something. any ideas how this can be done? i.e. how to differentiate between a human and a bot?
thanks.
 
Can you get the ip address of this nasty bot and block it from your site? You may need to be a bit careful with this because they may be attacking you from various ip addresses and you probably don't want to block too many ip addresses for fear of blocking legitimate customers/users.

-George

"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
eg like this:
Code:
' Send requests from one known bad address somewhere else.
if request.ServerVariables("REMOTE_ADDR") = "62.69.162.144" then
  response.Redirect "http://www.auditmypc.com/freescan/antispam.html"
  response.end
end if

(only if their ip is not changing)
and/or a check on HTTP_REFERER. Reject if:
1. Empty Http_Referer
2. Too short
3. Does not contain/start with your domain
4. Does contain the attacking domain
5. ...

Maybe there are other servervariables you can use: HTTP_USER_AGENT?
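
The referer and user agent checks listed above could be sketched roughly as follows. This is only an illustration: "example.gov" and "attacker.example" are placeholder domains, and the minimum length of 12 is an arbitrary guess. Both headers are supplied by the client and are easy to forge, so this is likely to stop only unsophisticated bots.

```vbscript
' Sketch only: "example.gov" and "attacker.example" are placeholder domains.
Dim referer, agent, suspicious
referer = LCase(Request.ServerVariables("HTTP_REFERER"))
agent = Request.ServerVariables("HTTP_USER_AGENT")
suspicious = False

If Len(referer) < 12 Then
    ' Covers both an empty referer and one too short to be a real URL.
    suspicious = True
ElseIf InStr(referer, "example.gov") = 0 Then
    ' The request did not come from one of our own pages.
    suspicious = True
ElseIf InStr(referer, "attacker.example") > 0 Then
    ' The referer mentions the attacking domain.
    suspicious = True
End If

' Many simple scripts send no User-Agent header at all.
If Len(agent) = 0 Then suspicious = True

If suspicious Then
    Response.Status = "403 Forbidden"
    Response.End
End If
```

Some legitimate browsers and proxies strip the referer, so rejecting every empty referer may also turn away real visitors.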
 
We host our websites internally, so I can block ip addresses at the router. If you can't do that, then you'll probably want to put this code in all of your pages.

-George
 
thanks for the replies.
the problem is that their IP changes. i blocked over 40 IPs at one time, and he switched to different ones. i am really baffled by this technology. i just have to find something that a human does and his program does not, or cannot, do. any other suggestions?
 
wvdba,

How do you know so much about how the information is being accessed from your website? Do you know the person who is doing this?

Maybe this forum is a better place to find an answer to your question:

forum83


--------

GOOGLE is a great resource to find answers to questions like "how do i..."


--------
 
vicvirk,
because we have viewed his site and seen that all our content (including images) is on his website. those images are exclusive to our site.
thanks.
 
so you need better (more difficult) captcha pictures.
eg present a picture with many different colored cars, and ask how many red cars there are.
is there a public URL where i can see them?

 
thanks foxbox.
we have tried that. what he does is send the captcha portion to an overseas website where they answer the question as a human would see it ($0.0019 per shot). then it's sent back within 3-5 seconds and the program continues getting the data. here's the public website:
click on the tab: "daily incarseracions"
click radio button: "I agree with terms and conditions"
click an institution: "south central regional ...."
it will list a few: "view details"
click on: "view detail"
you will come across captcha
enter the number and the detail will show.
 
I once created a text box on a form within a hidden div. Often times the bots will fill in all textboxes based on the code. In your postback section, check to make sure the text box is empty. If it's not empty, it was filled in by a bot.

This may not work, but it's easy enough to try.
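
A minimal sketch of that postback check, assuming the hidden text box is named "website" (a made-up name; anything that looks tempting to a form-filling bot would do):

```vbscript
' Sketch only: "website" is a hypothetical name for the hidden text box.
' The form would contain something like:
'   <div style="display:none"><input type="text" name="website" value=""></div>
If Len(Trim(Request.Form("website"))) > 0 Then
    ' A person never sees the field, so any value suggests an automated post.
    Response.End
End If
```

A bot written specifically for this site can simply leave the field blank, so this mostly catches generic form fillers.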

-George
 
You have a very limited number of images (10). They may look difficult, but the attacker only needs to tell his program one time how each digit looks.

Now i'm thinking of a random word generator (per request) and generating an on-the-fly gif with that word, of course in a difficult font, with stripes etc.
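
The per-request text part could look something like the sketch below. The function name, the alphabet and the Session key are all illustrative, and drawing the gif itself is left out. Rnd is not a cryptographically strong generator, so treat this as a starting point only.

```vbscript
' Sketch only: the function name, alphabet and Session key are illustrative.
Function RandomCaptchaText(length)
    ' No 0/O or 1/I, which are easy to confuse in a distorted image.
    Const ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
    Dim i, result
    result = ""
    For i = 1 To length
        ' Int(Rnd * n) gives 0..n-1, and Mid is 1-based.
        result = result & Mid(ALPHABET, Int(Rnd * Len(ALPHABET)) + 1, 1)
    Next
    RandomCaptchaText = result
End Function

Randomize ' seed the generator from the system timer
Session("captcha_text") = RandomCaptchaText(6)
' The image script would draw Session("captcha_text"); the postback would
' compare the submitted answer against the same Session value.
```

This removes the fixed set of 10 images, so a program can no longer learn each digit once and reuse it.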
 
Can you do something like limit the number of requests allowed per hour per unknown IP? Allowing 2 per hour, for example, might slow your scraper up while still permitting most people to get what they need. You could whitelist known IP's for cops or whoever and give them unlimited access.
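
A rough sketch of such a per-IP hourly limit in Classic ASP, using the Application object as a shared counter. The limit of 2, the key format and the whitelisted address are placeholders, not anything taken from the site in question.

```vbscript
' Sketch only: the limit, key format and whitelist entry are illustrative.
Const MAX_PER_HOUR = 2
Dim ip, key, hits
ip = Request.ServerVariables("REMOTE_ADDR")

If ip <> "10.0.0.5" Then ' hypothetical whitelisted address
    ' One counter per IP per clock hour, e.g. "hits_1.2.3.4_2008060314".
    key = "hits_" & ip & "_" & Year(Now) & Right("0" & Month(Now), 2) & _
          Right("0" & Day(Now), 2) & Right("0" & Hour(Now), 2)

    Application.Lock
    hits = Application(key)
    If IsEmpty(hits) Then hits = 0
    hits = hits + 1
    Application(key) = hits
    Application.UnLock

    If hits > MAX_PER_HOUR Then
        Response.Status = "429 Too Many Requests"
        Response.End
    End If
End If
```

The counters live in memory, are never cleaned up in this sketch, and reset when the application restarts. A scraper that rotates through many IP addresses still gets the allowance on each one.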

 
he's not linking. what he does is unethical and has caused a lot of complaints from the public. he puts it on his website and when you want it removed, he asks for $32.05 to remove the picture. by law, our site is supposed to remove the pic if the person is out, but on his site, it stays forever. his site is: wvjails.info
 
do you own the rights to the photos?

Any way you can sue the guy for stealing them?

You stated he is doing something unethical, is it illegal as well?


 
this looks promising:

Generate images with letters and numbers to make a CAPTCHA test.

* Completely FREE Classic ASP VBScript.
* The secure code is completely random.
* Dynamic image processing.
* No image file required.
* No components required.

3 options:


Fake latin word style
Random numbers style
Random numbers and letters style


and i'm sure you can tweak it into more difficult images..
 
thanks foxbox
those book_id's that you see are not real book_id's. they are encrypted by adding a random number to the real id, and our system subtracts the same random number to get the real number back when it goes to retrieve the image based on the book_id.
yes, we do own the rights to the pictures. i'm not sure about the illegal part, but our policy/legal requirement is that once a person leaves, his/her photo is not viewable by the public. this guy puts the picture on his site permanently and to remove it, you have to pay $32.05.
like i said, captcha is not a problem for him. he sends the page source with the captcha image/number/letters to an overseas turing farm. the sweat shop enters the answer, ships the data back to him, and the page goes on to retrieve the data and image. captcha sets the variable answer to "correct" or "incorrect". no matter how complicated the captcha is, he gets around it by shipping it to a sweat shop of humans who return it. what i need is something like, did he hover his mouse over the captcha answer field and type? and things like that. if the hover thing works, we have something.
our code after captcha is:
Code:
if answer <> "correct" then 
   response.end
end if



 
"no matter how complicated the captcha is, he gets around it by shipping it to a sweat shop of humans who return it"

If you move to a registered-users-only approach, you can monitor and log user actions, but what stops him from requesting a new user for each inquiry? Of course you do not want to inspect every request. Even if you start asking for money it will not stop him, because he will simply pay.
Because he is asking 32 USD per removal, he can afford to pay a lot more than $0.0019 per shot.

.. so forget putting effort into this. you should surely try legal action against him and his provider.


 
You could make a flash file that is dynamically generated through an XML file, he won't be able to download it BUT can still take screen captures and grab the image that way...




 