Tek-Tips is the largest IT community on the Internet today!

Request for Advice on Appropriate Use of 'Sleep' Function

Status
Not open for further replies.

canguro

Programmer
Sep 15, 2002
US
Hello,

I am writing a couple of Perl programs that I hope to use periodically to retrieve data from specific websites. I know that it would be bad practice to fire off a lot of hits against websites, because it would compromise the servers of those websites and probably other clients trying to access the websites.

It is not my intention to do anything outrageous, but even so I think I should consider using the 'sleep' function. Can someone give me a rule of thumb or some way to gauge how and when I should use sleep?

Many thanks in advance.
 
the syntax is:

sleep (seconds);

You can see my post below entitled:

ZZZZZZZZZZZZZ...Sleep command

which may give you some other ideas...

HTH
 
Stiddy,

Thanks for your reply. The syntax is clear; however, I am hoping to get an understanding of when sleep should be used, in other words how to avoid stressing a website that I am retrieving data from. I read your post entitled ZZZZZZZZZZZZZ...Sleep command, but really all the replies deal with the 'how' --- not the 'when' and 'why'.

If you or anyone else can clear up my head on the appropriate time to use Sleep I would appreciate it greatly.

Joe

 
Well, the way I use sleep, at least, is to pause temporarily, either to wait for some process to finish or to stretch out a script's run time. For example, if I have a script that runs 24 hours a day but I only want it to perform a particular process at 1pm, 3, 5, 7, 9 and so on, then it sleeps until the time is equal to 1:00 p.m. and does process blah blah. After process blah blah is complete, it sleeps until the time is equal to 3:00 p.m. and then does blah again.

HTH
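
For anyone wanting to try the sleep-until-the-next-slot pattern described above, here is a minimal sketch. The schedule in @run_hours and the do_process() call are made up for illustration, and the arithmetic ignores daylight-saving changes, so treat it as a starting point rather than a finished scheduler.

```perl
use strict;
use warnings;

# Hypothetical schedule: the hours (24-hour clock) at which the job should run.
my @run_hours = (13, 15, 17, 19, 21);

# Return the number of seconds from the given time of day until the next
# scheduled hour, wrapping to the first hour of the next day if needed.
sub seconds_until_next_run {
    my ($hour, $min, $sec, @hours) = @_;
    my $now = $hour * 3600 + $min * 60 + $sec;    # seconds since midnight
    for my $h (sort { $a <=> $b } @hours) {
        my $target = $h * 3600;
        return $target - $now if $target > $now;
    }
    # All of today's slots have passed: wait for the first slot tomorrow.
    my ($first) = sort { $a <=> $b } @hours;
    return 86400 - $now + $first * 3600;
}

# Usage sketch (commented out so the block does not loop forever when run):
# while (1) {
#     my ($sec, $min, $hour) = localtime();
#     sleep seconds_until_next_run($hour, $min, $sec, @run_hours);
#     do_process();    # hypothetical: whatever work is due at that hour
# }
```

The calculation is kept in its own sub so it can be checked without actually sleeping.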
 
How about putting the scripts in a cron job (or WinAT on NT) and letting the operating system look after the work?

HTH
Paul
 
Stiddy and PaulTEG,

Thank you both for your practical suggestions. I guess what I have been trying to say in my post is that as a person new to Perl and hoping to write scripts that I will run periodically against certain websites I just don't know when my scripts become a 'burden'. In other words, if I have a script that runs for 10 minutes or half an hour does it burden the website and should I then use 'sleep' in my program to be more responsible?

Going back to what I said before, I have to be aware that a website should not be stressed by my scripts, and that there may also be other individuals/companies trying to access the site.

In summary, are there any rules of thumb for when to use 'sleep'?

Again thanks for the ideas.

Joe

 
Are you on a server farm, or inside a LAN?

If so, have your scripts run on another CPU, but accessing the files on the site, and don't worry about sleep at all.

What are your scripts supposed to be doing, MIS reports? Depending on your application, you could copy the data off to another server and work on it there. That way, you won't have any load on the (web) server attributable to you.

HTH
Paul
 
Actually, the 2-3 scripts I plan to write will be for my own use. I am hoping to retrieve stock info data periodically, such as Earnings announcements and historical quotes, massage them into an appropriate (CSV) format to be saved on the hard drive on my desktop, and then import them into Oracle tables where I can do further analysis.

All of this will be on my own humble standalone P4 desktop, on my own time at home. I am starting to get the idea that my scripts --- not being commercial and not being on LANs or WANs --- may not be hogs after all. Does this seem right?

Joe
 
Joe,

It's a case of suck it and see;

See if the data comes back fairly quickly, and it should. If you're using LWP, you're effectively scraping pages, which is the same load as a regular browser. It depends on the number of hits you intend to make, and how periodic is periodic.

If you think it's a severe load on a server (and because you're playing with Oracle, you'd know), it probably is.

If it is a severe load, the admins won't be long in tracking you down, and maybe blocking you at a firewall.

Hope that helps
Paul
 
Paul,

Your comments are making the situation a bit clearer for me. Yes, I am using LWP. The 'periodic' in my case would be maybe 2 or 3 times in a day. By 'hits' I assume you mean how many 'gets' my script does on the server's URL in 1 execution of the program? What would be an unreasonable number of gets? 10,000? 100,000? Are there companies which run scripts like that?

I have not written the scripts yet, but my mental ballpark says I may do a few hundred gets at the most. Still, I thought I should be responsible by inserting 'sleep' commands in the scripts.

What's your take on this?

Thanks again,

Joe
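
For what it's worth, here is a rough sketch of what "inserting sleep commands" between gets could look like. The fetch_politely name, the two-second delay and the example URL are all assumptions made for illustration, not anything from the posts above. The page fetch is passed in as a code reference so the loop can be tried without touching a real site.

```perl
use strict;
use warnings;
use Time::HiRes qw(sleep);   # core module; allows fractional-second sleeps

# Fetch each URL in turn, pausing between requests so the remote server is
# never hit back to back. $fetch is a code reference that takes a URL and
# returns the page content; $delay is the pause in seconds (an assumed value).
sub fetch_politely {
    my ($fetch, $delay, @urls) = @_;
    my @pages;
    for my $i (0 .. $#urls) {
        push @pages, $fetch->($urls[$i]);
        sleep($delay) if $i < $#urls;    # no need to wait after the last get
    }
    return @pages;
}

# Usage sketch with LWP (commented out because it needs network access):
# use LWP::UserAgent;
# my $ua = LWP::UserAgent->new(timeout => 30);
# my @pages = fetch_politely(
#     sub { my $r = $ua->get($_[0]); $r->is_success ? $r->decoded_content : undef },
#     2,                                            # assumed: two seconds between gets
#     'http://www.example.com/quotes?symbol=AAA',   # hypothetical URL
# );
```

With a two-second pause, 300 gets would add roughly 600 seconds (ten minutes) of waiting to a run, which is small for a job that only runs two or three times a day.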

 
If you're going to do it anyway, just schedule the 'gets' to run at what you might consider a quiet time, when the server wouldn't be under much load.

Sleep isn't really going to help, as the latency of the internet and your DB imports will probably allow the server time to recover from the hits.

You'll hear about it pretty quickly if the webadmins aren't happy. Scraping means you don't have to be subjected to their ridiculous advertising banners and such, which is how the company thinks it earns the money to fund the site.

When coding your scripts, think about doing as much processing inline as you can, rather than scraping everything first and then processing it locally. This will give the server time to recover between hits.

HTH
Paul
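
One way to read the "process inline" suggestion above is sketched below: each page is turned into CSV rows straight after it is fetched, before the next get goes out. The scrape_inline and csv_line names are made up for this example, and the fetch and parse steps are passed in as code references because the real page layout is not known here.

```perl
use strict;
use warnings;

# Join fields into one CSV line, quoting any field that contains a comma,
# a double quote, or a newline (embedded quotes are doubled).
sub csv_line {
    my @fields = @_;
    for my $f (@fields) {
        $f = '' unless defined $f;
        if ($f =~ /[",\r\n]/) {
            $f =~ s/"/""/g;
            $f = qq{"$f"};
        }
    }
    return join(',', @fields) . "\n";
}

# Fetch one URL at a time and convert it to CSV rows before moving on to the
# next get. $fetch and $parse are code references supplied by the caller:
# $fetch takes a URL and returns page text, $parse takes page text and
# returns a list of array references (one per row). Both are hypothetical.
sub scrape_inline {
    my ($fetch, $parse, $out_fh, @urls) = @_;
    my $rows = 0;
    for my $url (@urls) {
        my $page = $fetch->($url);
        next unless defined $page;         # skip failed gets
        for my $row ($parse->($page)) {
            print {$out_fh} csv_line(@$row);
            $rows++;
        }
    }
    return $rows;
}
```

The time spent parsing and writing each page acts as a natural gap between requests, and the resulting file can be loaded into Oracle once the run is finished.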
 
Paul,

Your points are well taken, especially the last one. Thank you for the very interesting insights into some of the things involved in these processes.

By the way, was my idea --- that a 'hit' can be loosely thought of as a 'get' --- close to the concept?

Joe
 