WGET is getting my IP blocked on remote server - how to avoid rapid reconnections?

Status
Not open for further replies.

cmayo

MIS
Apr 23, 2001
159
US
I'm using wget to perform a daily incremental backup of the files on a remote webserver. Recently, those backups have been causing the remote server's firewall to block my IP. The hosting provider thinks the block is occurring because the firewall allows only three concurrent connections from an IP and blocks the IP when more than three are attempted.

I don't think wget creates multiple concurrent connections, so my working theory is that a connection error occurs during the wget session and wget reconnects to the server while the server is still holding the previous, now defunct, connection open, waiting for it to time out. From the server's point of view, I now have two connections open from my IP, and if another connection error occurs before the first session times out on the server, I've got three connections open and have triggered the IP block. I could be completely wrong, but that's the only theory I can come up with at the moment.
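The theory above can be checked against a toy model. This is a hypothetical sketch, not anything wget or the firewall actually runs: it assumes each reconnect happens because the previous connection died at that moment, that the server keeps every dead connection open for a fixed timeout, and that the firewall blocks once more than a given number of connections are open (the provider's description of the rule). The function name `first_blocked_attempt` and the 300-second timeout used below are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Toy model of the theory in the post (all parameters are assumptions):
# connection j dies when connection j+1 is opened, and the server keeps the
# dead connection open for TIMEOUT more seconds. The firewall blocks the IP
# when more than LIMIT connections are open at the moment of a new attempt.
# Usage: first_blocked_attempt TIMEOUT LIMIT T0 T1 T2 ...  (ascending seconds)
# Prints the time of the first attempt that would be blocked, or "ok".
first_blocked_attempt() {
  local timeout=$1 limit=$2 i j open
  shift 2
  local -a starts=("$@")
  for ((i = 0; i < ${#starts[@]}; i++)); do
    open=1  # the connection being opened now
    for ((j = 0; j < i; j++)); do
      # connection j died at starts[j+1]; is the server still holding it?
      if (( starts[j + 1] + timeout > starts[i] )); then
        open=$((open + 1))
      fi
    done
    if (( open > limit )); then
      echo "${starts[i]}"
      return 0
    fi
  done
  echo "ok"
}
```

With a 300-second server timeout, reconnects at 0, 60, 120 and 180 seconds leave four connections open at t=180, so `first_blocked_attempt 300 3 0 60 120 180` prints `180`; spacing the same reconnects 400 seconds apart prints `ok`.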

Does anyone have any suggestions how I might avoid this situation? I've checked the wget docs for a "wait before reconnect" switch, hoping I could make wget wait a couple of minutes between connection attempts, but found nothing.

Any ideas or suggestions would be much appreciated.
 
What command line options are you using for [tt]wget[/tt]?

I'm thinking that the recursive download options, while they simplify getting the site downloaded, are probably opening multiple connections.

There is an option "[tt]--wait=SECONDS[/tt]" to wait a number of seconds between retries. Try that if you haven't.


 
Sambones, thanks for your response. I'm using a very basic mirroring implementation:

wget --mirror --ftp-user=xxxxxx --ftp-password=xxxxxx --verbose --append-output="wget_log.txt" ftp://xxxxxx.com/public_html/

I did see the "--wait=SECONDS" switch, but the docs state that it makes wget "wait the specified number of seconds between retrievals," not between retries. I don't know much about how wget actually works, but the log suggests that it retrieves and saves a directory listing for each directory it processes. I'm assuming each listing counts as a retrieval, and since the sync has to run over 20,000 directories, I didn't see how I could pause long enough between those retrievals to improve my chances of clearing a hung FTP session. I can experiment with --wait= and see how it works, but the wording of the docs suggests it isn't going to help me.
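The concern about 20,000 directories is easy to quantify. Assuming, as the post does, that every directory listing counts as one retrieval and therefore incurs one --wait pause, the listings alone add 20,000 times the --wait value before any file is copied. A small helper (hypothetical name `listing_delay`) does the arithmetic:

```shell
#!/usr/bin/env bash
# Extra time added by --wait if each directory listing counts as one
# retrieval (the poster's assumption); file downloads would add more pauses.
# Usage: listing_delay DIRECTORIES WAIT_SECONDS  -> prints hours and minutes
listing_delay() {
  local dirs=$1 pause=$2
  local total=$((dirs * pause))   # total added seconds
  printf '%dh%02dm\n' $((total / 3600)) $((total % 3600 / 60))
}
```

`listing_delay 20000 2` prints `11h06m` and `listing_delay 20000 1` prints `5h33m`, which supports the poster's doubt that a pause long enough to clear a hung session is practical for a daily run.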

There's a "--waitretry=SECONDS" switch, but the docs say that "Wget will use linear backoff, waiting 1 second after the first failure on a given file, then waiting 2 seconds after the second failure on that file, up to the maximum number of seconds you specify," which suggests that wget would still begin with rapid-fire retries (which would still get my IP blocked), increasing the interval by 1 second per retry up to the number of seconds I specify.
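The linear backoff described in that quote can be tabulated. The helper below only models the documented rule (wait n seconds after the n-th failure on a file, capped at the --waitretry value); it is not wget's source, and the name `waitretry_schedule` is hypothetical.

```shell
#!/usr/bin/env bash
# Model of the documented --waitretry rule: after the n-th failure on a file,
# wait n seconds, but never more than the cap given to --waitretry.
# Usage: waitretry_schedule CAP FAILURES  -> prints the waits, space-separated
waitretry_schedule() {
  local cap=$1 failures=$2 n
  local -a out=()
  for ((n = 1; n <= failures; n++)); do
    if (( n < cap )); then out+=("$n"); else out+=("$cap"); fi
  done
  echo "${out[*]}"
}
```

`waitretry_schedule 120 5` prints `1 2 3 4 5`: even with a two-minute cap, the first five retries of a file fall within 15 seconds, which is the rapid-fire pattern the poster wants to avoid. `waitretry_schedule 3 5` prints `1 2 3 3 3`.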

If there were a switch to impose a fixed number of seconds between retries, I think that'd have a chance of working for me, but I don't see any parameters that'd do that.
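Lacking such a switch, a fixed pause can be approximated from outside wget: let wget try each file only once (`--tries=1`) and have a shell loop rerun the whole mirror after a fixed delay whenever wget exits with a non-zero status. Because `--mirror` turns on timestamping, a rerun should skip files that are already up to date. This is a sketch under those assumptions; the function name `retry_fixed`, the delay and the attempt count are illustrative, not wget options.

```shell
#!/usr/bin/env bash
# retry_fixed DELAY MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND; if it fails, sleeps a fixed DELAY seconds (no backoff) and
# tries again, up to MAX_ATTEMPTS runs in total. Returns 0 on the first
# success, otherwise the exit status of the last failed run.
retry_fixed() {
  local delay=$1 max=$2 attempt=1 status
  shift 2
  while true; do
    "$@" && return 0
    status=$?
    if (( attempt >= max )); then
      return "$status"
    fi
    echo "attempt $attempt failed (exit $status); waiting ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
}

# Example (not run here): one pass per attempt, 3-minute gap between passes.
# retry_fixed 180 5 wget --mirror --tries=1 --ftp-user=xxxxxx \
#   --ftp-password=xxxxxx --append-output="wget_log.txt" \
#   ftp://xxxxxx.com/public_html/
```

The delay sits between whole passes rather than between individual files, so a dropped connection gets a full, constant interval to expire on the server before the next pass starts.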
 
Oops, sorry, the description for "[tt]--wait[/tt]" is "wait SECONDS between retrievals", not retries. That should make [tt]wget[/tt] pause between each file retrieved. I would think this might help. Give it a try.
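Applied to the command posted earlier, that suggestion amounts to one extra option. The sketch below keeps the thread's elided placeholders (`xxxxxx`) and uses an arbitrary 2-second pause; it only prints the command unless `RUN=1` is set, so it can be checked before it touches the server. wget's `--random-wait`, which varies the pause around the `--wait` value, could optionally be added as well.

```shell
#!/usr/bin/env bash
# The poster's mirror command with a pause between retrievals added.
# The xxxxxx values are the thread's placeholders; WAIT_SECONDS=2 is an
# arbitrary example value, not a recommendation from the thread.
FTP_HOST=${FTP_HOST:-xxxxxx.com}
FTP_USER=${FTP_USER:-xxxxxx}
FTP_PASS=${FTP_PASS:-xxxxxx}
WAIT_SECONDS=${WAIT_SECONDS:-2}

wget_args=(
  --mirror
  --wait="$WAIT_SECONDS"          # pause between retrievals, not retries
  --ftp-user="$FTP_USER"
  --ftp-password="$FTP_PASS"
  --verbose
  --append-output="wget_log.txt"
  "ftp://$FTP_HOST/public_html/"
)

# Print the command by default; set RUN=1 to actually start the mirror.
if [[ ${RUN:-0} == 1 ]]; then
  wget "${wget_args[@]}"
else
  printf '%s\n' "wget ${wget_args[*]}"
fi
```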
 
I would check with the hosting provider to see if there are any more suitable methods for backup.


 
I decided to move from wget to rsync, which seems to have solved the problem. When I originally chose wget, I wasn't able to find a Windows implementation of rsync that didn't require a complete Cygwin installation. I've since found cwRsync, which requires only two EXE files and a smattering of DLLs and seems to run very nicely.
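For readers in the same situation, a daily rsync pull might look like the sketch below. It assumes the host permits SSH logins (the thread does not say how cwRsync reached the server), and `REMOTE` and `DEST` are placeholder values; on Windows, cwRsync may expect a Cygwin-style local path such as `/cygdrive/c/backup/`. Run over SSH, rsync carries the whole transfer through one connection and copies only files that differ, which plausibly explains why the firewall stopped tripping.

```shell
#!/usr/bin/env bash
# Sketch of an rsync-based daily backup. Assumptions (not from the thread):
# the host allows SSH logins; REMOTE and DEST below are placeholders.
REMOTE=${REMOTE:-user@example.com:public_html/}
DEST=${DEST:-./backup/public_html/}

rsync_args=(
  --archive          # recurse and preserve times/permissions
  --compress         # compress data in transit
  --partial          # keep partly transferred files so a rerun can reuse them
  --timeout=300      # give up if no data moves for 5 minutes
  -e ssh             # one SSH connection carries the whole transfer
  "$REMOTE"
  "$DEST"
)

# Print the command by default; set RUN=1 to actually run the backup.
if [[ ${RUN:-0} == 1 ]]; then
  rsync "${rsync_args[@]}"
else
  printf '%s\n' "rsync ${rsync_args[*]}"
fi
```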

Thanks all for the input!
 