
webHTTrack automation script


allensim81
Technical User
Apr 9, 2008
Hi, I am totally new to writing Unix scripts.
I want to automate the webHTTrack website-downloading process. I tried to write a simple script and it runs! But I didn't use any variables; the website URL and directory are fixed.

Can anyone guide me step by step so that I can automate webHTTrack and make it download 20-30 websites?

Following is the simple script that i tried to write:
#!/bin/sh
# purpose: automate webHTTrack downloading process

httrack -q -%i -w /data/websites/Calpoly -n -%P -N0 -s2 -p7 -D -a -K0 -c4 -%k -r1 -%e1 -A25000
echo "download completed"

Appreciate your help & guidance.
Love
 
If you put the URLs in a file, say urls, you could do something like this:

Code:
#!/bin/sh

while read URL                  # one URL per line from the urls file
do
    httrack -q -%i -w "$URL" -n -%P -N0 -s2 -p7 -D -a -K0 -c4 -%k -r1 -%e1 -A25000
    echo "download from $URL completed"
done < urls

... or something like that.
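The urls file would then hold one address per line, for example (hypothetical addresses):

Code:
http://www.calpoly.edu/
http://example.com/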

Annihilannic.
 
Hi, thanks for your fast reply! You're brilliant, man!
I tested the script and it works! But if I put more than one URL in the urls file, it overwrites the previous one.. how can I solve this?

Thanks in advance!
Love & Cheers!
 
Hi

Put both the URL and the output path into the urls file:
Code:
#!/bin/sh

while read URL path
do
  httrack -q -%i -w "$URL" -O "$path" -n -%P -N0 -s2 -p7 -D -a -K0 -c4 -%k -r1 -%e1 -A25000
  echo "download from $URL completed"
done < urls
Code:
http://www.calpoly.edu /data/websites/Calpoly
http://example.com/ /data/websites/example_com
Note, the suggested syntax is based on your example; I do not know httrack.
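One more detail worth noting: read URL path splits each line at the first run of whitespace, so the URL itself must not contain spaces; everything after it, embedded spaces included, ends up in $path.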


Feherke.
 
hi dear feherke,
How are you?
I am still having problems with my webHTTrack automation script. OK, the whole process goes like this:
1. The user downloads/archives the websites manually, which means the user must key in the URLs, depth and so on, one by one. All the downloaded websites end up under /data/websites/website1, /data/websites/website2 and so on.
2. Each /data/websites/website1 will have a doit.log, and inside this doit.log file there is the httrack command.
3. I want my automation script to be able to read the httrack command from the doit.log of every single website, so that the whole process can be automated.

How am I going to do this? Is this possible?
____________________________________________________________
-q -%i -w -O /data/websites/Sains -n -%P -N0 -s2 -p7 -D -a -K0 -c4 -%k -r5 -%e3 -A25000 -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "<-q -%i -w -O /data/websites/ -n -%P -N0 -s2 -p7 -D -a -K0 -c4 -%k -r1 -%e1 -A25000 Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2006], %s -->" +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -%s -%u
File generated automatically on Mon, 20 Jul 2009 16:51:22, do NOT edit

To update a mirror, just launch httrack without any parameters
The existing cache will be used (and modified)
To have other options, retype all parameters and launch HTTrack
To continue an interrupted mirror, just launch httrack without any parameters
____________________________________________________________

Actually, by just adding "httrack" in front of the command that I found in the doit.log and pasting it at the terminal, it runs already...

I am having a hard time with the automation script.. can you please help me... Thanks in advance..
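A minimal sketch of one possible automation, based only on what is described above: it assumes a doit.log directly inside each /data/websites/<site> directory with the saved option string on its first line (as in the excerpt), and simply prefixes that string with httrack, the same as pasting it at the terminal:

Code:
#!/bin/sh
# Re-run the saved httrack command for every archived website.
# Assumes /data/websites/<site>/doit.log with the option string on line 1.

for log in /data/websites/*/doit.log
do
    [ -f "$log" ] || continue      # skip if the glob matched nothing
    opts=$(head -n 1 "$log")       # the saved httrack options
    eval "httrack $opts"           # eval keeps the quoted -F/-%F strings intact
    echo "update from $log completed"
done

Since eval executes whatever the file contains, this is only safe with trusted doit.log files. Alternatively, per the note in the log itself ("To update a mirror, just launch httrack without any parameters"), each existing mirror could be refreshed by running httrack with no arguments from inside its directory: for dir in /data/websites/*/ ; do ( cd "$dir" && httrack ); done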
 