How to download complete website

Status
Not open for further replies.

dschaef

Programmer
Oct 27, 2011
4
CA
Hi, I'm wondering if it is possible to download a complete website (as opposed to just the HTML of the page) through FoxPro. I was hoping URLDownloadToFile would be able to handle this, but I haven't been able to figure it out. Any help would be much appreciated.
 
When you say you want to download an entire site, presumably you mean that you want to start from the home page, and download all the pages within the domain that are ultimately linked to the home page, plus all images and other media files.

I can see several issues. If you want to end up with a working copy of the site, you would have to reproduce the directory structure. Also, you would need to retrieve any CSS files, and any JavaScript or similar components.

It's possible that the VFP Web Crawler that Olaf referenced will handle those issues for you. If not, you might be able to modify it to do so.

All this leads to the inevitable question of why you want to do this. It's not illegal to download a site in this way, but it does raise questions about what legitimate use you are going to put it to. Perhaps you could clarify this before we go any further.

Mike

__________________________________
Mike Lewis (Edinburgh, Scotland)

Visual FoxPro articles, tips, training, consultancy
 
Hi Mike,

"When you say you want to download an entire site, presumably you mean that you want to start from the home page, and download all the pages within the domain that are ultimately linked to the home page, plus all images and other media files."

To clarify, I didn't mean downloading all of the pages which stem from the home page, but all of the files associated with the current page, i.e. the same as if I were to open the page in Chrome and click Save As (Web Page, Complete). So this would save all the images, .js files and the HTML embedded via frames.

"All this leads to the inevitable question of why you want to do this. It's not illegal to download a site in this way, but it does raise questions about what legitimate use you are going to put it to. Perhaps you could clarify this before we go any further."

The reason I would like to do this is to retrieve data from a website which is contained in an iframe with a hidden src. An example page: The relevant information does not appear if I simply retrieve the html of the page.
_______________________
 
We worked through how to get at a frame's source code not long ago. With IE you can automate saving the full page. It's somewhere in the forum.

thread184-960191 should give you a partial solution at least, manually choosing to save as mht. I haven't gone through the whole thread, maybe chris has found a solution, nothing is marked as such, though.
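
As a rough illustration of getting at frame sources through IE automation, the sketch below lists the src of every iframe on a loaded page. The URL is a placeholder and the variable names are made up for the example. It assumes the iframe's src attribute is readable once the page has finished loading, which may not hold for every site.

```foxpro
* Sketch only: lcUrl is a placeholder, not the page discussed in this thread
LOCAL loIE, loFrames, lnI, lcUrl
lcUrl = "http://www.example.com/"

loIE = CREATEOBJECT("InternetExplorer.Application")
loIE.Visible = .T.
loIE.Navigate(lcUrl)

* Wait until the page has finished loading (ReadyState 4 = complete)
DO WHILE loIE.Busy OR loIE.ReadyState <> 4
   DOEVENTS
ENDDO

* Collect all iframe elements and print the address each one points to
loFrames = loIE.Document.getElementsByTagName("iframe")
FOR lnI = 0 TO loFrames.length - 1
   ? loFrames.item(lnI).src
ENDFOR

loIE.Quit()
```

Each address printed this way could then be navigated to or downloaded on its own to get at the frame's HTML.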

Bye, Olaf.
 
Appreciate it Olaf, I will check that out.
 
Thanks for the clarification, Dschaef.

From what you said, it should be simpler than I thought. Probably the easiest way would be, as Olaf suggested, to automate the process using IE. I know that the Web Browser control lets you execute commands from the IE menu system by calling the ExecWB method and passing the ID of the command plus an execution option. So presumably you could do that with the File / Save As command, programmatically setting the file type to "Web page complete".

Sorry I can't give any more details, but it should be easy to figure out.

Mike

 
Just to follow up ...

I've done a quick test, and I can see that calling ExecWB and passing the parameters 4, 1 will save the page as "web page complete". This produces the web page itself, plus a directory containing all the images, etc., which I think is what you want.

Reference:
Let us know if you need more details.

Mike

 
Hi Mike,
Thanks, that looks very helpful. However, when I tried it, it brought up the Save As dialog but did not save the web page or switch the save type to Complete.


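Code:
* URL holds the address of the page to save (assigned elsewhere)
oWeb = CREATEOBJECT('internetexplorer.application')
oWeb.Visible = .T.
oWeb.Navigate(URL)

* Wait while IE is still busy navigating
DO WHILE oWeb.Busy
   DOEVENTS
ENDDO

* Wait until the document is fully loaded (ReadyState 4 = complete)
DO WHILE oWeb.ReadyState <> 4
   DOEVENTS
ENDDO

* 4 = OLECMDID_SAVEAS, 1 = OLECMDEXECOPT_PROMPTUSER (shows the dialog)
oWeb.ExecWB(4, 1)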
 
Yes, I'm seeing the same thing.

It might be worth experimenting with sending an Enter key, to have the effect of clicking the Save button.

To do that, try something like this:

Code:
owsh = CREATEOBJECT("wscript.shell")
owsh.SendKeys("{ENTER}")

Not sure if you should do that before or after calling ExecWB. Experiment with both. If you do it after, it might be necessary to introduce a delay of a few milliseconds to give the IE window time to appear.
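
Putting the pieces together, the "after" variant might look like the sketch below. The URL, the 500 ms delay and the variable names are assumptions for illustration only. It has not been confirmed in this thread that ExecWB returns while the dialog is still open; if it only returns once the dialog is closed, the SendKeys call would have to move before ExecWB instead. Note also that this only presses the dialog's default button and does not change the save type.

```foxpro
* Sketch only: URL, delay and names are assumptions, not tested values
DECLARE Sleep IN kernel32 INTEGER dwMilliseconds

LOCAL loIE, loShell, lcUrl
lcUrl = "http://www.example.com/"       && placeholder URL

loIE = CREATEOBJECT("InternetExplorer.Application")
loShell = CREATEOBJECT("WScript.Shell")
loIE.Visible = .T.
loIE.Navigate(lcUrl)

* Wait until the page has finished loading (ReadyState 4 = complete)
DO WHILE loIE.Busy OR loIE.ReadyState <> 4
   DOEVENTS
ENDDO

loIE.ExecWB(4, 1)               && 4 = Save As, 1 = prompt the user
Sleep(500)                      && give the dialog time to appear
loShell.SendKeys("{ENTER}")     && press the dialog's default button
```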

Mike

 
Another option is to use CDO.

Code:
Local lcFileName, lcStr, oMSG && Local variables
 Declare Integer ShellExecute In "Shell32.dll" ;
   INTEGER HWnd, ;
   STRING lpVerb, ;
   STRING lpFile, ;
   STRING lpParameters, ;
   STRING lpDirectory, ;
   LONG nShowCmd
 lcFileName = Sys(2015)+'.mht'   && unique file name
 oMSG = Createobject("CDO.Message")
 oMSG.CreateMHTMLBody("http://www.microsoft.com")   && convert the whole page to an MHTML body
 lcStr = oMSG.GetStream()   && ADODB.Stream holding the message
 lcStr.SaveToFile(lcFileName,1)   && 1 = adSaveCreateNotExist
 ShellExecute(0,"Open",lcFileName,"","",0)   && open the saved file in its default application


Mike Gagnon

If you want to get the best response to a question, please check out FAQ184-2483 first.
ReFox XI (www.mcrgsoftware.com)
 

The date of thread184-960191 makes you realise how time flies when you are having fun.

Mike's CDO solution worked then and happily still works under XP/Vista/Windows 7.

Windows 8, who knows?

Chris
PDFcommander.com
motrac.co.uk
 