Download a website together with images!! Help pls

Status
Not open for further replies.

Fori

Programmer
Jun 18, 2003
84
MT
Hi all

I'm trying to create a small app which will download a web page and store all the components that make up the page (such as images, sounds, etc.) in temporary storage. Downloading the page itself is no problem, as I'm using FTP, but I don't know how to download the relevant images and other files. Is there a way, and how can I do it?

Thanks
Nick
 
I use Inet to download the HTML file, and that's easy. Even downloading images should be easy. The problem is working out which images belong to that particular page!

thanks
Nick
 
Best bet would probably be to access the downloaded document through Microsoft's Document Object Model (DOM, the necessary bits of which you can access by adding a reference to the Microsoft HTML Object library) and then walk the document's tags...
 
Eh? Sorry, you've lost me. Can you explain in more detail, please?

Thanks
nick
 
Let's start with DOM...

Here's the complex description:
Here's one summary overview:
And the particular point we are concerned with is "Any HTML or XML element (with the possibility of a few exceptions) will be individually addressable by programming". And why is this relevant? Because the IMG tag is an HTML element.

Microsoft provides access to the DOM through the Microsoft HTML Object library, which presents all the component parts of an HTML page (e.g. tables, frames, body text, header, metatags) as COM objects that have properties, methods and events just like any other VB class.

So, if we can load the source document in such a way as to be able to access it through DOM, we could then look through the collection of tags to find just the IMG ones and then examine the various properties (generally known as attributes in DOM) to get info such as the URL of the image in question, its name, even its width and height.
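
To make that concrete, here is a minimal sketch (not taken from the thread) of walking just the IMG elements of a document that has already been loaded. It assumes a project reference to the Microsoft HTML Object Library. ListImages is a hypothetical name, and the caller is assumed to pass in a fully loaded HTMLDocument.
[tt]
' Hypothetical helper: print details of every IMG element in a loaded document.
' Assumes a project reference to the Microsoft HTML Object Library.
Public Sub ListImages(doc As HTMLDocument)
    Dim img As HTMLImg

    ' doc.images is the collection of IMG elements in the document
    For Each img In doc.images
        ' src is the image URL; nameProp is the file name part of it
        Debug.Print img.src, img.nameProp, img.Width, img.Height
    Next
End Sub
[/tt]
The output goes to the Immediate window, which is enough to check which images the DOM has found before writing any download code.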

Here's an example I've slapped together. You will need a form with a command button and an Internet Transfer control, and will also need to add references to the Microsoft HTML Object library and the Microsoft Scripting Runtime (for FileSystemObject):
[tt]
' Requires references to the Microsoft HTML Object Library and the
' Microsoft Scripting Runtime, plus an Internet Transfer control named Inet1
Option Explicit

Private DocumentFactory As New HTMLDocument

Private Sub Command1_Click()
    ' The page URL was lost from the posted code; replace the placeholder
    ' with the address of the page to download
    SavePageToFolder "<page URL>", "c:\testhtml\"
End Sub

' Load the page at strURL into a DOM document and wait until it is ready
Public Function GetWebPage(strURL As String) As HTMLDocument
    Set GetWebPage = DocumentFactory.createDocumentFromUrl(strURL, "")
    Do Until GetWebPage.readyState = "complete"
        DoEvents
    Loop
End Function

Public Sub SavePageToFolder(strURL As String, strFolder As String)
    Dim myHTMLDoc As HTMLDocument
    Dim HTMLTagCollection As IHTMLElementCollection
    Dim HTMLImage As HTMLImg
    Dim bytearray() As Byte
    Dim hFile As Long

    Set myHTMLDoc = GetWebPage(strURL)

    ' get collection of all image tags
    Set HTMLTagCollection = myHTMLDoc.documentElement.getElementsByTagName("img")

    ' Save all images on page to nominated folder
    With New FileSystemObject
        If Not .FolderExists(strFolder) Then .CreateFolder (strFolder)
        For Each HTMLImage In HTMLTagCollection
            ' Download the image as raw bytes and write it to a binary file
            bytearray() = Inet1.OpenURL(HTMLImage.Attributes("src").Value, icByteArray)
            hFile = FreeFile
            Open .BuildPath(strFolder, HTMLImage.nameProp) For Binary As hFile
            Put hFile, , bytearray
            Close hFile
            ' Point the tag at the local copy so the saved page uses it
            HTMLImage.Attributes("src").Value = .BuildPath(strFolder, HTMLImage.nameProp)
        Next
        ' Save the page to the nominated folder
        .OpenTextFile(strFolder & .GetBaseName(strURL) & "." & .GetExtensionName(strURL), ForWriting, True).Write (myHTMLDoc.documentElement.innerHTML)
    End With

End Sub
[/tt]
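
The original question also mentioned sounds and other components. As a rough sketch only, the same approach might be extended by collecting the src attribute of other tags. ListSources is a hypothetical helper, the tag names passed to it (for example embed, bgsound or script) are assumptions about what a given page contains, and the download step is left to code like the image loop in SavePageToFolder.
[tt]
' Hypothetical helper: print the src attribute of every element with a given tag name.
' Assumes a project reference to the Microsoft HTML Object Library.
Public Sub ListSources(doc As HTMLDocument, strTagName As String)
    Dim el As IHTMLElement
    Dim varSrc As Variant

    For Each el In doc.getElementsByTagName(strTagName)
        varSrc = el.getAttribute("src")
        ' Skip elements where the attribute is missing or empty
        If Not IsNull(varSrc) Then
            If Len(varSrc) > 0 Then Debug.Print strTagName, varSrc
        End If
    Next
End Sub
[/tt]
For example, calling ListSources myHTMLDoc, "embed" inside SavePageToFolder would list any embedded media the page refers to.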
 
Thanks, seems cool! I'll check it out and reply!

Thanks for now
Nick
 
Oh - tek-tips always seems to add unnecessary semi-colons after URLs, so you'll need to take out the one that has been inserted in the Command1_Click code
 
OK! Thanks a million! If it works I'll .... well, I thank you very much!

Cheers
Nick
 