
Copying Data from the Web

Status
Not open for further replies.

kjoost001 (Programmer, US)
Apr 11, 2002
Hello everyone,
Is it possible to write a program that retrieves data directly from a web page and stores it in a flat file? That is, instead of going to the web page manually, copying the text, and pasting it, a program would automate the whole process.

Thanks,
kj
 
What you can do is add the Internet Transfer control to your program, along with a command button and a textbox. Put the desired URL in the textbox and press the button to start the download.

In the button event handler:

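Private Sub cmdGetWebPage_Click()

Dim webpage As String
Dim fileNum As Integer

' icString (0) tells OpenURL to return the page as a String
webpage = Inet1.OpenURL(txtURL.Text, icString)

' FreeFile returns an unused file number instead of hard-coding #1
fileNum = FreeFile
Open "c:\webpage.txt" For Output As #fileNum
Print #fileNum, webpage
Close #fileNum

End Sub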

This example returns the web page as a string and writes that string to a text file.
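For anyone who wants to try the same fetch-and-save idea outside VB, here is a minimal Python sketch using the standard urllib.request module in place of the Internet Transfer control. The function name save_page_as_text and the UTF-8 default are assumptions, not part of the answer above. It reads the raw bytes first and then decodes them to a string before writing the flat file.

```python
import urllib.request
from pathlib import Path


def save_page_as_text(url: str, out_path: str, encoding: str = "utf-8") -> str:
    """Fetch url and write its contents to out_path as a flat text file.

    Like the VB example, the page is saved with its HTML markup intact.
    The utf-8 default is an assumption; real pages may use another charset.
    """
    with urllib.request.urlopen(url) as response:
        raw = response.read()  # raw bytes of the page
    page = raw.decode(encoding, errors="replace")  # bytes -> str
    Path(out_path).write_text(page, encoding=encoding)
    return page


# Usage (hypothetical URL and output path):
# save_page_as_text("http://www.example.com/", r"c:\webpage.txt")
```

Because urlopen also accepts file:// URLs, the function can be tried on a local HTML file before pointing it at a live site.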

You can also have the web page returned as a byte array by passing icByteArray (1) as the second argument to OpenURL.
Good Luck
--------------
As a circle of light increases so does the circumference of darkness around it. - Albert Einstein


 
Where should this be done? It can't be placed in Excel or Access; it should be built using VB, correct?
 
I've only done it within a VB application. However, there are many Excel and Access gurus who contribute here, and any one of them may be able to explain how to do this from within those applications.
Good Luck


 
Is there any way to get only the text, without the HTML codes? For example, <html> should be eliminated.
 

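OK, if you also want to watch the page as it is retrieved, I would suggest the WebBrowser control (named WB here). Its Document.body.innerText property returns just the text of the page, without the HTML tags.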
[tt]
Option Explicit

Dim Wait As Boolean, Good As Boolean

Private Sub Command1_Click()

Wait = False
Good = False
WB.Navigate Text1.Text

Do While Wait = False
DoEvents
Loop

If Good = True Then Debug.Print WB.Document.body.innerText

End Sub

Private Sub WB_DocumentComplete(ByVal pDisp As Object, URL As Variant)

On Error GoTo WB_DocumentCompleteError

If (Trim(URL) <> "") And (InStr(1, URL, Text1.Text) > 0) Then Wait = True

Exit Sub
WB_DocumentCompleteError:

MsgBox Err.Description

End Sub

Private Sub WB_TitleChange(ByVal Text As String)

On Error GoTo WB_TitleChangeError

If UCase(Trim(Text)) <> UCase(Trim("Cannot find server")) Then Good = True

Exit Sub
WB_TitleChangeError:

MsgBox Err.Description

End Sub
[/tt]

This is not quite complete (but close), and it should get you well on your way.
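The WebBrowser control does the HTML parsing for you, and body.innerText is the browser's own rendered text. Outside VB, a rough stand-in is to run the HTML through a parser and keep only the text nodes. The sketch below assumes Python's standard html.parser module; the class and function names are hypothetical. It skips <script> and <style> content and treats a few block tags as word breaks, but unlike innerText it ignores CSS, so hidden elements are still included.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of an HTML document, roughly like innerText."""

    SKIP = {"script", "style"}  # tags whose content is not visible text
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True turns &amp; into &
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append(" ")  # keep words in separate blocks apart

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append(" ")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace into single spaces.
    return " ".join("".join(parser.parts).split())
```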

Good Luck


 
Another option might be to use the RegExp object (add a project reference to Microsoft VBScript Regular Expressions 5.5) and do a global replace with the following pattern, replacing each match with vbNullString.
Code:
   Dim lRegExp As RegExp
   
   Set lRegExp = New RegExp
   lRegExp.Pattern = "<\w*>"
   lRegExp.Global = True
   webPage = lRegExp.Replace(webPage, vbNullString)
   Set lRegExp = Nothing
This should remove simple html tags such as <html> and <br> from the text.
Good Luck
 
My apologies, the pattern that I gave will miss many html tags, such as closing tags (</p>) and tags with attributes (<a href=...>). A better pattern would be
<\S[^>]*>
(using * rather than + so that one-letter tags such as <b> and <p> are removed too)
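To see what each pattern actually removes, here is a small Python sketch using re.sub, which should treat these three simple patterns the same way VB's RegExp.Replace does with Global = True (an assumption worth checking against your own pages). With + the pattern needs at least two characters between the brackets, so one-letter tags such as <b> and <p> are left behind; swapping + for * fixes that. The sample HTML and the names are made up for illustration.

```python
import re

# Patterns discussed in the thread, plus a variant using * instead of +.
FIRST_TRY = r"<\w*>"       # bare tags only, e.g. <html> or <p>
POSTED = r"<\S[^>]+>"      # needs 2+ characters between < and >
FIXED = r"<\S[^>]*>"       # also matches one-letter tags such as <b>


def strip_tags(html: str, pattern: str = FIXED) -> str:
    """Delete every match of pattern (like RegExp.Replace with Global = True)."""
    return re.sub(pattern, "", html)


sample = '<html><p>Price: <b>5</b> <a href="x.htm">more</a></p></html>'
for name, pat in [("first", FIRST_TRY), ("posted", POSTED), ("fixed", FIXED)]:
    print(name, repr(strip_tags(sample, pat)))
# Prints:
# first 'Price: 5</b> <a href="x.htm">more</a></p></html>'
# posted '<p>Price: <b>5 more'
# fixed 'Price: 5 more'
```

Any regex approach is only a rough filter: a > inside an attribute value or an HTML comment can still cut a match short, which is one reason the WebBrowser route with innerText is more robust.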


Good Luck
 