How Do I Scan a Webpage for Information

Status
Not open for further replies.

byrne1

Programmer
Aug 7, 2001
415
0
0
US
I am building an app that uses the WebBrowser control to navigate to a web page. I then want to scan that page for certain information that is contained in a table. How can I do that?
 
I don't know...

But I'll be interested to see if anyone else does, so I've marked this for email notification
 
Well, first things first... If you don't actually care about viewing the page yourself, then don't bother using a WebBrowser control. Draw a form with two command buttons and a textbox, and set the textbox's MultiLine property to True.

Add the Microsoft Internet Transfer Control component to the project and place one on the form (the code assumes it is named Inet1). Copy this code to the form:

Code:
Private Sub Command1_Click()
    Inet1.OpenURL "http://sluggy.com"
End Sub

Private Sub Inet1_StateChanged(ByVal State As Integer)
   ' Retrieve server response using the GetChunk
   ' method when State = 12. This example assumes the
   ' data is text.

   Select Case State
   ' ... Other cases not shown.

   Case icResponseReceived ' 12
      Dim vtData As Variant ' Data variable.
      Dim strData As String: strData = ""
      Dim bDone As Boolean: bDone = False

      ' Get first chunk.
      vtData = Inet1.GetChunk(1024, icString)
      DoEvents
      Do While Not bDone
         strData = strData & vtData
         DoEvents
         ' Get next chunk.
         vtData = Inet1.GetChunk(1024, icString)
         If Len(vtData) = 0 Then
            bDone = True
         End If
      Loop
      
      Text1.Text = strData
   End Select
   
End Sub

Under Command2, put the code that searches Text1.Text for what you are looking for. If you need help with that, let me know.

Kevin
 
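If you have input boxes on the page, you can fill them in with this example code:
' Fill each form field once the page has finished loading.
Dim i As Long
If WebBrowser1.ReadyState = READYSTATE_COMPLETE Then
    With WebBrowser1.Document.Forms(0)
        For i = 0 To .elements.length - 1
            ' Compare the first 4 characters so that IDs beginning with "Name" match
            ' (the original compared 7 characters against the 4-character "Name").
            If Left(.elements(i).Id, 4) = "Name" Then
                .elements(i).Value = dtaA.Recordset!Name
            Else
                .elements(i).Value = dtaA.Recordset!Address
            End If
        Next
    End With
End If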

Boban.
 
The first approach did not work. The status codes I received, in order, were: 1, 2, 3, 4, 5, 6, 7, 8, 7, 8. Also, my app freezes after the code executes.

The second approach is good but the information I'm looking for is not contained in a form. It's in a table.

Is there a way to save the source of the web browser control to a file? If so, this would make it relatively easy to accomplish what I need to do.

Thanks to everyone for your suggestions and help. I know there has to be a way to do this!
 
I GOT IT!!!

Private Sub GetPage(strUrl As String)
Dim b() As Byte
'Cancel any operations
Inet1.Cancel
'Set protocol to HTTP
Inet1.Protocol = icHTTP
'Set the URL Property
Inet1.URL = strUrl
'Retrieve the HTML data into a byte array.
b() = Inet1.OpenURL(, icByteArray)
'Create a temporary file for the page to reside in
Open "c:\temp.txt" For Binary Access Write As #1
Put #1, , b()
Close #1
End Sub
 
Just for kicks, here is how to get the data inside the BODY with the WebBrowser control:
mstrHTML = objWebb.Document.body.innerHTML

I believe I was able to get all of the HTML except the enclosing tags with
mstrHTML = objWebb.Document.All(0)
or was it
mstrHTML = objWebb.Document.All.Tags(0)

I forget, because the code I write has to run under both IE 4 and IE 5, and some "properties" like OuterHTML are IE 5 only for certain tags.
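
For the original question (data in a table rather than a form), the same Document object can be walked cell by cell. The following is only a rough sketch: it assumes the control is named WebBrowser1 as in the earlier post, that the page has finished loading, and that the table you want is the first one on the page. DumpFirstTable is a made-up name, and I have not checked how this behaves under IE 4.

```vb
' Sketch: collect the text of every cell in the first table on the loaded page.
' Assumes a WebBrowser control named WebBrowser1; DumpFirstTable is a hypothetical helper.
Private Function DumpFirstTable() As String
    Dim doc As Object, tbls As Object, tbl As Object
    Dim r As Long, c As Long
    Dim strOut As String

    ' Nothing to read until the page has finished loading.
    If WebBrowser1.ReadyState <> READYSTATE_COMPLETE Then Exit Function

    Set doc = WebBrowser1.Document
    Set tbls = doc.All.tags("TABLE")
    If tbls.length = 0 Then Exit Function

    Set tbl = tbls.Item(0)   ' assumption: the first table is the one wanted
    For r = 0 To tbl.rows.length - 1
        For c = 0 To tbl.rows.Item(r).cells.length - 1
            ' Tab between cells, new line after each row.
            strOut = strOut & tbl.rows.Item(r).cells.Item(c).innerText & vbTab
        Next c
        strOut = strOut & vbCrLf
    Next r

    DumpFirstTable = strOut
End Function
```

Calling Text1.Text = DumpFirstTable() from a button click, after navigation completes, would show the table as tab-separated lines.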
 
byrne1 -
Once you have the contents of the webpage in a byte array, you can copy it to a string variable. Once it's in there, you can use the normal string functions on it (InStr, Mid$, Left$, InStrRev, etc.).

Chip H.
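
A minimal sketch of that idea, assuming the page is plain single-byte (ANSI) text and that b() is the byte array filled by Inet1.OpenURL(, icByteArray) in GetPage. TextBetween is a hypothetical helper name, and the "<td>" markers in the usage comment are only placeholders for whatever surrounds your data.

```vb
' Sketch: return the text found between two markers in a page string.
' TextBetween is a hypothetical helper; it returns "" when either marker is missing.
Private Function TextBetween(ByVal strPage As String, _
                             ByVal strStart As String, _
                             ByVal strEnd As String) As String
    Dim lngStart As Long, lngEnd As Long

    ' Find the opening marker (case-insensitive) and skip past it.
    lngStart = InStr(1, strPage, strStart, vbTextCompare)
    If lngStart = 0 Then Exit Function
    lngStart = lngStart + Len(strStart)

    ' Find the closing marker after that point.
    lngEnd = InStr(lngStart, strPage, strEnd, vbTextCompare)
    If lngEnd = 0 Then Exit Function

    TextBetween = Mid$(strPage, lngStart, lngEnd - lngStart)
End Function

' Usage, assuming b() holds the downloaded bytes:
'     strPage = StrConv(b, vbUnicode)   ' treats the bytes as ANSI text
'     Debug.Print TextBetween(strPage, "<td>", "</td>")
```

This only finds the first occurrence; to walk every cell you would call InStr in a loop, starting each search where the previous one ended.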
 
Actually, once you convert it to a string, you can use regular expressions. That would probably be the most efficient way to pick out the table data.

Kevin
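
A rough sketch of the regular expression route, assuming the page is already in a string and that the VBScript regular expression object (version 5.5 or later, for SubMatches and non-greedy matching) is installed. PrintCells is a made-up name, and the pattern is only an illustration; it will not cope with nested tables.

```vb
' Sketch: print the contents of every <td> cell found in a string of HTML.
' Late binding via CreateObject, so no project reference is needed.
' PrintCells is a hypothetical helper.
Private Sub PrintCells(ByVal strPage As String)
    Dim re As Object, matches As Object, m As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Global = True        ' find all matches, not just the first
    re.IgnoreCase = True    ' match <TD> as well as <td>
    ' Capture everything between an opening <td ...> and the next </td>.
    re.Pattern = "<td[^>]*>([\s\S]*?)</td>"

    Set matches = re.Execute(strPage)
    For Each m In matches
        Debug.Print m.SubMatches(0)
    Next m
End Sub
```

The captured text still contains any markup that was inside the cell, so a second pass may be needed to strip inner tags.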
 
Just in case, another way.
From your program, issue a SendKeys ^A (select all) to the HTML page, followed by a SendKeys ^C (copy). The text is then on the clipboard and available for further processing.
paraic@mindspring.com
 
Some great ideas! I'm glad to see that my simple question has stirred up such great suggestions! Thanks everyone.
 