Tek-Tips is the largest IT community on the Internet today!


Scraping Web Pages Using the VB WebBrowser Control

Status
Not open for further replies.

mojorourke

Programmer
Aug 16, 2001
9
US
Hi:
I'm writing an application in VB 6 that uses the WebBrowser control to parse the text of an HTML page. The page I'm trying to get has two JavaScript errors, one at the beginning and one at the end (maybe intentional).
What I need is a way to load the page without human interaction, i.e., without having to click on the error dialog to get rid of it.
Also, I would appreciate any links to information about scraping data from web pages, since I will most assuredly be doing more of this.

Thanks,

Mike O.
 
This loops through the Scripts collection and replaces a function with new code.
Code:
' Replaces the page's alertFormError() script function with one that writes
' the message into the ErrorBox field (presumably instead of an alert box).
' Assumes mobjDocument is a module-level Object holding WebBrowser1.Document.
Private Sub ReplaceAlertFunction()
    Dim objElement As Object
    Dim objElements As Object
    Dim strText As String
    Dim strFunction As String

    On Error GoTo ErrExit    ' Give up quietly if the DOM is not ready

    '*****
    '* Alert Box
    '*****
    strFunction = "function alertFormError(errorMsg){"
    strFunction = strFunction & "   window.document.all.ErrorBox.value = errorMsg;"
    strFunction = strFunction & "}"

    'Scripts collection
    Set objElements = mobjDocument.Scripts
    For Each objElement In objElements
        strText = objElement.Text    'Script text
        If InStr(1, strText, "function alertformerror", vbTextCompare) > 0 Then
            objElement.Text = strFunction    ' Replace it all
            Exit For
        End If
    Next
ErrExit:
End Sub
 
Thanks for the response. I'm not sure what to do with this code. As soon as the page loads, I get an error in a modal dialog that pretty much blocks me from doing anything until I click "N" for No. The page is someone else's, so I don't have authority to update it.
 
I'm afraid that the page's onload code (or whatever raises the error) may run before you can do anything. The code above "goes active" when the status changes to "interactive", i.e. the page is loaded but not all images have been fetched. The code being replaced is not executed until the Submit button is pressed, so I have plenty of time (my code presses the button).
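Waiting for the "interactive" state can be done by polling the control's ReadyState. This is a minimal VB6 sketch, assuming a form with a WebBrowser control named WebBrowser1 and a project reference to Microsoft Internet Controls; the routine name LoadAndPatch is hypothetical. On repeat navigations the control may briefly still report the previous page's state, so treat this as a starting point rather than a robust wait.

```vb
' Minimal sketch. Assumes a form with a WebBrowser control named
' WebBrowser1 (project reference: Microsoft Internet Controls).
' LoadAndPatch is a hypothetical name.
Private Sub LoadAndPatch(ByVal strUrl As String)
    WebBrowser1.Navigate strUrl

    ' Wait until the document has been parsed (READYSTATE_INTERACTIVE);
    ' images may still be downloading at this point.
    Do While WebBrowser1.ReadyState < READYSTATE_INTERACTIVE
        DoEvents    ' Let the control keep processing messages
    Loop

    ' The Scripts collection should now be available, so the page's
    ' functions can be replaced here, before Submit is pressed.
End Sub
```

The DoEvents loop keeps the form responsive but busy-waits; a Timer that checks ReadyState would avoid spinning the CPU.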
 
Yeah, I needed something that could prevent or trap the JavaScript errors. I have a kludgy workaround for now: I wrote an ActiveX EXE that uses SendKeys to send an "N" character every second while the page is loading. This lets me scrape the data unattended, but I can't do anything else with the machine, because it keeps sending "N" to whatever has focus at the moment.
Thanks for the response anyway. I might be able to use it somewhere else.
 
I have NO idea whether this will help, but I just came across the Silent property of the WebBrowser control. Apparently, if you set it to True, the control can't display any dialog boxes.

I haven't tried it, but you might like to give it a go.

elziko
 
The Silent property of the WebBrowser control suppresses dialogs when set to True; it has no effect on critical errors or security alerts.
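To try the Silent suggestion, the property can be set before the first navigation. This is a minimal VB6 sketch, assuming a form with a WebBrowser control named WebBrowser1; the URL is a hypothetical placeholder. Whether it silences the particular script errors on this page has not been verified in the thread.

```vb
' Minimal sketch. Assumes a form with a WebBrowser control named
' WebBrowser1; the URL is a hypothetical placeholder.
Private Sub Form_Load()
    ' Ask the control not to display dialog boxes before it loads
    ' anything. Critical errors and security alerts are not affected.
    WebBrowser1.Silent = True
    WebBrowser1.Navigate "http://www.example.com/"
End Sub
```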
 
