Parsing HTML using DOM

Status
Not open for further replies.

Swi

Programmer
Feb 4, 2002
1,968
US
Searching the forum, I have seen multiple posts saying that you can use DOM to parse an HTML file. However, when I look for examples in the forum or on the web, there do not seem to be many for VB6.

If anyone could provide an illustrative example, as well as some documentation, for parsing HTML using DOM, that would be great. Basically, I would like to loop through each table, each tr and each td.

Thanks.

Swi
 
Thanks strongm.

Swi
 
One more question. When I try to use the code to filter td's, I do not get anything returned.

Here is the code, followed by a sample of the HTML:

Code:
Option Explicit

' This version backwards compatible to the XML 2.0 or later library
Private Sub Command1_Click()
    Dim myDOM As DOMDocument
    Dim myItem As IXMLDOMElement
    Dim fso As New FileSystemObject
    Dim Instream As TextStream
    
    Set myDOM = New DOMDocument
    Set Instream = fso.OpenTextFile("\\secpykpw01\RedirectedFolders\\Desktop\data.txt", ForReading)
    myDOM.loadXML Instream.ReadAll

    For Each myItem In myDOM.getElementsByTagName("td")
        With myItem.childNodes
            Debug.Print .Item(0).nodeName, .Item(0).Text
            Debug.Print .Item(1).nodeName, .Item(1).Text
            Debug.Print .Item(2).nodeName, .Item(2).Text
        End With
    Next
End Sub

Code:
<td bgcolor='#FFFFFF' ALIGN=LEFT VALIGN='top' class='results_table_cell' id='results_provider_0' style='background-color:#FFFFFF;'><font color='#000000'><font size='-1'>Labs Inc<BR></font></font></td>
<td bgcolor='#FFFFFF' ALIGN=LEFT NOWRAP VALIGN='top' class='results_table_cell' id='results_address_0' style='background-color:#FFFFFF;'><font color='#000000'><font size='-1'>1234 Any Street<BR>Anytown, US 12345</font></font></td>

Swi
 
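Well, the problem here is that the HTML is badly formed. Even though most browsers will display it, the XML parser behind DOMDocument won't deal with it at all, because it is not well-formed XML: some attribute values are unquoted (ALIGN=LEFT), NOWRAP has no value, the <BR> tags are never closed, and the fragment has no single root element. (DOMDocument supports a property called parseError, which allows you to check for such parse errors.)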

Try feeding it:
Code:
<table>
  <tr>
    <td bgcolor='#FFFFFF' align='left' valign='top' class=
    'results_table_cell' id='results_provider_0' style=
    'background-color:#FFFFFF;'>
      <font color='#000000'><font size='-1'>Labs Inc<br /></font></font>
    </td>
    <td bgcolor='#FFFFFF' align='left' nowrap="nowrap" valign='top'
    class='results_table_cell' id='results_address_0' style=
    'background-color:#FFFFFF;'>
      <font color='#000000'><font size='-1'>1234 Any Street<br />Anytown, US 12345</font></font>
    </td>
  </tr>
</table>

Oh, and possibly modify the code:
Code:
Private Sub Command1_Click()
    Dim myDOM As DOMDocument
    Dim myItem As IXMLDOMElement
    Dim lp As Long

    Set myDOM = New DOMDocument
    
    myDOM.Load "c:\newtest2.htm" '"\\secpykpw01\RedirectedFolders\\Desktop\data.txt"

    For Each myItem In myDOM.getElementsByTagName("td")
        Debug.Print myItem.nodeName, myItem.Text
    Next
    
End Sub
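
For what it's worth, here is a minimal sketch of how the parseError property mentioned earlier might be used to see why a file is rejected. It assumes the same Microsoft XML reference as the code above and reuses the same test file path. The button name Command2 is hypothetical, and setting async to False is my own addition so that Load finishes before the result is checked.

Code:
Option Explicit

' Sketch only: Command2 is a hypothetical second button on the form
Private Sub Command2_Click()
    Dim myDOM As DOMDocument
    Dim myItem As IXMLDOMElement

    Set myDOM = New DOMDocument
    myDOM.async = False    ' assumption: load synchronously so the result can be tested straight away

    ' Load returns False when the file cannot be read or is not well-formed XML
    If Not myDOM.Load("c:\newtest2.htm") Then
        With myDOM.parseError
            Debug.Print "Parse error " & .errorCode & " at line " & .Line & ", position " & .linepos
            Debug.Print "Reason: " & .reason
            Debug.Print "Source: " & .srcText
        End With
        Exit Sub
    End If

    ' Only reached when the document parsed cleanly
    For Each myItem In myDOM.getElementsByTagName("td")
        Debug.Print myItem.nodeName, myItem.Text
    Next
End Sub

With the original malformed sample, the reason text should point at the first thing the parser objects to, which makes it easier to clean the HTML up one problem at a time.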
 
Perfect.

Thanks strongm. The HTML I had was malformed and not provided in its entirety.

Thanks again!

Swi
 