VB6/XPath XML Shredder speedup

Status
Not open for further replies.

BRW1 (Programmer, CA)
Jul 16, 2019
I'm working with a legacy VB6/XPath XML shredder that loads a SQL Server database. It works, but it's really slow, and I'm looking for suggestions to speed it up.

The XML files I'm working with contain only one set of elements, not multiple sets. For example, by analogy to Microsoft's familiar books.xml sample file, my XML files look like this:

Code:
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>

My XML files do **not** look like this:

Code:
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
  <book category="web">
    <title lang="en">XQuery Kick Start</title>
    <author>James McGovern</author>
    <author>Per Bothner</author>
    <author>Kurt Cagle</author>
    <author>James Linn</author>
    <author>Vaidyanathan Nagarajan</author>
    <year>2003</year>
    <price>49.99</price>
  </book>
  <book category="web" cover="paperback">
    <title lang="en">Learning XML</title>
    <author>Erik T. Ray</author>
    <year>2003</year>
    <price>39.95</price>
  </book>
</bookstore>
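
Because each file holds exactly one record (as in the first sample), every field of interest sits at a fixed path, so it can be read directly instead of being found by a tree walk. The sketch below shows this in Python's standard `xml.etree.ElementTree` rather than VB6/MSXML; the `read_record` helper and the books.xml tag names are illustrative assumptions, not BRW1's real schema.

```python
import xml.etree.ElementTree as ET

# One record per file, modeled on the single-<book> sample.
SAMPLE = """<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
</bookstore>"""


def read_record(xml_text):
    """Read the fields of the single <book> record by fixed paths.

    Each path matches at most one element because the file holds one
    record, so find()/findtext() (first match) are enough.
    """
    root = ET.fromstring(xml_text)   # root is <bookstore>
    book = root.find("book")         # the one and only record
    title = book.find("title")
    return {
        "category": book.get("category"),
        "title": title.text,
        "lang": title.get("lang"),
        "author": book.findtext("author"),
        "year": int(book.findtext("year")),
        "price": float(book.findtext("price")),
    }


if __name__ == "__main__":
    print(read_record(SAMPLE))
```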

As shown in the VB6/XPath pseudo code below, the shredder currently processes each XML file recursively, using a Select Case structure inside a For Each/Next loop. That works, but it's slow because every element in every XML file is examined. The problem is scale: I'm working with hundreds of thousands of XML files, each containing many hundreds of elements, and I'm only interested in about three dozen of those elements. I know the tags that identify them; they're always the same.

Are there any obvious ways to speed this up? For example, instead of recursively walking the entire XML file, can I somehow extract and parse only the three dozen or so elements I'm interested in? A complication is that a few of those elements have an indeterminate number of child nodes, and I need to extract information from every one of those child nodes.

Code:
Public Sub shredXML(ByRef Nodes As MSXML2.IXMLDOMNodeList)

Dim xNode As MSXML2.IXMLDOMNode

  For Each xNode In Nodes

    ' Only element nodes carry the data of interest
    If xNode.nodeType = NODE_ELEMENT Then

      Select Case xNode.nodeName

        Case "element1"
          ' extract stuff from element1 & load into database
        Case "element2"
          ' extract stuff from element2 & load into database
        Case "element3"
          ' extract stuff from element3 & load into database
        ' ...
        Case "elementN"
          ' extract stuff from elementN & load into database

      End Select

    End If

    ' Recurse into every child list, so every node in the file is visited
    If xNode.hasChildNodes Then
      shredXML xNode.childNodes
    End If

  Next xNode

End Sub
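
To make the cost of this pattern concrete, here is a minimal Python sketch of the same recursive shape, with a dictionary lookup standing in for the Select Case. The handler functions and the two "wanted" tags are hypothetical placeholders for the "extract stuff & load into database" steps; the point is that the walk touches every element, including the ones no handler cares about.

```python
import xml.etree.ElementTree as ET


# Hypothetical handlers standing in for
# "extract stuff from elementX & load into database".
def handle_title(el, rows):
    rows.append(("title", el.text))


def handle_price(el, rows):
    rows.append(("price", el.text))


# Dictionary dispatch plays the role of the Select Case on nodeName.
HANDLERS = {"title": handle_title, "price": handle_price}


def shred(el, rows):
    """Recursive walk in the style of shredXML.

    Every element is visited whether or not a handler exists for its
    tag. Returns the number of elements visited.
    """
    visited = 1
    handler = HANDLERS.get(el.tag)
    if handler is not None:
        handler(el, rows)
    for child in el:  # iterating an Element yields only child elements
        visited += shred(child, rows)
    return visited


doc = ET.fromstring(
    '<bookstore><book category="cooking">'
    '<title lang="en">Everyday Italian</title>'
    "<author>Giada De Laurentiis</author>"
    "<year>2005</year><price>30.00</price>"
    "</book></bookstore>"
)
rows = []
count = shred(doc, rows)
print(count, rows)  # prints 6 [('title', 'Everyday Italian'), ('price', '30.00')]
```

Six elements are visited to collect two values; with hundreds of elements per file and only three dozen wanted, most of the walk is wasted work.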
 
atlopes

BRW1,

You can extract every element1, element2, ..., elementN found at any depth in a document with one selectNodes call per element name.

Code:
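Dim elements As Variant
Dim element As Variant
Dim node As MSXML2.IXMLDOMNode

elements = Array("element1", "element2", "element3", "elementN")

For Each element In elements

  ' "//name" selects every element with that name, at any depth
  For Each node In xmldoc.selectNodes("//" & element)

    extractAndLoad node   ' your routine that extracts the data and loads it

  Next node

Next element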
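
The same per-name selection can be sketched in Python's `xml.etree.ElementTree`, where `root.iter(tag)` returns the same nodes as XPath `//tag` from the document root. One caveat: in ElementTree, `iter(tag)` is itself a plain traversal, so the per-name version walks the tree once per tag. The single-pass variant (a swapped-in technique, not part of the suggestion above) walks once and filters by set membership. This sketch does not say which approach MSXML runs faster; the `WANTED` list is a hypothetical stand-in for the real three dozen tags.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag list; the real one would hold the ~3 dozen wanted tags.
WANTED = ["title", "author", "price"]


def extract_per_name(root, wanted):
    """One selection per tag name, like selectNodes("//" & element).

    root.iter(tag) yields the root (if it matches) and every matching
    descendant in document order, i.e. the nodes XPath //tag selects.
    """
    return {tag: [el.text for el in root.iter(tag)] for tag in wanted}


def extract_single_pass(root, wanted):
    """Alternative: walk the tree once and keep only wanted tags."""
    wanted_set = set(wanted)
    found = {tag: [] for tag in wanted}
    for el in root.iter():  # visits every element exactly once
        if el.tag in wanted_set:
            found[el.tag].append(el.text)
    return found


doc = ET.fromstring(
    '<bookstore><book category="web">'
    '<title lang="en">XQuery Kick Start</title>'
    "<author>James McGovern</author><author>Per Bothner</author>"
    "<year>2003</year><price>49.99</price>"
    "</book></bookstore>"
)
print(extract_per_name(doc, WANTED))
print(extract_single_pass(doc, WANTED))
```

Both functions return the same dictionary; the difference is only how many times the tree is traversed.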

Whether this makes your extraction more efficient will depend on how the elements are distributed in the documents, but the DOM can be expected to optimize node selection far better than you can optimize any hand-written traversal of the node tree.
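
For the elements with an indeterminate number of child nodes, the selected node can be queried again relative to itself, so the loop adapts to however many children are present. In MSXML the counterparts would be calling selectNodes with a relative path such as "author" on the selected node, or looping over its childNodes and checking nodeType. The Python sketch below uses `findall()` and plain iteration from `xml.etree.ElementTree`; the `authors_of` and `children_of` helpers are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

# A single record whose number of <author> children is not fixed.
BOOK = """<book category="web">
  <title lang="en">XQuery Kick Start</title>
  <author>James McGovern</author>
  <author>Per Bothner</author>
  <author>Kurt Cagle</author>
  <author>James Linn</author>
  <author>Vaidyanathan Nagarajan</author>
  <year>2003</year>
  <price>49.99</price>
</book>"""


def authors_of(book):
    """Relative query on an already-selected node: findall() returns every
    matching child, however many there are."""
    return [a.text for a in book.findall("author")]


def children_of(parent):
    """Generic variant: (tag, stripped text) for every direct child element,
    whatever its name."""
    return [(child.tag, (child.text or "").strip()) for child in parent]


book = ET.fromstring(BOOK)
for name in authors_of(book):
    print(name)
print(len(children_of(book)))  # 8 child elements: title, 5 authors, year, price
```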
 
atlopes: Thanks very much. I'll pursue your suggestion and post back if (make that when [wink]) I get it working. Will probably take me several weeks.
 