Namespace In Regards to Reading XML With MSXML

Status
Not open for further replies.

Glenn9999

Programmer
Jun 19, 2004
2,312
US
After doing the last thing I asked about with XML, I decided to try reading the files back in (as a learning exercise). I got that done successfully with my own output, and a handful of others using SelectNodes() to reach the specific data of interest.

But I ran into a little issue with one file evidently not generated by the reference consumer program I was using to test my writes, where SelectNodes() doesn't return data. Comparison-wise, it's almost exactly the same, except for a reference like this in a couple of the main nodes.

<main-node xmlns="<URL reference>">

I figured out this was a namespace reference (invalid URL btw) and read quite a bit about them in a few references. But from what I can tell, the file doesn't contain any explicit references to the namespace, or even have the base part defined. Of course, part of the folly of looking at web references is not finding anything too similar to what you see in front of you.

Anyhow, opening the file in a text editor and removing the xmlns references and saving the file resulted in the program being able to read the data successfully.

So what do I need to do to be able to read XML files regardless of the presence of references like this?

 
Glenn9999

Consider this XML document:

XML:
<?xml version="1.0" encoding="UTF-8"?>
<root>
  <elm>
    <part>no explicitly declared namespace</part>
  </elm>
  <elm xmlns="info:whatever" xmlns:ns2="info:this_does_not_have_to_be_a_URL">
    <part>some namespace</part>
    <ns2:part>this does not share the namespace of its parent</ns2:part>
  </elm>
</root>

How does one fetch the contents from the [tt]part[/tt] elements spread through the document without editing the document?

An XPath expression can test only the local name of an element, disregarding its namespace. For instance:
[ul]
[li][tt]selectNodes("//part").length[/tt] returns 1, because there is only one [tt]part[/tt] element with an empty namespace[/li]
[li]and that's why [tt]selectNodes("//part").item(0).text[/tt] returns "no explicitly declared namespace"[/li]
[/ul]
but
[ul]
[li][tt]selectNodes("//*[local-name() = 'part']").length[/tt] returns 3, because there are three [tt]part[/tt] elements, regardless of namespace[/li]
[li]and so [tt]selectNodes("//*[local-name() = 'part']").item(2).text[/tt] returns "this does not share the namespace of its parent"[/li]
[/ul]
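
For anyone who wants to try the same comparison without MSXML, here is a minimal sketch using Python's standard [tt]xml.etree.ElementTree[/tt] module. That module is a stand-in, not MSXML: it has no [tt]local-name()[/tt] function, so the sketch uses its [tt]{*}[/tt] wildcard (Python 3.8 or later, as far as I know) to get the same "any namespace" effect. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same sample document as above (XML declaration omitted for brevity).
XML = """<root>
  <elm>
    <part>no explicitly declared namespace</part>
  </elm>
  <elm xmlns="info:whatever" xmlns:ns2="info:this_does_not_have_to_be_a_URL">
    <part>some namespace</part>
    <ns2:part>this does not share the namespace of its parent</ns2:part>
  </elm>
</root>"""

root = ET.fromstring(XML)

# Plain name test: only matches "part" elements that are in no namespace.
plain = root.findall(".//part")

# Wildcard namespace: matches "part" in any namespace, or in none.
any_ns = root.findall(".//{*}part")

plain_texts = [e.text for e in plain]
any_ns_texts = [e.text for e in any_ns]
```

The plain search finds one element and the wildcard search finds three, in document order, which mirrors the counts given for the two [tt]selectNodes[/tt] calls above.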

Of course, this is a bit clumsy, and it gets worse when you have to walk down the tree of nodes. For instance, to get the contents of the second [tt]part[/tt], one would have to use [tt]selectNodes("//*[local-name() = 'elm']/*[local-name() = 'part']").item(1).text[/tt]. To get a node of a particular name and namespace this way, one would also have to check the namespace with another XPath function, [tt]namespace-uri()[/tt], as in [tt]selectNodes("//*[local-name() = 'elm' and namespace-uri() = 'info:whatever']/*[local-name() = 'part' and namespace-uri() = 'info:this_does_not_have_to_be_a_URL']").item(0).text[/tt].
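
As a rough parallel to the nested lookups just described, the sketch below again uses Python's standard [tt]xml.etree.ElementTree[/tt] rather than MSXML. It assumes that module's [tt]{*}[/tt] wildcard (Python 3.8 or later) for "any namespace" and its [tt]{uri}name[/tt] notation for "this name in this namespace", which plays the role of the [tt]local-name()[/tt] plus [tt]namespace-uri()[/tt] pair. [tt]NS1[/tt] and [tt]NS2[/tt] are illustrative names only.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <elm>
    <part>no explicitly declared namespace</part>
  </elm>
  <elm xmlns="info:whatever" xmlns:ns2="info:this_does_not_have_to_be_a_URL">
    <part>some namespace</part>
    <ns2:part>this does not share the namespace of its parent</ns2:part>
  </elm>
</root>"""

root = ET.fromstring(XML)

# Any "part" directly under any "elm", ignoring namespaces on both steps.
nested = root.findall(".//{*}elm/{*}part")
second_part = nested[1].text

# Exact name AND namespace on both steps, written as {namespace-uri}local-name.
NS1 = "info:whatever"
NS2 = "info:this_does_not_have_to_be_a_URL"
qualified = root.findall(f".//{{{NS1}}}elm/{{{NS2}}}part")
qualified_text = qualified[0].text
```

Spelling out the full namespace string on every step works, but it is just as verbose as the XPath version, which is the motivation for binding prefixes instead.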

But let's assume there is a reason for the namespace to be declared and present in the document, as there normally is. To avoid the mess above and to deal properly with namespaced documents, we must declare the namespace identifiers in the MSXML SelectionNamespaces property and associate a prefix with each one of them.

[tt].setProperty("SelectionNamespaces", "xmlns:n1='info:whatever' xmlns:n2='info:this_does_not_have_to_be_a_URL'")[/tt]

After this, we can select the different kinds of [tt]part[/tt] elements by using the prefixes that were set in the property (not whatever prefixes, if any, are used in the XML document).
[ul]
[li][tt].selectNodes("//n1:elm/n1:part").item(0).text[/tt] returns "some namespace"[/li]
[li][tt].selectNodes("//n1:elm/n2:part").item(0).text[/tt] returns "this does not share the namespace of its parent"[/li]
[li][tt].selectNodes("//elm/part").item(0).text[/tt] continues to return "no explicitly declared namespace"[/li]
[/ul]
So, summing up, the way to deal with namespaces in XML using MSXML is by setting the SelectionNamespaces property.
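
The same prefix-binding idea can be tried outside MSXML. The sketch below is an analogy only: it uses Python's standard [tt]xml.etree.ElementTree[/tt], where a dictionary passed to [tt]findall[/tt] plays the role that SelectionNamespaces plays in MSXML. The prefixes [tt]n1[/tt] and [tt]n2[/tt] are chosen by the query, exactly as above, and need not match the document.

```python
import xml.etree.ElementTree as ET

XML = """<root>
  <elm>
    <part>no explicitly declared namespace</part>
  </elm>
  <elm xmlns="info:whatever" xmlns:ns2="info:this_does_not_have_to_be_a_URL">
    <part>some namespace</part>
    <ns2:part>this does not share the namespace of its parent</ns2:part>
  </elm>
</root>"""

root = ET.fromstring(XML)

# Query-side prefix bindings; the document itself uses a default
# namespace and the prefix "ns2", neither of which appears here.
ns = {
    "n1": "info:whatever",
    "n2": "info:this_does_not_have_to_be_a_URL",
}

in_n1 = root.findall(".//n1:elm/n1:part", ns)[0].text
in_n2 = root.findall(".//n1:elm/n2:part", ns)[0].text
no_ns = root.findall(".//elm/part", ns)[0].text
```

Each of the three lookups lands on a different [tt]part[/tt] element, matching the three results listed above.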
 
Thanks, that helped. Got my program reading any and all files now.

 
Glenn9999 said:
namespace reference (invalid URL btw)

It is probably best to consider a namespace as a unique, case-sensitive string. While the XML specification says that a namespace is a URI, the treatment of the character string (case sensitivity, no % escaping) has the practical effect of reducing a namespace to a fancy string. The most recent namespace specification has several deprecations, so it is a work in progress.

I try to use valid URIs but often see odd stuff...
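
A quick way to see the "fancy string" behaviour is the sketch below. It uses Python's standard [tt]xml.etree.ElementTree[/tt] as a convenient test bed, not MSXML, and the tiny document and names are invented for the example. The namespace is matched as an exact string, so a change of case alone is enough to miss.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose namespace string contains a capital letter.
doc = ET.fromstring('<a xmlns="info:Whatever"><b/></a>')

# Exact string match on the namespace: found.
exact = doc.find("{info:Whatever}b")

# Same namespace apart from letter case: not found.
wrong_case = doc.find("{info:whatever}b")
```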

Tom Morrison
Consultant
 