Confused about getting data from an XML file

PPettit · Jul 23, 2010

I've got a script that checks XML files for errors. One thing that it does is to make sure that there is an attachment file name present. I finally figured out how to make it perform as expected, but I was hoping that someone could help me to understand what I was doing wrong.

In essence, the pertinent XML structure is like so:

Code:

<Invoice>
<filename_attachment pagecount="1">408901.pdf</filename_attachment>
</Invoice>

Occasionally, I need to process old files that are slightly different (no pagecount attribute):

Code:

<Invoice>
<filename_attachment>370000.pdf</filename_attachment>
</Invoice>

Using this:
$xmlFilename_Attachment = [string]$xd.DocumentElement.filename_attachment.InnerText
Old XML returned nothing
New XML returned "408901.pdf"

If I cut off ".InnerText"
Old XML returned "370000.pdf"
New XML returned "System.Xml.XmlElement"

I finally settled on using this:

Code:

$xmlFilename_Attachment = $xd.get_DocumentElement().GetElementsByTagName("filename_attachment").Item(0).InnerText

It seems to work on either version of my XML files.

I'd like to have a better understanding as to why the inclusion/exclusion of ".InnerText" and the "pagecount" attribute screwed things up. Also, I'd like to know if there is a better way to get the file name information.

tsuji · Jul 24, 2010

This is the kind of thing that can drive users "mad". PowerShell seems to have its own simplified native object model (based on dot notation) layered on top of what the .NET Framework itself supports.

The clue to what is happening is the type of the object returned. Suppose $xd holds the document with the pagecount attribute, whereas $xd_old holds the "old" kind of document without it.

[1] $xd
>$xd.Invoice.filename_attachment.gettype().fullname
System.Xml.XmlElement

The text and the attribute can be retrieved like this (among other ways):

>$xd.Invoice.filename_attachment."#text"
408901.pdf
>$xd.Invoice.filename_attachment.pagecount
1

[2] $xd_old
>$xd_old.Invoice.filename_attachment.gettype().fullname
System.String

Since it is of type System.String, InnerText is not one of its members, so asking for .InnerText returns nothing. The element's text comes directly from the reference itself:

>$xd_old.Invoice.filename_attachment
370000.pdf

[3] That is distressing: the model seems to have been built to make it easier for users to process the document, yet it ends up making matters quite a mess. It seems users should stick to what the .NET Framework supports to keep the reasoning consistent.

[4] Furthermore, the model exposed through dot notation seems to treat tag names as case-insensitive: Invoice and invoice, or filename_attachment and Filename_Attachment, are all accepted graciously. I am not sure that is wise, though.
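
For anyone who wants to reproduce points [1] and [2] without the original files, here is a minimal sketch. It assumes the two documents look exactly like the samples in the first post and builds them inline with the [xml] cast; the variable names $newType, $oldType, $newText, $oldText and $oldInner are only illustrative.

```powershell
# Build both kinds of document inline so the snippet is self-contained.
$xd     = [xml]'<Invoice><filename_attachment pagecount="1">408901.pdf</filename_attachment></Invoice>'
$xd_old = [xml]'<Invoice><filename_attachment>370000.pdf</filename_attachment></Invoice>'

# With an attribute present, dot notation hands back the element itself.
$newType = $xd.Invoice.filename_attachment.GetType().FullName       # System.Xml.XmlElement

# With no attribute and text-only content, dot notation hands back a plain string.
$oldType = $xd_old.Invoice.filename_attachment.GetType().FullName   # System.String

# Text of the "new" element, read through the adapter's "#text" property.
$newText = $xd.Invoice.filename_attachment.'#text'

# Text of the "old" element: the reference already is the string.
$oldText = $xd_old.Invoice.filename_attachment

# A string has no InnerText member, so this is $null (with strict mode off).
$oldInner = $xd_old.Invoice.filename_attachment.InnerText

"{0} / {1}" -f $newType, $oldType
```

This would explain the results in the first post: on the old file .InnerText is asked of a string and comes back empty, while on the new file dropping .InnerText leaves an XmlElement that the [string] cast turns into its type name.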

PPettit · Jul 24, 2010

Thanks, tsuji. I'm going to try to stick to what .NET supports. That seems like the best approach unless I'm just doing something quick from the command line.
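
As one possible way of "sticking to what .NET supports", here is a sketch that uses XmlDocument.SelectSingleNode with an XPath expression and then reads InnerText from the node. The function name Get-AttachmentName is hypothetical, and the XPath assumes the Invoice/filename_attachment layout shown in the first post. Unlike dot notation, XPath is case-sensitive, so the tag names have to match exactly.

```powershell
# Hypothetical helper: returns the attachment file name, or $null if the element is missing.
function Get-AttachmentName {
    param([xml]$Document)

    # SelectSingleNode always returns an XmlNode (or $null), never a bare string,
    # so the result has the same type with or without the pagecount attribute.
    $node = $Document.SelectSingleNode('/Invoice/filename_attachment')
    if ($null -eq $node) { return $null }

    return $node.InnerText
}

# Inline copies of the two sample documents from the first post.
$xd     = [xml]'<Invoice><filename_attachment pagecount="1">408901.pdf</filename_attachment></Invoice>'
$xd_old = [xml]'<Invoice><filename_attachment>370000.pdf</filename_attachment></Invoice>'

Get-AttachmentName $xd
Get-AttachmentName $xd_old
```

The explicit $null check also covers the error-checking case from the original script, where the attachment element is absent altogether.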
