Any ideas to parse this in C#?

Brightspot · Aug 22, 2008

I have an xml file where the nodes I'm interested in are usually formatted this way:

<abc:start>Yesterday</abc:start>

Sometimes, there are nested tags like this:

<abc:start>
<abc:good ctype="link" id="1">
</abc:good>Tomorrow</abc:start>

I'm stuck with a brain block, sorry to say. I just want the words in the start tag without the information from the nested tag, i.e. "Yesterday" and "Tomorrow". Any ideas for just keeping the info and skipping the nested tags? Oh, yeah, I'm a newbie, too.

tsuji · Aug 23, 2008

What have you got so far?

Brightspot · Aug 25, 2008

I am using GetElementsByTagName. Here is the code:

XmlNodeList elemList = doc.GetElementsByTagName("xlf:start");
for (int i = 0; i < elemList.Count; i++)
{
    // Index into the list; XmlNodeList itself has no InnerText property.
    String originalSourceText = elemList[i].InnerText;
    if (originalSourceText.Contains("<xlf:"))
    {
        label4.Text = "write something";
    }
}

I was expecting originalSourceText to contain something like:

<abc:good ctype="link" id="1">
</abc:good>Tomorrow

Then I could use string functions to strip out what I don't want. What I get is:

\r\n \r\n Tomorrow.

It looks like the tag nested in my abc:start

<abc:good ctype="link" id="1">
</abc:good>

is stripped out and replaced with \r\n.

Is this a function of GetElementsByTagName? Is this what the function will always do? Is there a better function to use?

tsuji · Aug 25, 2008

[0] As a minor point, can I suppose what amounts to "xlf" is in fact "abc" in the xml doc?

[1] >I was expecting originalSourceText to contain something like:
Have you tried to write out originalSourceText? It won't contain the abc:good tag! Why should it?

[2] If it is that rigid structure, either having an empty abc:good or none of it, you can certainly apply the Trim() method.
>String originalSourceText = elemList[i].InnerText;
String originalSourceText = elemList[i].InnerText.Trim();
But here the assumption is _very_ strong, and I wouldn't feel comfortable with it, not at all.
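
To make the "strong assumption" concrete, here is a minimal sketch, not anyone's posted solution. It assumes a made-up namespace URI ("urn:example") and a hypothetical helper named DirectText, and it puts some text ("label") inside abc:good, which the original sample does not have. With that change Trim() on InnerText is no longer enough, while collecting only the direct text children of abc:start still works.

```csharp
using System;
using System.Text;
using System.Xml;

public static class StartText
{
    // Hypothetical helper: concatenates only the text nodes that are direct
    // children of the element, so text inside nested elements is skipped.
    public static string DirectText(XmlNode element)
    {
        var sb = new StringBuilder();
        foreach (XmlNode child in element.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text || child.NodeType == XmlNodeType.CDATA)
            {
                sb.Append(child.Value);
            }
        }
        return sb.ToString().Trim();
    }

    public static void Main()
    {
        // Invented sample data; "urn:example" and "label" are assumptions.
        string xml =
            "<root xmlns:abc=\"urn:example\">" +
            "<abc:start>Yesterday</abc:start>" +
            "<abc:start>\n<abc:good ctype=\"link\" id=\"1\">label</abc:good>\n Tomorrow</abc:start>" +
            "</root>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);

        XmlNodeList elemList = doc.GetElementsByTagName("abc:start");
        for (int i = 0; i < elemList.Count; i++)
        {
            // InnerText includes the text of nested elements; DirectText does not.
            Console.WriteLine("InnerText.Trim(): " + elemList[i].InnerText.Trim());
            Console.WriteLine("DirectText:       " + DirectText(elemList[i]));
        }
    }
}
```

For the second element, InnerText still carries "label" from the nested tag, so Trim() alone only works while abc:good stays empty.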

Brightspot · Aug 25, 2008

[0] Yeah, xlf is really the abc in my example. Sorry.
[1] When I view originalSourceText in the C# debugger I see either the proper "Yesterday" when the XML does not contain a nested tag, or "\r\n \r\n Tomorrow" when there is a nested tag. You said "It won't contain the abc:good tag! Why should it?" That is my question . . . why doesn't it? Isn't it just character data inside my "start" tag?
[2] You're right, I can't be certain of the XML structure of the start tag. BUT, if the behavior of GetElementsByTagName is to replace all nested tags with whitespace, Trim() might work. So I guess this is the real question.

tsuji · Aug 25, 2008

>That is my question . . . why doesn't it?
Well, that is not your private api, is it?

tsuji · Aug 25, 2008

>if GetElementsByTagName behavior is to replace all nested tags with whitespace,...
It is not, and that is not the reason I was talking about the condition/assumption... You have to read the spec rather than guess from the literal names of the properties or methods.
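
One way to check this for yourself, as a rough sketch: the line breaks can come from whitespace that is already in the document, not from anything GetElementsByTagName does. The sample below assumes an invented namespace URI ("urn:example") and a hypothetical helper named StartInnerText, and it toggles XmlDocument.PreserveWhitespace before loading. Whether that property was set in the original program is not known from the thread.

```csharp
using System;
using System.Xml;

public static class WhitespaceDemo
{
    // Hypothetical helper: loads the sample and returns InnerText of the
    // first abc:start element.
    public static string StartInnerText(bool preserveWhitespace)
    {
        // Invented sample; "urn:example" is an assumption.
        string xml =
            "<root xmlns:abc=\"urn:example\">" +
            "<abc:start>\n<abc:good ctype=\"link\" id=\"1\">\n</abc:good>Tomorrow</abc:start>" +
            "</root>";

        var doc = new XmlDocument();
        doc.PreserveWhitespace = preserveWhitespace; // takes effect only if set before loading
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("abc:start")[0].InnerText;
    }

    public static void Main()
    {
        // Brackets make leading whitespace visible in the console output.
        Console.WriteLine("[" + StartInnerText(false) + "]");
        Console.WriteLine("[" + StartInnerText(true) + "]");
    }
}
```

In both cases the abc:good tag itself never appears in InnerText; only text and whitespace nodes are concatenated.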

Brightspot · Aug 25, 2008

I went to the try-me page at W3Schools:

http://www.w3schools.com/js/tryit.asp?filename=try_dom_tut_getelementsbytagname

and nested a tag to see what would happen. I modified:

<p id="main1">The DOM is very useful</p>

to:
<p id="main1"><g>stuff</g>The DOM is very useful</p>

It displayed:

Second paragraph text: stuffThe DOM is very useful.

The try-me is using x[1].innerHTML to get the text. I'm using .InnerText. Maybe that is the difference. I'm going to look further into this. Any ideas where I can research the actual behavior of these methods?

tsuji · Aug 25, 2008

>The try-me is using x[1].innerHTML to get the text. I'm using .InnerText. Maybe that is the difference.
You have to be more vigilant about which subject matter and implementation you are dealing with. Does C#'s System.Xml.XmlNode have an innerHTML/InnerHTML property? It does not. How about InnerXML?!

tsuji · Aug 25, 2008

I sure meant InnerXml, if my fingers listened to my InnerVoice. And I am glad I chose not to hand out a solution for this thread to begin with. That's the right decision.

Brightspot · Aug 26, 2008

I think I understand now. InnerText returns the words, the actual text content of the tag. InnerXml returns the markup between the start and end tags, nested tags included. Thanks for your help, tsuji.
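
For anyone finding this later, a small sketch of that difference. The namespace URI ("urn:example") and the helper name LoadStart are made up for illustration; the element shape is the one from the first post.

```csharp
using System;
using System.Xml;

public static class InnerDemo
{
    // Hypothetical helper: parses the sample and returns its abc:start element.
    public static XmlNode LoadStart()
    {
        // Invented sample; "urn:example" is an assumption.
        string xml =
            "<root xmlns:abc=\"urn:example\">" +
            "<abc:start><abc:good ctype=\"link\" id=\"1\"></abc:good>Tomorrow</abc:start>" +
            "</root>";

        var doc = new XmlDocument();
        doc.LoadXml(xml);
        return doc.GetElementsByTagName("abc:start")[0];
    }

    public static void Main()
    {
        XmlNode start = LoadStart();

        // Text content only: the nested abc:good tag contributes nothing.
        Console.WriteLine("InnerText: " + start.InnerText);

        // Markup of the children: the nested tag shows up here.
        Console.WriteLine("InnerXml:  " + start.InnerXml);

        // Same, plus the abc:start tags themselves.
        Console.WriteLine("OuterXml:  " + start.OuterXml);
    }
}
```

So string functions that look for "<abc:" would need InnerXml as input, while InnerText is already stripped of tags.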
