Tek-Tips is the largest IT community on the Internet today!

Any ideas to parse this in C#?

Status
Not open for further replies.

Brightspot

Programmer
Aug 22, 2008
5
US
I have an xml file where the nodes I'm interested in are usually formatted this way:

<abc:start>Yesterday</abc:start>

Sometimes, there are nested tags like this:

<abc:start>
<abc:good ctype="link" id="1">
</abc:good>Tomorrow</abc:start>

I'm stuck with a brain block, sorry to say. I just want the words in the start tag without the information from the nested tag, i.e. "Yesterday" and "Tomorrow". Any ideas for keeping just that text and skipping the nested tags? Oh, yeah, I'm a newbie, too.
 
I am using GetElementByTagname. Here is the code:

XmlNodeList elemList = doc.GetElementsByTagName("xlf:start");
for (int i = 0; i < elemList.Count; i++)
{
    // Index into the list: InnerText belongs to each node, not to the list.
    String originalSourceText = elemList[i].InnerText;
    if (originalSourceText.Contains("<xlf:"))
    {
        label4.Text = "write something";
    }
}

I was expecting originalSourceText to contain something like:

<abc:good ctype="link" id="1">
</abc:good>Tomorrow

Then I could use string functions to strip out what I don't want. What I get is:

\r\n \r\n Tomorrow.

It looks like the tag nested in my abc:start

<abc:good ctype="link" id="1">
</abc:good>

is stripped out and replaced with \r\n.

Is this a function of GetElementsByTagName? Is this what the function will always do? Is there a better function to use?
 
[0] As a minor point, can I assume that "xlf" in your code is in fact "abc" in the xml doc?

[1] >I was expecting originalSourceText to contain something like:
Have you tried to write out originalSourceText? It won't contain the abc:good tag! why should it?

[2] If the structure is that rigid, with abc:start holding either an empty abc:good or no nested tag at all, you can certainly apply the Trim() method.
>String originalSourceText = elemList[i].InnerText;
[tt]String originalSourceText = elemList[i].InnerText.Trim();[/tt]
But the assumption here is _very_ strong, and I wouldn't feel comfortable with it, not at all.
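
To make that caveat concrete, here is a minimal sketch of the Trim() idea, not a recommended solution. The root wrapper, the namespace URI urn:example:abc, and the helper names LoadStart and TrimmedStart are made up for illustration; the thread never shows the real namespace declaration. It suggests that Trim() is enough while the nested tag is empty, and that it stops being enough as soon as the nested tag carries text of its own.

```csharp
using System;
using System.Xml;

static class TrimSketch
{
    // Wraps a fragment in a root element that declares the abc prefix.
    // The namespace URI is hypothetical, chosen only for this example.
    public static XmlNode LoadStart(string startXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root xmlns:abc='urn:example:abc'>" + startXml + "</root>");
        XmlNodeList elemList = doc.GetElementsByTagName("abc:start");
        return elemList[0];
    }

    // The Trim() suggestion: strip leading and trailing whitespace from InnerText.
    public static string TrimmedStart(string startXml)
    {
        return LoadStart(startXml).InnerText.Trim();
    }

    static void Main()
    {
        // Empty nested tag: Trim() leaves just the word.
        Console.WriteLine(TrimmedStart(
            "<abc:start>\n<abc:good ctype='link' id='1'>\n</abc:good>Tomorrow</abc:start>"));
        // prints "Tomorrow"

        // Nested tag with its own text: that text leaks into the result.
        Console.WriteLine(TrimmedStart(
            "<abc:start><abc:good ctype='link' id='1'>click</abc:good>Tomorrow</abc:start>"));
        // prints "clickTomorrow"
    }
}
```

The second call is the case the strong assumption rules out: InnerText concatenates the text of every descendant, so Trim() cannot separate "click" from "Tomorrow".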
 
[0] Yeah, xlf is really the abc in my example. sorry
[1] When I view originalSourceText in the C# debugger I see either the proper "Yesterday" when the XML does not contain a nested tag, or "\r\n \r\n Tomorrow" when there is a nested tag. You said "It won't contain the abc:good tag! why should it?" That is my question . . . why doesn't it? Isn't it just character data inside my "start" tag?
[2] You're right, I can't be certain of the XML structure of the start tag. BUT, if GetElementsByTagName behavior is to replace all nested tags with whitespace, TRIM might work. So I guess this is the real question.
 
>That is my question . . . why doesn't it?
Well, that is not your private api, is it?
 
>if GetElementsByTagName behavior is to replace all nested tags with whitespace,...
It is not, and that is not the reason I was talking about the condition/assumption... You have to read the spec rather than guess from the literal names of the properties or methods.
 
I went to the try-me at the W3: and nested a tag to see what would happen. I modified:

<p id="main1">The DOM is very useful</p>

to:
<p id="main1"><g>stuff</g>The DOM is very useful</p>

It displayed:

Second paragraph text: stuffThe DOM is very useful.

The try-me is using x[1].innerHTML to get the text. I'm using .InnerText. Maybe that is the difference. I'm going to look further into this. Any ideas where I can research the actual behavior of these methods?
 
>The try-me is using x[1].innerHTML to get the text. I'm using .InnerText. Maybe that is the difference.
You have to be more vigilant about which subject matter and which implementation you are dealing with. Does C#'s System.Xml.XmlNode have an innerHTML/InnerHTML property? It does not. How about InnerXML?!
 
I sure meant [tt]InnerX[red]ml[/red][/tt] if my fingers listened to my InnerVoice. And I am glad I chose not to hand out a solution for this thread to begin with. That's the right decision.
 
I think I understand now. InnerText returns just the words: the text content of the tag and of anything nested inside it, with the markup dropped. InnerXml returns the markup between the start tag and the end tag, nested tags included. Thanks for your help tsuji.
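
For anyone landing on this thread later, here is a rough sketch of the difference, plus one possible way to keep only the words that sit directly inside abc:start. It assumes the abc prefix is declared on a wrapper element with a made-up namespace URI, and the helpers LoadStart and DirectText are hypothetical names for this example, not part of System.Xml. Treat it as an illustration of the properties discussed above rather than the definitive fix.

```csharp
using System;
using System.Text;
using System.Xml;

static class StartTextSketch
{
    // Wraps a fragment in a root element that declares the abc prefix.
    // The namespace URI is hypothetical, chosen only for this example.
    public static XmlNode LoadStart(string startXml)
    {
        var doc = new XmlDocument();
        doc.LoadXml("<root xmlns:abc='urn:example:abc'>" + startXml + "</root>");
        return doc.GetElementsByTagName("abc:start")[0];
    }

    // Concatenates only the text and CDATA nodes that are direct children
    // of the element, so anything inside nested elements is skipped.
    public static string DirectText(XmlNode element)
    {
        var sb = new StringBuilder();
        foreach (XmlNode child in element.ChildNodes)
        {
            if (child.NodeType == XmlNodeType.Text ||
                child.NodeType == XmlNodeType.CDATA)
            {
                sb.Append(child.Value);
            }
        }
        return sb.ToString().Trim();
    }

    static void Main()
    {
        XmlNode start = LoadStart(
            "<abc:start><abc:good ctype='link' id='1'>click</abc:good>Tomorrow</abc:start>");

        // Text of all descendants, markup dropped.
        Console.WriteLine(start.InnerText);    // prints "clickTomorrow"

        // Markup of the child nodes, including the nested abc:good element.
        Console.WriteLine(start.InnerXml);

        // Only the text owned directly by abc:start.
        Console.WriteLine(DirectText(start));  // prints "Tomorrow"
    }
}
```

Walking ChildNodes avoids string surgery on InnerXml, and it keeps working when the nested tag is not empty. If the words can also appear inside nested tags that should be kept, this sketch would need adjusting.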
 