
getTextContent problem (xml)

Status
Not open for further replies.

Naug

Technical User
Sep 24, 2004
85
RU
Good day all.

I have an issue with getTextContent. I am using DOM to parse an XML file, and one of my nodes has text content that includes &lt;. However, getTextContent returns <, which breaks my later use of that text. For example, the string inside the XML looks like this:

tt: java.lang.String = "        <p>&lt;Bloody</p>"
value: char[] = {char[25]@348}
[0] = ' ' 32
[1] = ' ' 32
[2] = ' ' 32
[3] = ' ' 32
[4] = ' ' 32
[5] = ' ' 32
[6] = ' ' 32
[7] = ' ' 32
[8] = '<' 60
[9] = 'p' 112
[10] = '>' 62
[11] = '&' 38
[12] = 'l' 108
[13] = 't' 116
[14] = ';' 59
[15] = 'B' 66
[16] = 'l' 108
[17] = 'o' 111
[18] = 'o' 111
[19] = 'd' 100
[20] = 'y' 121
[21] = '<' 60
[22] = '/' 47
[23] = 'p' 112
[24] = '>' 62
offset: int = 0
count: int = 25
hash: int = 0

(The line is read directly from the XML file, so it has some unneeded characters. The ones of interest are 11-20.)

However, when I run getTextContent on the <p> node, I get the following string:

[0] = '<' 60
[1] = 'B' 66
[2] = 'l' 108
[3] = 'o' 111
[4] = 'o' 111
[5] = 'd' 100
[6] = 'y' 121

As you can see, [11] = '&' 38, [12] = 'l' 108, [13] = 't' 116, [14] = ';' 59 have been replaced by [0] = '<' 60.

Things get worse as I need to parse some non-standard characters which also get subbed.

What am I doing wrong/how can this be fixed?
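
For reference, the behaviour described above can be reproduced with a few lines of standard JAXP/DOM code. This is only a minimal sketch: the class name TextContentDemo and the helper rootText are made up for illustration, and the XML is supplied as a string rather than read from the original file. It suggests that nothing is going wrong here; the parser resolves &lt; to < while building the tree, so getTextContent hands back the already-decoded character.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original post.
public class TextContentDemo {

    // Parses an XML string and returns the text content of its root element.
    // Checked exceptions are wrapped so callers do not have to declare them.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // The entity reference &lt; is decoded by the parser.
        System.out.println(rootText("<p>&lt;Bloody</p>")); // prints <Bloody
    }
}
```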
 
So if I get this right, the solution is simply to filter the resulting string and substitute the characters I care about with their properly escaped versions?
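
A filter along those lines might look like the sketch below. The class name XmlEscape and the method escapeXml are hypothetical. It replaces the five predefined XML entities, and, as an assumption about what is wanted for the "non-standard" characters, writes anything above ASCII 0x7E as a decimal character reference. Adjust that rule to whatever the downstream consumer actually expects.

```java
// Hypothetical helper, not part of the original post.
public class XmlEscape {

    // Re-escapes text returned by getTextContent so it can be embedded
    // in XML/HTML markup again.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        text.codePoints().forEach(cp -> {
            switch (cp) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:
                    if (cp > 0x7E) {
                        // Assumption: non-ASCII becomes a numeric reference.
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.appendCodePoint(cp);
                    }
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("<Bloody")); // prints &lt;Bloody
    }
}
```

Escaping & together with the other characters in a single pass avoids double-escaping the ampersands that the other replacements introduce.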
 