getTextContent problem (xml)

Naug · Mar 28, 2005

Good day all.

I have an issue with getTextContent. I am using DOM to parse an XML file, and one of my nodes has text content that includes &lt;. getTextContent, however, returns <, which breaks my later use of that text. For example, the string inside the XML looks like this:

tt: java.lang.String = "        <p>&lt;Bloody</p>"
value: char[] = {char[25]@348}
[0] = ' ' 32
[1] = ' ' 32
[2] = ' ' 32
[3] = ' ' 32
[4] = ' ' 32
[5] = ' ' 32
[6] = ' ' 32
[7] = ' ' 32
[8] = '<' 60
[9] = 'p' 112
[10] = '>' 62
[11] = '&' 38
[12] = 'l' 108
[13] = 't' 116
[14] = ';' 59
[15] = 'B' 66
[16] = 'l' 108
[17] = 'o' 111
[18] = 'o' 111
[19] = 'd' 100
[20] = 'y' 121
[21] = '<' 60
[22] = '/' 47
[23] = 'p' 112
[24] = '>' 62
offset: int = 0
count: int = 25
hash: int = 0

(The line is read directly from the XML file, so it has some unneeded characters; the ones of interest are 11-20.)

When I run getTextContent on the <p> node, however, I get the following string:

[0] = '<' 60
[1] = 'B' 66
[2] = 'l' 108
[3] = 'o' 111
[4] = 'o' 111
[5] = 'd' 100
[6] = 'y' 121

As you can see, [11] = '&' 38, [12] = 'l' 108, [13] = 't' 116, [14] = ';' 59 have been replaced by [0] = '<' 60.

Things get worse, as I also need to parse some non-standard characters, and those get substituted too.

What am I doing wrong/how can this be fixed?
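
The behaviour described above can be reproduced with a minimal sketch, assuming the standard JAXP DocumentBuilder is used for parsing. The class name EntityDemo, the helper rootText, and the one-line sample document are illustrative and not taken from the original code. An XML parser resolves the predefined entity reference &lt; while reading the file, so the DOM text node already holds a plain '<' by the time getTextContent is called.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class EntityDemo {
    // Illustrative helper: parses an XML string and returns the root element's text content.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrap checked parser exceptions so the demo stays short.
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        String text = rootText("<p>&lt;Bloody</p>");
        System.out.println(text);          // prints <Bloody
        System.out.println(text.length()); // prints 7
    }
}
```

The four source characters &, l, t, ; come back as the single character '<', which matches the dump above.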

prosper · Mar 28, 2005

homepages.wmich.edu/~p1bijjam/595/Esc-seq.pdf

Naug · Mar 29, 2005

So, if I get this right, the solution is simply to filter the resulting string and substitute the characters I want with their properly escaped versions?
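
A minimal sketch of that filtering step, assuming only the five characters with predefined XML entities need handling. The class XmlEscape and the method escapeXml are hypothetical names, not part of the DOM API; other non-standard characters would need their own rules.

```java
class XmlEscape {
    // Hypothetical helper: replaces XML's five predefined special characters
    // with their entity references and leaves everything else untouched.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeXml("<Bloody")); // prints &lt;Bloody
    }
}
```

Manual escaping like this is mainly needed when markup is built by string concatenation. If the text is put back into a DOM tree and written out with a serializer such as javax.xml.transform.Transformer, the serializer escapes '<' and '&' in text nodes again on output.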
