Good day all.
I have an issue with getTextContent. I am using DOM to parse an XML file, and one of my nodes has text content that includes &lt;. However, getTextContent returns <, which breaks my later use of that text. For example, the string inside the XML looks like this:
tt: java.lang.String = "        <p>&lt;Bloody</p>"
value: char[] = {char[25]@348}
[0] = ' ' 32
[1] = ' ' 32
[2] = ' ' 32
[3] = ' ' 32
[4] = ' ' 32
[5] = ' ' 32
[6] = ' ' 32
[7] = ' ' 32
[8] = '<' 60
[9] = 'p' 112
[10] = '>' 62
[11] = '&' 38
[12] = 'l' 108
[13] = 't' 116
[14] = ';' 59
[15] = 'B' 66
[16] = 'l' 108
[17] = 'o' 111
[18] = 'o' 111
[19] = 'd' 100
[20] = 'y' 121
[21] = '<' 60
[22] = '/' 47
[23] = 'p' 112
[24] = '>' 62
offset: int = 0
count: int = 25
hash: int = 0
(The line is read directly from the XML file, so it has some unneeded characters; the ones of interest are 11-20.)
However, when I run getTextContent on the <p> node, I get the following string:
[0] = '<' 60
[1] = 'B' 66
[2] = 'l' 108
[3] = 'o' 111
[4] = 'o' 111
[5] = 'd' 100
[6] = 'y' 121
As you can see, the four characters [11] = '&' 38, [12] = 'l' 108, [13] = 't' 116, [14] = ';' 59 have been replaced by the single character [0] = '<' 60.
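To make this concrete, here is a minimal sketch of the behaviour, assuming the standard JAXP DocumentBuilder. The class name TextContentDemo, the helper rootText, and the inline XML string are made up for illustration; the real input is read from a file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TextContentDemo {
    // Hypothetical helper: parses an XML string and returns the
    // text content of the root element.
    public static String rootText(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // The source text contains the four characters & l t ;
        String text = rootText("<p>&lt;Bloody</p>");
        System.out.println(text);          // prints "<Bloody"
        System.out.println(text.length()); // prints 7
    }
}
```

The parser resolves the entity reference while building the tree, so the string coming back from getTextContent is four characters shorter than what sits between the tags in the file.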
Things get worse because I also need to parse some non-standard characters, which get substituted in the same way.
What am I doing wrong, and how can this be fixed?