JustinEzequiel
Programmer
I thought that just setting the entity dict of the XMLParser instance would be sufficient,
but evidently it's not enough.
What am I missing?
I know that I can replace the named entities with their Unicode equivalents, but that would mess up my
output.
I am asked, among other things, to check if the idxname matches the surname.
If not, then I am to flag an error indicating the line and column numbers
so that they can find and fix the error in the XML file.
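To make the check concrete, here is a rough sketch of the comparison. It assumes that "matches" means the idxname equals the surname with its diacritics stripped (as with Muñoz and Munoz); that reading is an assumption, and `fold_accents` and `idxname_matches` are names made up for this illustration.

```python
import unicodedata


def fold_accents(text):
    """Return text with combining marks removed (assumed matching rule)."""
    # NFD splits a letter such as n-with-tilde into 'n' plus a combining tilde.
    decomposed = unicodedata.normalize("NFD", text)
    # Keep only the characters that are not combining marks.
    return u"".join(ch for ch in decomposed if not unicodedata.combining(ch))


def idxname_matches(surname, idxname):
    """Hypothetical check: idxname must equal the accent-folded surname."""
    return fold_accents(surname) == idxname


surname = u"Mu\xf1oz"  # what the parser should yield for Mu&ntilde;oz
matches = idxname_matches(surname, u"Munoz")
```

Letters that have no canonical decomposition (for example "ø" or "ß") pass through unchanged, so a real index convention may need extra rules on top of this.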
(Oh, and yes, I am stuck with Python 2.5.4.)
Thanks in advance.
Justin
source code
Code:
from xml.etree import ElementTree
from htmlentitydefs import name2codepoint
from StringIO import StringIO
import unicodedata

def getParser():
    xp = ElementTree.XMLParser()
    for k, v in name2codepoint.iteritems():
        xp.entity[k] = unichr(v)
    return xp

test = '''<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp>'''

if __name__ == '__main__':
    print 'ntilde' in name2codepoint # True
    xp = getParser()
    print 'ntilde' in xp.entity # True
    print unicodedata.name(xp.entity['ntilde']) # LATIN SMALL LETTER N WITH TILDE
##    xp.feed(test)
##    e = xp.close()
    b = StringIO(test)
    t = ElementTree.parse(b, xp)
output
Code:
C:\Users\justin\Desktop>parserTest.py
True
True
LATIN SMALL LETTER N WITH TILDE
Traceback (most recent call last):
  File "C:\Users\justin\Desktop\parserTest.py", line 27, in <module>
    t = ElementTree.parse(b, xp)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 862, in parse
    tree.parse(source, parser)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 586, in parse
    parser.feed(data)
  File "C:\Python25\lib\xml\etree\ElementTree.py", line 1245, in feed
    self._parser.Parse(data, 0)
xml.parsers.expat.ExpatError: undefined entity: line 2, column 23
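One possible workaround, sketched under assumptions rather than offered as a confirmed fix: expat appears to hand an undefined entity over to ElementTree (which then looks it up in the parser's `entity` dict) only when the document declares a DOCTYPE with an external subset. Without one, it raises "undefined entity" immediately. The sketch below prepends such a DOCTYPE. The system identifier `surnamegrp.dtd` is hypothetical and is never fetched, and `parse_with_doctype` is a made-up helper name. The code is written to run on Python 3 and is meant to run on 2.x as well, but it has not been verified on 2.5.4.

```python
try:  # Python 3 module name
    from html.entities import name2codepoint
except ImportError:  # Python 2 module name
    from htmlentitydefs import name2codepoint
try:
    unichr  # Python 2 builtin
except NameError:
    unichr = chr  # Python 3: chr already returns a Unicode character
from xml.etree import ElementTree

# Hypothetical DOCTYPE; the DTD file does not need to exist.
DOCTYPE = '<!DOCTYPE surnamegrp SYSTEM "surnamegrp.dtd">'


def get_parser():
    """Build an XMLParser whose entity dict knows the HTML named entities."""
    xp = ElementTree.XMLParser()
    for name, codepoint in name2codepoint.items():
        xp.entity[name] = unichr(codepoint)
    return xp


def parse_with_doctype(xml_text):
    """Parse xml_text with a DOCTYPE prepended on the same line.

    Assumes xml_text does not start with an XML declaration.
    """
    xp = get_parser()
    xp.feed(DOCTYPE + xml_text)
    return xp.close()


test = '''<surnamegrp>
<surname print="yes">Mu&ntilde;oz</surname>
<idxname>Munoz</idxname>
</surnamegrp>'''

root = parse_with_doctype(test)
surname = root.findtext('surname')
```

Because the DOCTYPE is prepended without a newline, reported line numbers stay the same as in the original text; only column numbers on the first line shift by `len(DOCTYPE)`.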
Code:
>>> import sys
>>> sys.version
'2.5.4 (r254:67916, Dec 23 2008, 15:10:54) [MSC v.1310 32 bit (Intel)]'