TunaAdmiral
MIS
Hi there!
I haven't had much experience with XML, but a few months ago I wrote a program that reads in a customer-submitted XML file. That particular file contained a large number of unescaped ampersands (e.g., DiCarlo & Sons Plumbing).
I added a find-and-replace function to the program that replaced the &'s with &amp;.
Fast forward a few months...a second customer has started submitting XML files to us. Now, I basically took the same program from the first XML file and applied it to the new file. Chaos!
The second XML file contains properly escaped characters (&amp; instead of &, &apos; instead of apostrophes). My program is replacing every "&" with &amp;, so the &amp; in the file becomes &amp;amp;. Not good.
Now, it is clear to me that I did a cruddy hack job to process the first file. The second XML file is actually exactly as it should be. When I take out my find-and-replace function, the program runs beautifully: the &amp; and &apos; entities are automagically converted to actual & and ' characters by .NET's wonderful XML methods.
My question is this: what is the best way to deal with a poorly formatted XML file (i.e., one containing stand-alone "&" characters)? Is there a set method of dealing with these? How do you differentiate between stand-alone "&" characters and those that begin an escaped string like "&apos;"?
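To make the distinction concrete, here is a minimal sketch of the kind of check I have in mind, written in Python rather than .NET just to keep it short. The idea is to treat an "&" as stand-alone unless it is immediately followed by something shaped like an entity or character reference (a name, or "#" plus digits, ending in ";"). The names BARE_AMP and escape_bare_ampersands are made up for illustration, and this is only a heuristic: text such as "R&D; Inc" would be mistaken for an entity and left alone.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical pattern: an "&" that is NOT followed by an entity-shaped
# token such as "amp;", "apos;", "#38;" or "#x26;".
BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)

def escape_bare_ampersands(text):
    """Escape only the stand-alone ampersands, leaving existing escapes alone."""
    return BARE_AMP.sub("&amp;", text)

# A badly formed fragment (first customer) and a well-formed one (second customer).
bad = "<name>DiCarlo & Sons Plumbing</name>"
good = "<name>DiCarlo &amp; Sons Plumbing</name>"

fixed_bad = escape_bare_ampersands(bad)
fixed_good = escape_bare_ampersands(good)  # should come back unchanged

# After the fix-up, a normal XML parser handles the unescaping itself.
parsed_name = ET.fromstring(fixed_bad).text
```

Because the pattern skips anything that already looks escaped, running it twice over the same text should leave the result unchanged, which is exactly what my original blanket find-and-replace got wrong.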
Any guidance would be appreciated.
- Mikeymac