Which XML parser to use?

DStudent · Jan 9, 2009

I am familiar with DOM parsing and somewhat with SAX parsing, and I am unsure if I wish to use either of these.

From what I understand, DOM is the only one that will allow manipulation of an XML file. My issue is that I need to be able to modify a file, but the XML file may be extremely large. It may range from 100 lines to 10000+ lines.

The thing is, I do not have many changes to make and am trying to avoid the overhead and potentially long run time I may run into with DOM. Is anyone aware of an alternate way I may parse or traverse an XML file while maintaining the ability to modify it?

Thanks in advance.
Diogo

jmeckley · Jan 10, 2009

at some point you will need to load the entire file and overwrite. if this much information exists in the xml file, I would consider importing the "records" in the xml file into a database and querying the database instead.
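
For the original question of making a few changes without building a DOM, one option is a streaming rewrite: read SAX events, alter only the ones of interest, and write everything else through unchanged. The sketch below is a minimal illustration using Python's standard `xml.sax` module; the element name `item` and the attribute `status` are hypothetical stand-ins for whatever actually needs changing.

```python
import io
from xml.sax import make_parser
from xml.sax.saxutils import XMLFilterBase, XMLGenerator


class StatusFilter(XMLFilterBase):
    """Pass every SAX event through, changing only one attribute value."""

    def startElement(self, name, attrs):
        # Hypothetical rule: flip status="old" to status="new" on <item>.
        if name == "item" and attrs.get("status") == "old":
            attrs = dict(attrs)  # copy, since the parser's attrs are read-only
            attrs["status"] = "new"
        super().startElement(name, attrs)


def rewrite(source, sink):
    """Stream XML from source to sink without holding the document in memory."""
    reader = StatusFilter(make_parser())
    reader.setContentHandler(XMLGenerator(sink, encoding="utf-8"))
    reader.parse(source)


if __name__ == "__main__":
    src = io.BytesIO(b'<root><item status="old">a</item><item status="keep">b</item></root>')
    out = io.StringIO()
    rewrite(src, out)
    print(out.getvalue())
```

With real files, the output would typically go to a temporary file that replaces the original once the rewrite succeeds, so the file is still overwritten as a whole but never loaded into memory in one piece.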

Jason Meckley
Programmer
Specialty Bakers, Inc.

DStudent · Jan 12, 2009

Well, it will be a stream of XML data, so I have been looking into bulk loading, but from what I can tell, it requires a schema file as well.

Now I run into the problem of not having a schema and being unable to create one.

Originally the file had a master schema, but due to manipulation of the file to create backwards and forwards relations, one can no longer be created. So I believe this eliminates bulk loading as an option. (Correct me if I am wrong) Thanks for the suggestion. I fear I may have to use DOM [sadeyes]

jmeckley · Jan 12, 2009

not having an xml schema has nothing to do with loading the data into a database. a schema just validates the xml document.
I wouldn't want the db and the xml to know about each other anyway, so my approach would be
1. load xml into an xml reader
2. parse the xml into domain objects (my code)
3. validate the domain objects (my code)
4. pass the domain objects to another object which can save the objects to a database (ORM tool)
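
The four steps above can be sketched roughly as follows. This is only an illustration in Python, not the poster's code: the `<record>` layout, the `Record` class, and the validation rule are all assumptions, and the standard `sqlite3` module stands in for the ORM tool.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical domain object; it knows nothing about XML or the database."""
    record_id: int
    name: str


def parse_records(source):
    """Steps 1-2: stream the XML and yield one domain object per <record>."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            yield Record(int(elem.get("id")), elem.findtext("name", default=""))
            elem.clear()  # drop the element's children once consumed


def is_valid(record):
    """Step 3: validation lives in our own code, not in an XML schema."""
    return record.record_id > 0 and bool(record.name.strip())


def save_records(conn, records):
    """Step 4: persist the objects (sqlite3 used here in place of an ORM)."""
    conn.execute("CREATE TABLE IF NOT EXISTS records (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO records (id, name) VALUES (?, ?)",
        ((r.record_id, r.name) for r in records),
    )
    conn.commit()


def load(source, conn):
    """Wire the steps together; each one can be swapped out independently."""
    save_records(conn, (r for r in parse_records(source) if is_valid(r)))


if __name__ == "__main__":
    xml = b'<records><record id="1"><name>Ann</name></record></records>'
    conn = sqlite3.connect(":memory:")
    load(io.BytesIO(xml), conn)
    print(conn.execute("SELECT id, name FROM records").fetchall())
```

Because records are yielded one at a time, the whole file never has to sit in memory, and no schema file is involved at any point.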

what I have here is a clear separation of concerns. the xml doesn't care how it's used. the domain objects do not know how they are populated or what their final location will be. the database doesn't know where the data came from, only that it needs to be saved.

bulk loading is a performance tuning problem I would worry about after I have the basic mechanics in place. with an ORM like NHibernate it's as simple as setting the batch size for the current session.

Jason Meckley
Programmer
Specialty Bakers, Inc.
