Maximum size of XML / parsing of irregular XML file

Status
Not open for further replies.

raja1

Programmer
Aug 10, 2001
3
IN
I am building a data transfer application that moves data from one database to another, using XML as the transfer format. Is there a maximum size for an XML file? If there is no hard limit, is there an optimal size beyond which parsing becomes very slow? I have to transfer about 150 MB of data (records for about 50,000 people, where each person accounts for 750 entries in the XML file).

The client's XML format is highly irregular, in the sense that it is not designed properly, which makes things worse. Parsing this format with a SAX parser is complex and tedious. DOM could help in such cases, even though it may not be a completely easy solution either. But DOM is slow, and for a file of this size I expect the application to take a very long time to parse the file before it can transfer the data. Can anybody suggest a solution to this problem?
 
Use SAX for the parser if the overall layout of each file is about the same (in context). Fear is the mind killer!!!!!!
 
Well, you said "irregular"

Can we assume that the XML is at least well-formed? Error recovery is a pain, and most parsers just quit if it's not, which is what the XML spec requires of them on a well-formedness error.
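
A cheap way to answer that question before writing any real handler is a streaming pre-check. The sketch below uses Python's standard `xml.sax` module with a do-nothing handler and reports where the parser gave up. The function name `first_wellformedness_error` is made up for illustration, and Python is used only for brevity; it is not what the original poster is using.

```python
import io
import xml.sax


def first_wellformedness_error(source):
    """Stream through `source` (a filename or binary file object).

    Returns None if the parser reaches the end without complaint,
    otherwise a (line, column, message) tuple for the first fatal error.
    """
    try:
        # The base ContentHandler ignores every event, so nothing is
        # kept in memory; we only care whether parsing completes.
        xml.sax.parse(source, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None


if __name__ == "__main__":
    good = io.BytesIO(b"<a><b/></a>")
    bad = io.BytesIO(b"<a><b></a>")   # <b> is never closed
    print(first_wellformedness_error(good))
    print(first_wellformedness_error(bad))
```

Because the check streams, it should cope with a file of the size mentioned above, although it only proves the file parses, not that the records make sense.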

To me, "irregular" indicates that the individual records 1) contain many different fields, but few, if any, records contain all of them, or 2) (even worse) contain different fields with the same purpose. The best case is that the records have mostly the same fields and they're just out of order.

An example of 1) could be phone numbers... home, work, cell, pager, voicemail, etc. An example of 2) would be some records have "Address" and others have "Address1" and "Address2" for two line addresses.
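
Both kinds of irregularity can be smoothed over after a record has been read, whatever parser produced it. The following is a minimal sketch in Python, assuming each record arrives as a dict of field name to text. The `FIELDS` list and `ALIASES` table are hypothetical and would have to be built from the client's real format.

```python
# Hypothetical target columns; the real list depends on the destination table.
FIELDS = ["FirstName", "LastName", "Address", "City", "State",
          "HomePhone", "WorkPhone"]

# Hypothetical alias table for case 2: variant names mapped to one canonical name.
ALIASES = {"Address1": "Address", "Address2": "Address"}


def normalize(record):
    """Return a dict with exactly the FIELDS columns.

    Aliased fields are merged into their canonical name (joined with
    ", "), absent fields become None, and fields that are neither in
    FIELDS nor in ALIASES are dropped.
    """
    merged = {}
    # Sorting the names makes Address1 come before Address2 regardless
    # of the order in which they appeared in the file.
    for name in sorted(record):
        canonical = ALIASES.get(name, name)
        if canonical in merged:
            merged[canonical] += ", " + record[name]
        else:
            merged[canonical] = record[name]
    # Case 1: every output row gets the same columns.
    return {field: merged.get(field) for field in FIELDS}


if __name__ == "__main__":
    print(normalize({"LastName": "Doe",
                     "Address2": "Apt 4",
                     "Address1": "765 Deadend Rd."}))
```

Silently dropping unknown fields is a design choice made here for brevity; for a real transfer it would probably be safer to log them.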

Depending on the depth of your nesting, SAX could still work, in which case file size is largely irrelevant because SAX never holds the whole document in memory. If your XML is similar to the following (i.e. close to the best case), a SAX handler could just fill in a record while it parses the items inside the Person element, which you could then format to your liking and output at the closing </Person> tag.

<People>
<Person>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<Address>293 Sleepy Lane</Address>
<City>Anytown</City>
<State>PA</State>
</Person>
<Person>
<Address>765 Deadend Rd.</Address>
<LastName>Doe</LastName>
<FirstName>Jane</FirstName>
<State>HI</State>
<City>Honolulu</City>
</Person>
</People>
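
The record-filling idea described above can be sketched with Python's standard `xml.sax` module (Java's SAX API has the same `startElement`, `characters`, and `endElement` callbacks). This is only an illustration under stated assumptions: the children of each Person are flat, with no nesting, and `PersonHandler`, `parse_people`, and the `on_person` callback are made-up names. In a real transfer, the callback would be where the row is written to the target database.

```python
import io
import xml.sax

SAMPLE = b"""<People>
<Person>
<FirstName>John</FirstName>
<LastName>Smith</LastName>
<Address>293 Sleepy Lane</Address>
<City>Anytown</City>
<State>PA</State>
</Person>
<Person>
<Address>765 Deadend Rd.</Address>
<LastName>Doe</LastName>
<FirstName>Jane</FirstName>
<State>HI</State>
<City>Honolulu</City>
</Person>
</People>"""


class PersonHandler(xml.sax.ContentHandler):
    """Fills one dict per <Person> and hands it off at </Person>."""

    def __init__(self, on_person):
        super().__init__()
        self.on_person = on_person  # hypothetical callback, called once per record
        self.record = None          # dict for the Person being read, else None
        self.field = None           # name of the child element being read
        self.text = []              # text chunks collected for that child

    def startElement(self, name, attrs):
        if name == "Person":
            self.record = {}
        elif self.record is not None:
            self.field = name
            self.text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self.field is not None:
            self.text.append(content)

    def endElement(self, name):
        if name == "Person":
            # Record complete: emit it and forget it, so memory use
            # stays at one record no matter how large the file is.
            self.on_person(self.record)
            self.record = None
        elif name == self.field:
            self.record[name] = "".join(self.text).strip()
            self.field = None


def parse_people(source, on_person):
    """Stream `source` (filename or binary file object) through the handler."""
    xml.sax.parse(source, PersonHandler(on_person))


if __name__ == "__main__":
    parse_people(io.BytesIO(SAMPLE), print)
```

Field order inside each Person does not matter here, since fields are stored by name, which is why the out-of-order second record is handled the same way as the first.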

Rose/Miros
 