WhiteSpace - XSD Vs. XML

crtillerson · Nov 6, 2003

I'm trying to understand this issue of white space in XML. Could someone out there validate my understanding?

Background:

I'm trying to write a routine to convert some data stored in fixed-length flat ASCII text files into XML files for a client. A schema was provided that the XML file must validate against. In the original flat text file, there is no concept of NULL, only blank spaces where no data or value is provided. In the XSD, there is no attribute specified to preserve spaces. There are several string fields defined in the schema where minOccurs="0" and minLength is set to 1.

As I understand it, in the case above where all I have is blank spaces out of the flat text data file, I cannot represent that as blank spaces in the XML file. Because there is no space attribute defined in the schema, I cannot put the xml:space="preserve" attribute on the element's start tag in the XML. The way I should handle this situation is to trim the string from the flat text file and, if the resulting length is 0, leave that element out of the XML file.

Do I understand correctly?

Thanks a bunch, if you can either confirm or increase my understanding.

Clint

chiph · Nov 6, 2003

Actually, XSD has a facet, <xs:whiteSpace value="preserve"/>, which you can apply inside the restriction of a simple type that holds a string.
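
For reference, xs:whiteSpace takes one of three values: preserve, replace, or collapse. Among the built-in types, xs:string uses preserve, xs:normalizedString uses replace, and xs:token uses collapse. The sketch below only imitates those three normalizations in Python so the difference is visible; the function name apply_whitespace_facet is made up for illustration, and this is not a schema validator.

```python
import re

def apply_whitespace_facet(value, mode):
    """Imitate the normalization described by the XSD whiteSpace facet."""
    if mode == "preserve":
        # Nothing is changed.
        return value
    # "replace": every tab, line feed, and carriage return becomes a space.
    replaced = re.sub(r"[\t\n\r]", " ", value)
    if mode == "replace":
        return replaced
    if mode == "collapse":
        # "collapse": after the replace step, runs of spaces shrink to one
        # space and leading/trailing spaces are removed.
        return re.sub(r" +", " ", replaced).strip(" ")
    raise ValueError(f"unknown whiteSpace mode: {mode!r}")

five_blanks = " " * 5
print(repr(apply_whitespace_facet(five_blanks, "preserve")))
print(repr(apply_whitespace_facet(five_blanks, "collapse")))
```

A field of five blanks keeps its length under preserve and becomes the empty string under collapse, which is why the type used in the schema matters when a length restriction is involved.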

Chip H.

If you want to get the best response to a question, please check out FAQ222-2244 first

crtillerson · Nov 6, 2003

Yes, but I don't have control over the schema. The schema is already set. I'm trying to understand how I must represent the data coming out of a fixed-length ASCII text file to comply with the restrictions defined in the schema.

Clint

ChrisHunt · Nov 7, 2003

"I'm trying to understand how I must represent the data coming out of a fixed-length ASCII text file"

What you're doing is representing the data, where it's come from is irrelevant. If a space-filled field in the ASCII file means "no value", then you have to represent "no value" in the XML file - forget about the spaces.

Since the elements in question are defined in the schema with minOccurs="0" and a minLength of 1, they must either be given a value or be left out altogether - so leave them out for the space-filled (null) values.

For example:
[tt]
FORENAME  MIDDLE    SURNAME
---------+---------+----------
GEORGE    DUBYA     BUSH
HARRY               TRUMAN
[/tt]
might map to
[tt]
<name>
  <forename>GEORGE</forename>
  <middle>DUBYA</middle>
  <surname>BUSH</surname>
</name>
<name>
  <forename>HARRY</forename>
  <surname>TRUMAN</surname>
</name>
[/tt]
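
A minimal Python sketch of that mapping, assuming a hypothetical layout of three 10-character columns (the FIELDS table and the record_to_element name are invented for illustration, not taken from the actual file or schema). Each field is trimmed, and a field that is empty after trimming produces no element at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical layout: (element name, start, end) as 0-based slice bounds.
FIELDS = [("forename", 0, 10), ("middle", 10, 20), ("surname", 20, 30)]

def record_to_element(line):
    """Build a <name> element, skipping fields that are blank after trimming."""
    name = ET.Element("name")
    for tag, start, end in FIELDS:
        value = line[start:end].strip()
        if value:  # a space-filled field means "no value": omit the element
            ET.SubElement(name, tag).text = value
    return name

def fixed_width(*values):
    """Pad each value to 10 characters to imitate a fixed-length record."""
    return "".join(v.ljust(10) for v in values)

rows = [
    fixed_width("GEORGE", "DUBYA", "BUSH"),
    fixed_width("HARRY", "", "TRUMAN"),
]
xml_records = [ET.tostring(record_to_element(r), encoding="unicode") for r in rows]
print("\n".join(xml_records))
```

The second record comes out with only forename and surname, matching the hand-written example above.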

-- Chris Hunt

http://www.extracon.com

http://www.mcgonagall-online.org.uk

http://www.napitalia.org.uk

http://www.leicesteryha.org.uk

crtillerson · Nov 7, 2003

Thanks Chris. That's how I understand it. Unfortunately, that makes the conversion more difficult. The "easy" approach to this task was to map all the fields in the original text file with their start and end positions to the XML elements and begin and end tag representation - store all that info in a database and just parse each record, field by field, writing out the info into a new text file (XML) that follows the schema. So, this would just be a simple "read string, parse the string, output a new string" application. If positions 5-9 in the original file were five blank spaces, then I would just output five blank spaces as the value in the XML file as it is created.

This complicates matters because the schema has been structured with many levels of nested groupings. So, I might very well run into a case where there is no data for several elements within one of these nested blocks and the whole block would have to be excluded. This will require a lot of extra programming and processing.
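
One way to keep the table-driven mapping and still drop empty blocks is to build the full tree first and prune it afterwards. The following is only a sketch under assumptions: the element names are hypothetical, prune_empty is an invented helper, and it treats every element as optional, so anything the schema marks as required would need separate handling.

```python
import xml.etree.ElementTree as ET

def prune_empty(elem):
    """Remove empty descendants; return True if elem itself still has content."""
    for child in list(elem):  # iterate over a copy because elem is mutated
        if not prune_empty(child):
            elem.remove(child)
    has_text = bool(elem.text and elem.text.strip())
    return has_text or len(elem) > 0

# Hypothetical nested record in which the whole <address> block is blank.
root = ET.fromstring(
    "<person>"
    "<name><forename>HARRY</forename><middle>   </middle></name>"
    "<address><street> </street><city></city></address>"
    "</person>"
)
prune_empty(root)
result = ET.tostring(root, encoding="unicode")
print(result)
```

Because the pruning works bottom-up, a group disappears only when everything inside it turned out to be blank, however deeply it is nested.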

Are there tools out there that can help with this task rather than such a "brute force" approach that will be hard to maintain as the schema is revised over time?

Thanks again.

Clint

chiph · Nov 7, 2003

"The "easy" approach to this task was to map all the fields in the original text file with their start and end positions to the XML elements and begin and end tag representation - store all that info in a database and just parse each record, field by field, writing out the info into a new text file (XML) that follows the schema. So, this would just be a simple "read string, parse the string, output a new string" application. If positions 5-9 in the original file were five blank spaces, then I would just output five blank spaces as the value in the XML file as it is created."

This sounds like you're reading the XML file as if it were a CSV or other flat file. You'll have massive headaches doing this, because XML files don't always have carriage returns after elements. Both of these are valid XML:

Code:

<ElementA>ABCD</ElementA>
<ElementA>EFGH</ElementA>

And:

Code:

<ElementA>ABCD</ElementA><ElementA>EFGH</ElementA>
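
A quick check of this with Python's standard library parser. The <root> wrapper is an addition for the example, since a complete XML document needs a single root element; the helper name element_values is also made up.

```python
import xml.etree.ElementTree as ET

# Same two elements, with and without a line break between them.
with_newline = "<root><ElementA>ABCD</ElementA>\n<ElementA>EFGH</ElementA></root>"
one_line = "<root><ElementA>ABCD</ElementA><ElementA>EFGH</ElementA></root>"

def element_values(xml_text):
    """Return (tag, text) for each child of the root element."""
    return [(child.tag, child.text) for child in ET.fromstring(xml_text)]

print(element_values(with_newline))
print(element_values(one_line))
```

Both calls return the same list of elements and values, which is why line-oriented reading is unreliable for XML.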

What you'll want to use to read the file is an XML parser library. There are two main types: DOM and SAX.

DOM loads the entire document into memory and builds an in-memory object representation of your document. Think of it like the folder-view part of Windows Explorer.
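
As an illustration only, here is the DOM style using xml.dom.minidom from the Python standard library; the sample document is invented for the example, and other languages have their own DOM libraries with similar calls.

```python
from xml.dom import minidom

# The whole document is parsed into an in-memory tree before anything is read.
doc = minidom.parseString(
    "<names>"
    "<name><forename>HARRY</forename><surname>TRUMAN</surname></name>"
    "</names>"
)

root_tag = doc.documentElement.tagName
# Navigate the tree by element name and read the text node inside each match.
surnames = [node.firstChild.data for node in doc.getElementsByTagName("surname")]
print(root_tag, surnames)
```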

SAX reads the file top-to-bottom, and raises events as it sees elements. A common technique for using it is to build a string as it raises events until it hits the tag that marks the end of your record. You then take what you've read and load it into a DOM and use the object methods to retrieve the data therein.
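
A small sketch of the event-driven style with Python's xml.sax module. Note that it is a simplification of the technique described above: rather than buffering a record and loading it into a DOM, the handler collects the values it wants directly. The SurnameHandler class and the sample data are invented for the example.

```python
import xml.sax

class SurnameHandler(xml.sax.ContentHandler):
    """Collect the text of every <surname> element as the parser streams by."""

    def __init__(self):
        super().__init__()
        self.surnames = []
        self._buffer = None  # list of text chunks while inside <surname>

    def startElement(self, name, attrs):
        if name == "surname":
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate rather than assign.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "surname":
            self.surnames.append("".join(self._buffer))
            self._buffer = None

handler = SurnameHandler()
xml.sax.parseString(
    b"<names>"
    b"<name><forename>GEORGE</forename><surname>BUSH</surname></name>"
    b"<name><forename>HARRY</forename><surname>TRUMAN</surname></name>"
    b"</names>",
    handler,
)
print(handler.surnames)
```

Only the current element's text is held in memory at any time, which is what lets this style cope with very large files.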

DOMs are good for smaller files, while SAX can handle XML files of any length (I've read 435 MB XML files using it).

You will need to look around for a DOM or SAX parser for your language -- they exist for all mainstream languages, from VBScript to IBM Mainframe COBOL.

Chip H.

