Delimiter problem

nkm · Feb 11, 2003

Hi

This is a simple programming issue. I need to delimit character patterns (any kind of ASCII character is possible) in my output file.

How can I use a character as the delimiter when that character could itself be part of the text?

thanks

SamBones · Feb 11, 2003

Well, you've got several choices and it depends on what you're doing.

First of all, if ANY byte value is possible, you can't use a plain delimiter, because there is no byte value left over to reserve for it. The simplest alternative is fixed-length records, which guarantee that a data byte is never mistaken for the end of a field.
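A minimal sketch of the fixed-length idea in Python, assuming every record is exactly RECORD_SIZE bytes long. RECORD_SIZE and both function names are made up for illustration, and a real format would also need a padding or length rule for shorter data.

```python
RECORD_SIZE = 8  # hypothetical width; every record must be exactly this long


def pack_fixed(records):
    """Concatenate records without any delimiter between them."""
    for rec in records:
        if len(rec) != RECORD_SIZE:
            raise ValueError(f"record must be {RECORD_SIZE} bytes, got {len(rec)}")
    return b"".join(records)


def unpack_fixed(blob):
    """Cut the byte string back into RECORD_SIZE-byte records."""
    if len(blob) % RECORD_SIZE != 0:
        raise ValueError("data length is not a multiple of the record size")
    return [blob[i:i + RECORD_SIZE] for i in range(0, len(blob), RECORD_SIZE)]
```

Because record boundaries come from position alone, any byte value, including NUL, a colon or a newline, can appear inside a record.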

Does your file have to be editable by humans? If so, you need to find a delimiter character that can be typed at a keyboard. Figure out which character isn't needed in the text, and prohibit it. If you choose something like a colon (see /etc/passwd for an example), you'll have to filter it out of the data and prohibit it during entry. Again, it depends on how you're getting the data.
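As a rough illustration of the "prohibit it during entry" approach, here is a small Python sketch that uses a colon as the delimiter and rejects any field containing it. The function names are hypothetical, and rejecting newlines as well is an added assumption (one record per line).

```python
DELIM = ":"  # delimiter reserved for the file format, as in /etc/passwd


def join_fields(fields):
    """Build one line from the fields, refusing data that contains the delimiter."""
    for field in fields:
        if DELIM in field or "\n" in field:
            raise ValueError(f"field {field!r} contains a prohibited character")
    return DELIM.join(fields)


def split_line(line):
    """Split one line back into its fields."""
    return line.rstrip("\n").split(DELIM)
```

For example, join_fields(["nkm", "x", "1001"]) gives "nkm:x:1001", while a field such as "a:b" is refused instead of silently corrupting the record.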

If it doesn't need to be editable by humans, then you can use a character that can't be easily typed at a keyboard. See the site

http://www.asciitable.com

for your menu of characters. There are already four delimiter characters defined. Look for FS, GS, RS, and US. These are File Separator, Group Separator, Record Separator, and Unit Separator. I'm not completely sure of what their intended purposes are, but you could probably use one of these. You could also use a character in the extended character set, since those probably won't be in your data either.
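A quick sketch of how two of these could be used from Python, with US (0x1F) between fields and RS (0x1E) between records. It assumes the data never contains either control character; the function names are invented for this example.

```python
US = "\x1f"  # ASCII Unit Separator, used here between fields
RS = "\x1e"  # ASCII Record Separator, used here between records


def encode_rows(rows):
    """Join fields with US and records with RS."""
    for fields in rows:
        for field in fields:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(fields) for fields in rows)


def decode_rows(text):
    """Reverse encode_rows: split on RS, then split each record on US."""
    return [record.split(US) for record in text.split(RS)]
```

Colons, commas, spaces and newlines can then appear freely in the fields, since none of them is acting as a separator.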

Hope this helps.

newmangj · Feb 11, 2003

Assuming that you can only use ASCII printable characters, you could use the principle of delimiter stuffing.

Choose your delimiter character (say #) and delimit each record with this character. If this character appears in the normal content of your record, then insert a second delimiter character.

When reading the record back, if you see a single instance of the delimiter you know that it is acting as a record delimiter. If you see two consecutive delimiters then you drop one of them and retain the other as being part of the original record.

This is the same principle as the bit-stuffing used in SDLC and HDLC comms protocols.
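A sketch of the stuffing idea in Python. Note that this is a variant, not exactly the scheme described above: plain doubling can be ambiguous when a record starts or ends with the delimiter (the records "a" and "#b" produce the same text, "a###b", as "a#" and "b"), so the sketch stuffs a separate escape character in front of each literal delimiter. The choice of backslash as the escape character and the function names are assumptions.

```python
DELIM = "#"   # record delimiter
ESC = "\\"    # escape character (an assumption, not part of the scheme above)


def stuff(record):
    """Escape the escape character first, then the delimiter."""
    return record.replace(ESC, ESC + ESC).replace(DELIM, ESC + DELIM)


def join_records(records):
    """Write the records as one string with a bare DELIM between them."""
    return DELIM.join(stuff(record) for record in records)


def split_records(text):
    """Read records back: an escaped character is data, a bare DELIM ends a record."""
    records, current = [], []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == ESC and i + 1 < len(text):
            current.append(text[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == DELIM:
            records.append("".join(current))
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    records.append("".join(current))
    return records
```

With this variant, records that begin or end with #, and empty records, survive a round trip through join_records and split_records.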

Cheers - Gavin
