Delimiter problem

Status
Not open for further replies.

nkm

Programmer
May 27, 2001
US
Hi

This is a simple programming issue. I need to delimit character patterns (any kind of ASCII character is possible) in my output file.

How can I use any character as a delimiter when that character could also be part of the text?

thanks
 
Well, you've got several choices and it depends on what you're doing.

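First of all, if ANY byte value is possible, you can't use a bare delimiter, because whatever byte you pick may also show up in the data. The simplest way around that is fixed-length records (length-prefixed fields or an escaping scheme also work), so a stray delimiter byte in the data can never cut a field short.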
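
To make the fixed-length idea concrete, here is a minimal Python sketch. The 16-byte record width and the one-byte length prefix are my own assumptions, not something the original poster specified; the length byte is there because, if any byte value can appear in the data, padding alone can't tell you where the real data ends.

```python
RECORD_SIZE = 16  # hypothetical record width in bytes (1 length byte + 15 data bytes)


def pack_record(data: bytes, size: int = RECORD_SIZE) -> bytes:
    """Return one fixed-width record: a length byte, the data, then zero padding."""
    if size - 1 > 255 or len(data) > size - 1:
        raise ValueError("data does not fit in one record")
    return bytes([len(data)]) + data.ljust(size - 1, b"\x00")


def unpack_records(blob: bytes, size: int = RECORD_SIZE) -> list:
    """Split a blob into fixed-width records and strip the padding from each."""
    if len(blob) % size != 0:
        raise ValueError("blob length is not a multiple of the record size")
    records = []
    for start in range(0, len(blob), size):
        record = blob[start:start + size]
        length = record[0]  # first byte says how many data bytes follow
        records.append(record[1:1 + length])
    return records
```

Because the reader only ever looks at byte positions, the data can contain any value at all, including zero bytes and whatever you might otherwise have picked as a delimiter.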

Does your file have to be editable by humans? If so, you need to find a delimiter character that can be typed at a keyboard. Figure out which character isn't needed in the text, and prohibit it. If you choose something like a colon (see /etc/passwd for an example), you'll have to filter it out of the data or reject it during entry. Again, it depends on how you're getting the data.
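
As a rough sketch of the "pick a character and prohibit it" approach, assuming a colon delimiter and one record per line (the function names are made up for illustration):

```python
DELIM = ":"  # the one character that is not allowed inside a field


def join_fields(fields):
    """Build one line from the fields, refusing any field that contains the delimiter."""
    for field in fields:
        if DELIM in field or "\n" in field:
            raise ValueError("field contains a prohibited character: %r" % field)
    return DELIM.join(fields)


def split_fields(line):
    """Split one line back into its fields."""
    return line.rstrip("\n").split(DELIM)
```

Rejecting bad input at write time is one choice; you could just as well strip or replace the colon, depending on where the data comes from.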

If it doesn't need to be editable by humans, then you can use a character that can't be easily typed at a keyboard. See an ASCII table for your menu of characters. There are already four delimiter characters defined: FS, GS, RS, and US (codes 28 through 31). These are File Separator, Group Separator, Record Separator, and Unit Separator. I'm not completely sure of what their intended purposes are, but you could probably use one of these. You could also use a character from an extended (8-bit) character set, as long as you're sure it won't show up in your data either.
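
Here is a small Python sketch of how the control-character separators might be used, assuming US (0x1F) between fields and RS (0x1E) between records. That split of roles is my assumption, and it only works if those two characters never occur in the data, so the writer checks for them.

```python
US = "\x1f"  # ASCII Unit Separator, used here between fields
RS = "\x1e"  # ASCII Record Separator, used here between records


def dump(records):
    """Serialize a list of records, each record being a list of string fields."""
    for record in records:
        for field in record:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(record) for record in records)


def load(text):
    """Parse text written by dump(); an empty string is treated as no records."""
    if not text:
        return []
    return [record.split(US) for record in text.split(RS)]
```

Ordinary punctuation such as colons, commas, and hash marks passes through untouched, which is the whole point of using characters nobody types.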

Hope this helps.
 
Assuming that you can only use ASCII printable characters you could use the principle of delimiter stuffing.

Choose your delimiter character (say #) and delimit each record with this character. If this character appears in the normal content of your record, then insert a second delimiter character right after it.

When reading the record back, if you see a single instance of the delimiter you know that it is acting as a record delimiter. If you see two consecutive delimiters then you drop one of them and retain the other as being part of the original record.
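
The stuffing scheme above can be sketched in Python roughly as follows. One caveat that is my own addition, not part of the description: with plain doubling, an empty record or a record that begins with the delimiter produces output that can be read back in more than one way (for example, "a#" followed by "b" and "a" followed by "#b" would both come out as a###b#), so this sketch simply rejects those records.

```python
DELIM = "#"


def encode(records, delim=DELIM):
    """Double every delimiter inside a record, then terminate the record with one."""
    parts = []
    for record in records:
        if record == "" or record.startswith(delim):
            raise ValueError("empty or delimiter-leading records are ambiguous")
        parts.append(record.replace(delim, delim * 2) + delim)
    return "".join(parts)


def decode(text, delim=DELIM):
    """Reverse encode(): a doubled delimiter is data, a single one ends a record."""
    records, current, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch != delim:
            current.append(ch)
            i += 1
        elif text[i + 1:i + 2] == delim:
            current.append(delim)  # stuffed pair: keep one, drop the other
            i += 2
        else:
            records.append("".join(current))  # lone delimiter: end of record
            current = []
            i += 1
    if current:
        raise ValueError("input ends without a closing delimiter")
    return records
```

If you need empty records or records that start with the delimiter, a separate escape character (rather than doubling the delimiter itself) avoids the ambiguity.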

This is the same principle as the bit-stuffing used in SDLC and HDLC comms protocols.

Cheers - Gavin
 