Well, you've got several choices and it depends on what you're doing.
First of all, if ANY byte value is possible, you can't use a plain delimiter, because whatever byte you pick might also turn up in the data. You'll need to use fixed-length records. That way a stray byte in the data can never cut a field short.
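As a rough sketch of the fixed-length idea, here is one way it might look in Python with the standard struct module. The layout (a 16-byte field followed by an 8-byte field) and the function names are made up for illustration, not anything from the original question.

```python
import struct

# Hypothetical layout: a 16-byte name field followed by an 8-byte payload field.
RECORD = struct.Struct("16s8s")

def pack_record(name: bytes, payload: bytes) -> bytes:
    # Reject oversized fields; struct would otherwise truncate them silently.
    if len(name) > 16 or len(payload) > 8:
        raise ValueError("field too long for the fixed-width layout")
    # Fields shorter than their slot are padded with NUL bytes.
    return RECORD.pack(name, payload)

def unpack_records(blob: bytes):
    # Every record has the same size, so no byte value needs to be reserved.
    if len(blob) % RECORD.size != 0:
        raise ValueError("blob is not a whole number of records")
    return [RECORD.unpack_from(blob, off)
            for off in range(0, len(blob), RECORD.size)]
```

Fields come back at full width, padding included. If trailing NUL bytes can be real data, you'd have to fill every field completely or store the true length somewhere, since the padding is otherwise indistinguishable from data.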
Does your file have to be editable by humans? If so, you need a delimiter character that can be typed at a keyboard. Figure out which character isn't needed in the text, and prohibit it. If you choose something like a colon (see /etc/passwd for an example), you'll have to filter it out of the data and prohibit it during entry. Again, it depends on how you're getting the data.
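A minimal sketch of prohibiting the delimiter during entry, assuming colon-separated fields with one record per line in the style of /etc/passwd. The function names and the sample record are hypothetical.

```python
DELIM = ":"

def format_line(fields):
    # Refuse any field containing the delimiter (or a newline, which ends a record).
    for field in fields:
        if DELIM in field or "\n" in field:
            raise ValueError(f"field {field!r} contains a reserved character")
    return DELIM.join(fields)

def parse_line(line):
    # Safe to split blindly because format_line never lets a colon into a field.
    return line.rstrip("\n").split(DELIM)
```

For example, format_line(["jdoe", "x", "1001", "Jane Doe"]) gives the line jdoe:x:1001:Jane Doe, and parse_line turns that line back into the same four fields.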
If it doesn't need to be editable by humans, then you can use a character that can't be easily typed at a keyboard. See the site
for your menu of characters. There are already four delimiter characters defined. Look for FS, GS, RS, and US. These are File Separator, Group Separator, Record Separator, and Unit Separator. I'm not completely sure of what their intended purposes are, but you could probably use one of these. You could also use a character in the extended character set (byte values above 127), since those probably won't be in your data either.
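Here is a small sketch of how two of those control characters might be used, assuming the usual convention that US (0x1F) goes between fields and RS (0x1E) ends each record. The dump and load names are invented for this example, and each record is assumed to have at least one field.

```python
US = "\x1f"  # Unit Separator: placed between fields
RS = "\x1e"  # Record Separator: placed after every record

def dump(records):
    for rec in records:
        for field in rec:
            # The separators themselves still must not appear in the data.
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return "".join(US.join(rec) + RS for rec in records)

def load(text):
    # Each record is terminated by RS, so the final split piece is always empty.
    return [chunk.split(US) for chunk in text.split(RS)[:-1]]
```

Colons, commas, and even newlines can then appear in the fields untouched, since none of them is a separator.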
Assuming that you can only use ASCII printable characters, you could use the principle of delimiter stuffing.
Choose your delimiter character (say #) and delimit each record with this character. If this character appears in the normal content of your record, then insert a second delimiter character next to it.
When reading the record back, if you see a single instance of the delimiter you know that it is acting as a record delimiter. If you see two consecutive delimiters then you drop one of them and retain the other as being part of the original record.
This is the same principle as the bit-stuffing used in SDLC and HDLC comms protocols.
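The stuffing scheme described above can be sketched roughly as follows, using # as the delimiter and terminating every record with it. One caveat that the description glosses over: a run of delimiters is ambiguous if a record may be empty or may begin with the delimiter (the records "a#", "b" and the records "a", "#b" would both encode to a###b#). This sketch simply rejects such records, which is an added assumption, and the function names are made up.

```python
DELIM = "#"

def encode(records):
    out = []
    for rec in records:
        # Assumption: non-empty records that do not start with the delimiter,
        # otherwise the decoder cannot tell where a run of delimiters splits.
        if rec == "" or rec.startswith(DELIM):
            raise ValueError("record must be non-empty and not start with the delimiter")
        # Double every delimiter in the content, then terminate the record.
        out.append(rec.replace(DELIM, DELIM * 2) + DELIM)
    return "".join(out)

def decode(stream):
    records, current, i = [], [], 0
    while i < len(stream):
        ch = stream[i]
        if ch != DELIM:
            current.append(ch)
            i += 1
        elif i + 1 < len(stream) and stream[i + 1] == DELIM:
            # Two consecutive delimiters: keep one as data, drop the other.
            current.append(DELIM)
            i += 2
        else:
            # A single delimiter ends the current record.
            records.append("".join(current))
            current = []
            i += 1
    if current:
        raise ValueError("stream ends without a terminating delimiter")
    return records
```

With those restrictions, decode(encode(records)) returns the original list, including records that contain or end with #.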