This comes up from time to time. I'd like to ask the group to help me explore an elegant solution to "the problem".
In my line of work, I have to process large text files. The "StreamReader" object does very well, except in one respect: you really can't tell where you're "at" in the file.
This is an issue if you need to do random file i/o. For example, I might be processing a large PostScript file from a customer, made up of many individual statements (invoices, for example). I may need to extract a single statement, composed of a variable number of pages.
I know how to identify the "starting" record and the "ending" record of a statement. Now, I need to re-position to the starting record, and extract all intervening records down to the last record to a second file.
This should be trivial. When I encounter the "starting" record, note the byte-position. When I encounter the "ending" record, note the current byte-position. I could then, potentially, Seek() back to the starting position, and Read() the calculated number of bytes.
However: you cannot note the "current byte position" when using StreamReader. You can note the .BaseStream.Position, but that won't help you, since i/o is buffered. You'll get the position of the underlying stream, all right, but not the "StreamReader" position.
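The effect is easy to demonstrate. The same thing happens in Python, where io.BufferedReader plays the role of StreamReader's buffer: after logically consuming one 9-byte line, the underlying stream has already been read to the end.

```python
import io, os, tempfile

# A buffered reader pulls a whole buffer from the underlying stream, so the
# raw stream's position runs ahead of what has actually been consumed --
# the same effect as StreamReader vs. .BaseStream.Position in .NET.
fd, path = tempfile.mkstemp()
os.write(fd, b"line one\nline two\nline three\n")   # 29 bytes total
os.close(fd)

raw = open(path, "rb", buffering=0)                 # unbuffered underlying stream
reader = io.BufferedReader(raw, buffer_size=8192)
first = reader.readline()                           # logically consumes 9 bytes
raw_pos = raw.tell()                                # but the raw stream is at 29
print(len(first), raw_pos)
reader.close()
os.remove(path)
```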
I suppose I could note the .Length of each record, and tally them up, to keep track of my position. Then you have the problem of line-termination characters. Add 1 byte, or 2, to each .Length?
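One way around the terminator question: if you read in binary and tally raw bytes instead of character .Lengths, each record carries its own line ending, so there's nothing to guess. A Python sketch of that bookkeeping:

```python
import os, tempfile

# Tally byte offsets while reading in binary mode: each line keeps its own
# terminator (b"\n" or b"\r\n"), so there is no "add 1 byte, or 2?" guess.
fd, path = tempfile.mkstemp()
os.write(fd, b"alpha\r\nbeta\ngamma\r\n")   # deliberately mixed line endings
os.close(fd)

offsets = []                                 # starting byte of each record
pos = 0
with open(path, "rb") as f:
    for line in f:
        offsets.append(pos)
        pos += len(line)                     # terminator length included
print(offsets)                               # [0, 7, 12]
os.remove(path)
```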
In previous projects, I simply gave up and used FileStream, reading in 8k chunks. I had to create my own methods of breaking the chunks into "records" and handling the situations where the chunk read ended in a "partial" record. Not very elegant!
Thomas D. Greer