
StreamReader & File/Record Position

Status
Not open for further replies.

tgreer

Programmer
Oct 4, 2002
1,781
US
This comes up from time to time. I'd like to ask the group to help me explore an elegant solution to "the problem".

In my line of work, I have to process large text files. The "StreamReader" object does very well, except in one respect: you really can't tell where you're "at" in the file.

This is an issue if you need to do random file I/O. For example, I might be processing a large PostScript file from a customer, made up of many individual statements (invoices, for example). I may need to extract a single statement, composed of a variable number of pages.

I know how to identify the "starting" record and the "ending" record of a statement. Now, I need to re-position to the starting record, and extract all intervening records down to the last record to a second file.

This should be trivial. When I encounter the "starting" record, note the byte position. When I encounter the "ending" record, note the current byte position. I could then, potentially, Seek() back to the starting position and Read() the calculated number of bytes.

However: you cannot note the "current byte position" when using StreamReader. You can note the .BaseStream.Position, but that won't help you, since I/O is buffered. You'll get the position of the underlying stream, all right, but not the "StreamReader" position.
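The mismatch is easy to demonstrate. A minimal sketch (the file name is hypothetical): after a single ReadLine(), BaseStream.Position reports how far StreamReader has filled its internal buffer, not the byte offset of the end of the first line.

```csharp
using System;
using System.IO;

class BufferDemo
{
    static void Main()
    {
        using (StreamReader sr = new StreamReader("job.ps"))  // hypothetical file
        {
            sr.ReadLine();  // consume only the first line
            // For a small file this prints the whole file's length, because
            // StreamReader filled its buffer in one gulp -- not the byte
            // offset of the end of line one.
            Console.WriteLine(sr.BaseStream.Position);
        }
    }
}
```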

I suppose I could note the .Length of each record and tally them up to keep track of my position. But then you have the problem of line-termination characters: do you add 1 byte, or 2, to each .Length?
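That tally can be wrapped up in a helper. This is only a sketch, under two loud assumptions: the encoding is known up front (and has no BOM), and every line ends with the same terminator, whose byte length you pass in. FindRange and the marker predicates are names I've made up for illustration, not framework API.

```csharp
using System;
using System.IO;
using System.Text;

static class RecordLocator
{
    // Scans line by line, adding Encoding.GetByteCount(line) plus a fixed
    // newline length to recover byte offsets. Assumes a BOM-free encoding
    // and a consistent line terminator; isStart/isEnd stand in for whatever
    // identifies your starting and ending records.
    public static (long start, long end) FindRange(
        TextReader reader, Encoding enc, int newlineBytes,
        Func<string, bool> isStart, Func<string, bool> isEnd)
    {
        long offset = 0, start = -1, end = -1;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (start < 0 && isStart(line))
                start = offset;                    // offset of the starting record
            offset += enc.GetByteCount(line) + newlineBytes;
            if (start >= 0 && isEnd(line))
            {
                end = offset;                      // offset just past the ending record
                break;
            }
        }
        return (start, end);
    }
}
```

With the returned pair you can Seek() a FileStream back to start and Read() end - start bytes.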

In previous projects, I simply gave up and used FileStream, reading in 8K chunks. I had to create my own methods of breaking the chunks into "records" and handling the situations where a chunk ended in a "partial" record. Not very elegant!

Thomas D. Greer
 
You can match the start point and end point, and use regexp to grab everything in between.

You can then process everything you need line by line.

My C# skills are limited, being a noob, but that's how I'd do it with bash/ksh/php.

something like

using System.IO;
using System.Text.RegularExpressions;

StreamReader sr = new StreamReader(fileName);
string content = sr.ReadToEnd();
sr.Close();
// Singleline lets "." match across line breaks
Match s = Regex.Match(content, @"(start_sequence(.*?)end_sequence)", RegexOptions.Singleline);
return s.Result("$2");

s.Result("$2") now contains only the stuff from between your start record and end record.



______________________________________________________________________
There's no present like the time, they say. - Henry's Cat.
 
Thanks for the reply, but you're missing the point. These are extremely large files, 4-6GB on average. Also, there are MULTIPLE operations I need to perform on these files.

How can one, while using StreamReader, calculate a value to pass back to the FileStream's "Seek" method?
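For what it's worth, here's how the second half could be sketched once you have the two byte offsets from tallying line lengths: open a plain FileStream, Seek() to the start offset, and copy end - start bytes to the output file. CopyRange and the file names here are illustrative, not from any library.

```csharp
using System;
using System.IO;

static class ExtractRange
{
    // Copies the byte range [startPos, endPos) from source to dest.
    // startPos/endPos would come from tallying line lengths while scanning
    // with StreamReader.
    public static void CopyRange(string source, string dest, long startPos, long endPos)
    {
        byte[] buffer = new byte[8192];
        using (var input = new FileStream(source, FileMode.Open, FileAccess.Read))
        using (var output = new FileStream(dest, FileMode.Create, FileAccess.Write))
        {
            input.Seek(startPos, SeekOrigin.Begin);   // reposition to the starting record
            long remaining = endPos - startPos;
            while (remaining > 0)
            {
                int read = input.Read(buffer, 0, (int)Math.Min(buffer.Length, remaining));
                if (read == 0) break;                 // unexpected EOF
                output.Write(buffer, 0, read);
                remaining -= read;
            }
        }
    }
}
```

Because it copies raw bytes, this step is encoding-agnostic; all the encoding assumptions live in how the offsets were computed in the first place.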



Thomas D. Greer
 
Ah, you're right I missed the (multiple) point.

______________________________________________________________________
There's no present like the time, they say. - Henry's Cat.
 
