
Fixed length records

Status
Not open for further replies.

feherke

Programmer
Aug 5, 2002
9,540
RO
Hi

I use [tt]gawk[/tt] 3.1.1 for Linux. I would like to read from a file that contains fixed-length records with no separators. The file could be huge, and I probably do not need all the records.

How can I read, let's say, 256 characters from a file with [tt][g]awk[/tt]?

Thanks,
Feherke.
 
it must be a trick question if you start a new thread, feherke. ;)

I think the obvious answer is:
If you have gawk, you can use gawk's FIELDWIDTHS variable to specify your... well... the widths of your fields, and later on reference the fields regardless of any field separators.

If you don't have gawk, you can simulate the same behaviour with other awks. I have the code for it - if there's a need.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
I'm not sure how efficient this would be, but it seems to work. How huge is huge?

Code:
gawk '{ for (i = 1; i <= length($0); i += 256) print substr($0, i, 256) }' datafile

I tried it on a 500KB file with 1420*256 byte records.

Annihilannic.
 
Hi

Ok, the fields are fixed width too, but the problem is that the whole record is fixed width, with no separator. So I cannot set [tt]RS[/tt] to anything specific, because the chosen character may be missing from the file entirely, in which case the whole file would be read as a single record. And it could be huge.

Thanks,
Feherke.
 
this link might be helpful...

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi

Annihilannic, your script reads the whole file, which in the current test case is 8.7 MB; reading took 31 seconds. Ok, it is faster if it breaks after the first print, but there is no guarantee that the file contains the [tt]RS[/tt], in which case the whole file is read as one record, with no chance to break out early.

This is the point I would like to optimize.

Thanks,
Feherke.
 
I don't think gawk cares if it is huge... as long as you have enough memory in your system. :)

Regarding the -mf and -mr options, the gawk man page says "They are ignored by gawk, since gawk has no pre-defined limits."

Annihilannic.
 
Okay, I see where you're coming from. Personally I'd write a C programme instead in that case... or Perl may be more flexible in this regard, but I'm no Perl wizard.

Annihilannic.
 