
Fixed length records

Status
Not open for further replies.

feherke

Programmer
Aug 5, 2002
9,540
RO
Hi

I use [tt]gawk[/tt] 3.1.1 for Linux. I would like to read from a file that contains fixed-length records with no separators. The file could be huge, and I probably do not need all the records.

How can I read, let's say, 256 characters from a file with [tt][g]awk[/tt]?

Thanks,
Feherke.
 
it must be a trick question if you start a new thread, feherke. ;)

I think the obvious answer is:
If you have gawk, you can use gawk's FIELDWIDTHS variable to specify your... well... the widths of your fields, and later on reference the fields regardless of any field separators.

If you don't have gawk, you can simulate the same behaviour with other awks. I have the code for it - if there's a need.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
I'm not sure how efficient this would be, but it seems to work. How huge is huge?

Code:
gawk '{ for (i = 1; i <= length($0); i += 256) print substr($0, i, 256) }' datafile

I tried it on a 500KB file with 1420*256 byte records.

Annihilannic.
 
Hi

Ok, the fields are fixed width too, but the problem is that the whole record is fixed width, with no separator. So I cannot set [tt]RS[/tt] to anything specific, because the chosen character may be missing from the file entirely, in which case the whole file would be read as a single record. And it could be huge.

Thanks,
Feherke.
 
this link might be helpful...

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Hi

Annihilannic, your script reads the whole file, which in the current test case is 8.7 MB; reading took 31 seconds. Ok, it is faster if it breaks after the first print, but there is no guarantee that the file contains the [tt]RS[/tt], in which case the whole file is read as one record, with no chance to break out early.

This is the point I would like to optimize.

Thanks,
Feherke.
 
I don't think gawk cares if it is huge... as long as you have enough memory in your system. :)

Regarding the -mf and -mr options, the gawk man page says "They are ignored by gawk, since gawk has no pre-defined limits."

Annihilannic.
 
Okay, I see where you're coming from. Personally I'd write a C programme instead in that case... or Perl may be more flexible in this regard, but I'm no Perl wizard.

Annihilannic.
 