awk can't read record, help!

ruxshin · May 10, 2001

I have an input file with millions of records (lines). But my awk script couldn't read the input file, giving me this error:

awk: record `-rw------- 1 sagch...' too long
record number 1746

After checking, I found a record with what looked like blanks in the middle. I used cat -v and found that those blanks are actually thousands of non-printable ^@ control characters.

-rw------- 1 abcde staff 32768 May 14 09:45 /f00a/home/

***thousands of non printable characters here***

09:53 /f00a/home/abcde/.lgctermlogin

What can I do to exclude this record from being read? Or is there a way to detect the error and act on it? I can't use vi to edit the file because it is too big (> 100 MB).

Please help.

ruxshin
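
One possible way to confirm this kind of corruption without opening the file in an editor or in awk is to count the offending bytes with tr and wc. The sketch below is only an illustration: the sample file and its contents are made up, and it assumes the ^@ shown by cat -v is the NUL byte (octal \000).

```shell
# Build a small stand-in for the real input (contents are made up).
sample=$(mktemp)
printf 'rec one\nrec\000\000\000two\nrec three\n' > "$sample"

# Keep only NUL bytes (delete the complement of \000), then count them.
nul_count=$(tr -cd '\000' < "$sample" | wc -c | tr -d ' ')

# Delete tab, newline, and printable ASCII; whatever remains is suspect.
bad_count=$(tr -d '\11\12\40-\176' < "$sample" | wc -c | tr -d ' ')

echo "NUL bytes: $nul_count, non-printable bytes: $bad_count"
rm -f "$sample"
```

A non-zero count suggests the file contains bytes that a text tool may not expect.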

flogrr · May 10, 2001

Hi ruxshin!

Try this:

awk '{if($0 ~ /\026/) next;print}' yourinput > output

This simply skips any record containing a Control-V
character. The \026 is the octal code for Control-V.

HTH

flogrr
flogr@yahoo.com
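
For reference, the octal codes and the cat -v notation can be checked directly. The sketch below is a quick check, not part of the original suggestion; it compares Control-V (\026) with the NUL byte (\000), which is the byte cat -v displays as ^@.

```shell
# Octal value of each byte, via od (spaces and newlines stripped).
ctrl_v_octal=$(printf '\026' | od -An -t o1 | tr -d ' \n')
nul_octal=$(printf '\000' | od -An -t o1 | tr -d ' \n')

# How cat -v renders each byte.
ctrl_v_shown=$(printf '\026' | cat -v)
nul_shown=$(printf '\000' | cat -v)

echo "Control-V: octal $ctrl_v_octal, shown as $ctrl_v_shown"
echo "NUL:       octal $nul_octal, shown as $nul_shown"
```

If the bad bytes in the file are NULs, a pattern on \026 would not match them, quite apart from the record-length problem.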

ruxshin · May 10, 2001

Hi Flogrr,

I tried what you recommended, but it still failed with the same error message: record too long.

Could it be that awk can't read that line at all, so it can't even perform the condition check? Is there a limit to the number of characters that awk can handle?

Anyone out there, please help...

Thank you.

ruxshin

flogrr · May 10, 2001

Hi ruxshin,

Yes, awk has limitations on record length.

Try changing "awk" to: "nawk" and running again.

nawk '{if($0 ~ /\026/) next;print}' yourinput > output

If that doesn't work, try finding a binary distribution
of gawk on the net. Gawk does not have arbitrary
limits on records or file sizes, etc.

Let us know how it goes with nawk.

Jesse

flogrr
flogr@yahoo.com
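
A rough way to probe whether a particular awk copes with long records is to feed it one long line and ask for its length. This is only a sketch: the 5000-character length is an arbitrary choice for illustration, and an awk with a lower record limit would be expected to report an error instead of the length.

```shell
# Build one 5000-character line (arbitrary test length),
# then ask awk how long the record it read is.
long_len=$(awk 'BEGIN { while (i++ < 5000) printf "x"; print "" }' |
           awk '{ print length($0) }')

echo "awk read a record of $long_len characters"
```

Replacing awk with nawk or gawk in the second command probes those implementations in the same way.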

ruxshin · May 10, 2001

Hi Flogrr,

I tried nawk, but the error is still the same: input record too long. Because I do my work by telnetting to a remote server, I'm not authorized to install anything on the machine, so I have to forget about gawk. I've asked the other party to prepare another input file that DOESN'T contain the corrupted data.

Thanks anyway.

ruxshin

flogrr · May 10, 2001

Hi ruxshin,

Try one more thing:

tr -d '/\026\100' infile > outfile

This will leave nothing but a blank line in its place
if it works for you.

The \100 is the @ sign in octal.

HTH

flogrr
flogr@yahoo.com
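
To see what a tr -d character set does, it can help to run it on a short sample. In the sketch below the sample string is made up for illustration; the set '/\026\100' names three characters (slash, Control-V, and @), while '\000' names only the NUL byte that cat -v shows as ^@.

```shell
# Made-up sample: a path with slashes, an @ sign, and two trailing NUL bytes.
sample='/f00a/home/user@host'

# The set '/\026\100' deletes slash, Control-V, and @ wherever they occur.
as_posted=$(printf '%s\000\000\n' "$sample" | tr -d '/\026\100' | cat -v)

# Deleting only NUL (octal \000) leaves the rest of the text intact.
nul_only=$(printf '%s\000\000\n' "$sample" | tr -d '\000' | cat -v)

echo "as posted: $as_posted"
echo "NUL only:  $nul_only"
```

On this sample the first form removes the slashes and the @ from the path but leaves the NUL bytes in place.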

ruxshin · May 15, 2001

Hi flogrr,

Thanks for your help. I tried your tr command example, but it didn't seem to work (probably a syntax error). So I looked up the command and finally got it to work:

tr -cd '\11\12\40-\176' <$inputfile >$outputfile

For other users' reference:
tr is the translate command
-c complements the following string
-d deletes the characters in the following string
-cd deletes the complement of the following string

The string '\11\12\40-\176' lists, in octal, tab (\11), newline (\12), and the printable ASCII characters from space (\40) through tilde (\176), so its complement is every other byte, including all the other control characters.

So the command above strips every byte outside that set, such as the ^@ characters, from the input file and writes the result to the output file.

Finally, my AWK script can read my records!!!

ruxshin
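
The effect of the tr -cd command above can be tried on a small file before running it on the real one. This is a minimal sketch: the sample lines are made up, and the awk program simply counts fields to show that every record is readable after cleaning.

```shell
# Made-up input: three records, the middle one padded with NUL bytes.
inputfile=$(mktemp)
outputfile=$(mktemp)
printf 'good line 1\nbad\000\000\000 line\ngood line 3\n' > "$inputfile"

# Keep only tab, newline, and printable ASCII; delete everything else.
tr -cd '\11\12\40-\176' < "$inputfile" > "$outputfile"

# awk can now read each record; print the field counts as a comma list.
fields=$(awk '{ printf "%s%s", sep, NF; sep = "," } END { print "" }' "$outputfile")
line_count=$(wc -l < "$outputfile" | tr -d ' ')

echo "lines: $line_count, fields per line: $fields"
rm -f "$inputfile" "$outputfile"
```

Note that deleting the bad bytes joins the text on either side of them, so a corrupted record may still need to be checked or skipped afterwards.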

flogrr · May 15, 2001

Hi ruxshin,

Great! The Unix tools never fail to amaze me with
their versatility - once you understand how they
work. You can put together some very complex
pipelines using nothing but Unix utilities.

See you,
Jesse

flogrr
flogr@yahoo.com
