awk can't read record, help!

Status
Not open for further replies.

ruxshin

Programmer
Apr 26, 2001
33
FI
I have an input file with millions of records (lines). But my awk script couldn't read the input file, giving me this error:

awk: record `-rw------- 1 sagch...' too long
record number 1746

After checking, I found a record with what looked like blanks in the middle. Using cat -v, I found that those blanks are actually thousands of non-printable ^@ control characters.

-rw------- 1 abcde staff 32768 May 14 09:45 /f00a/home/

***thousands of non printable characters here***



09:53 /f00a/home/abcde/.lgctermlogin

What can I do to exclude this record from being read? Or is there a way to detect the error and do something? I can't use VI to edit it because the file is too big (> 100 MB).

Please help.
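
For readers wondering how to detect this kind of damage without awk: cat -v displays the NUL byte (octal 000) as ^@, and tr can count NUL bytes in a file of any size because it does not read the input as lines. The following is only a minimal sketch on a made-up sample file; the file contents and variable names are assumptions for illustration, not the original data.

```shell
# Build a small sample file with a run of NUL bytes in the middle (hypothetical data).
sample=$(mktemp)
printf 'good line 1\nbad \000\000\000\000\000 line\ngood line 2\n' > "$sample"

# Count NUL bytes: -c complements the set and -d deletes, so only NULs survive.
nul_count=$(tr -cd '\000' < "$sample" | wc -c | tr -d ' ')
echo "NUL bytes: $nul_count"

# Count lines that contain at least one NUL, using the ^@ notation that cat -v prints.
# Note: a line containing the literal text ^@ would also be counted.
bad_lines=$(cat -v "$sample" | grep -c '\^@' | tr -d ' ')
echo "lines containing ^@: $bad_lines"

rm -f "$sample"
```

Replacing grep -c with grep -n prints the affected line numbers instead of a count, assuming the local grep can cope with very long lines.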

ruxshin
 
Hi ruxshin!

Try this:

awk '{if($0 ~ /\026/) next;print}' yourinput > output

This simply skips any record containing a Control-V
which is the up caret before the at sign. The \026
is the octal code for Control-V.

HTH


flogrr
flogr@yahoo.com

 
Hi Flogrr,

I tried what you recommended, but it still failed with the same error message: record too long.

Could it be that awk can't read that line at all, so it never gets as far as the condition check? Is there a limit to the number of characters that awk can handle in one record?

Anyone out there, please help...

Thank you.

ruxshin
 
Hi ruxshin,

Yes, awk has limitations on record length.

Try changing "awk" to: "nawk" and running again.

nawk '{if($0 ~ /\026/) next;print}' yourinput > output

If that doesn't work, try finding a binary distribution
of gawk on the net. Gawk does not have arbitrary
limits on records or file sizes, etc.

Let us know how it goes with nawk.

Jesse

flogrr
flogr@yahoo.com

 
Hi Flogrr,

I tried nawk but the error is still the same: input record too long. Because I'm doing my work by telnetting to a remote server, I'm not authorized to install anything on the machine, so I have to forget about gawk. I've asked the other party to prepare another input file that DOESN'T contain the corrupted data.

Thanks anyway.


ruxshin
 
Hi ruxshin,

Try one more thing:

tr -d '/\026\100' infile > outfile

This will leave nothing but a blank line in its place
if it will work for you.

The \100 is the @ sign in octal.
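
A note for later readers on what a tr set like this does: tr treats its argument as a set of single characters, not as a pattern, and the ^@ shown by cat -v is the NUL byte (octal 000), not Control-V followed by an at sign. The sketch below uses made-up sample strings to show both points; the variable names are assumptions for illustration.

```shell
# Every "/", every Control-V (octal 026) and every "@" (octal 100) is deleted,
# wherever it occurs, so path separators disappear too.
slash_demo=$(printf '/f00a/home/user@host\n' | tr -d '/\026\100')
echo "$slash_demo"    # prints: f00ahomeuserhost

# Deleting the NUL bytes themselves takes the octal code 000.
nul_demo=$(printf 'ab\000\000\000cd\n' | tr -d '\000')
echo "$nul_demo"      # prints: abcd
```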

HTH


flogrr
flogr@yahoo.com

 
Hi flogrr,

Thanks for your help. I tried your tr command example but it didn't seem to work (probably a syntax error). So I looked up the command and finally got it to work:

tr -cd '\11\12\40-\176' <$inputfile >$outputfile

For other users' reference:
tr is the translate command
-c complements the following string
-d deletes the characters in the following string
-cd deletes the complement of the following string

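The string '\11\12\40-\176' lists, in octal, the tab (\11), the newline (\12), and the printable ASCII characters from space (\40) through tilde (\176). Its complement is therefore every other byte: the remaining control characters (including the ^@ bytes), DEL, and any byte above \176.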

So the command above will strip off all the non-printable or control characters from the input file and print the result to the output file.
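
As a rough check of the command above, here is a small sketch that runs it on a made-up two-line file and compares byte counts before and after. The sample content is an assumption for illustration; only the tr command itself comes from this thread.

```shell
inputfile=$(mktemp)
outputfile=$(mktemp)

# Hypothetical records: text, a tab, three NUL bytes, one DEL byte (octal 177), more text.
printf 'rec\t1 \000\000\000\177 end\nrec\t2\n' > "$inputfile"

# Keep only tab, newline, and space through tilde; delete every other byte.
tr -cd '\11\12\40-\176' < "$inputfile" > "$outputfile"

before=$(wc -c < "$inputfile" | tr -d ' ')
after=$(wc -c < "$outputfile" | tr -d ' ')
echo "bytes before: $before, after: $after"

# The cleaned file still has the same number of records, because newlines are kept.
records=$(awk 'END { print NR }' "$outputfile")
echo "records: $records"

rm -f "$inputfile" "$outputfile"
```

Because tabs and newlines are in the kept set, field and record boundaries survive; only the stray bytes inside a record are removed.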

Finally, my AWK script can read my records!!! :)


ruxshin

 
Hi ruxshin,

Great! The Unix tools never fail to amaze me with
their versatility - once you understand how they
work. You can put together some very complex
pipelines using nothing but Unix utilities.

See you,
Jesse

flogrr
flogr@yahoo.com

 