Awk: counting (large?) file rows and columns


Toby2007
Technical User
Nov 1, 2007
Hi,

I have written a small script (an argument check plus an awk one-liner) to count the number of rows and columns of a file, like this:
% more dim
if [ $# -eq 0 ]
then
    echo "Usage: dim filename"
    exit
fi
awk 'BEGIN{FS="\t"} {nfe=NF} END{print NR " " nfe}' "$1"

This works fine on small files, e.g.:

% dim smallfile.ped
2300 3481

However, it does not appear to work for larger files. For example, for a file with 2300 rows and 20924 columns ("largefile.ped") I obtain the following:
% dim largefile.ped
2301 0

Could anyone suggest what is going wrong here? It appears to me that awk is counting an extra line, and therefore the built-in variable NF counts zero fields for this empty record. However, I know the file has 2300 records, not 2301 - I have checked using vi!
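
If the culprit really is a trailing blank line, I guess a variant that only looks at non-empty records would sidestep it. Something like this, perhaps (an untested sketch on my part):
Code:
[gray]# same idea as above, but only records with at least one field update the counts[/gray]
awk 'BEGIN{FS="\t"} NF{nfe=NF; rows++} END{print rows " " nfe}' "$1"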

Any advice would be appreciated,

Toby
 
Hi

I created a test file of 252 MB with 2300 rows and 20924 columns, and your [tt]awk[/tt] script displayed the correct result. I used [tt]gawk[/tt]. Which [tt]awk[/tt] implementation did you use?

Anyway, in some [tt]awk[/tt] implementations there is no guarantee that the [tt]NR[/tt] variable will keep its value in the [tt]END[/tt] block. And setting a variable for each row, as you already do for [tt]NF[/tt], is just a waste of time and resources.
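
If you do want to stay with [tt]awk[/tt], one sketch that takes the field count from the first record only and keeps its own row counter, so it does not rely on [tt]NR[/tt] in the [tt]END[/tt] block, would be:
Code:
[gray]# column count from record 1, rows counted explicitly[/gray]
awk 'BEGIN{FS="\t"} NR==1{nf=NF} {rows++} END{print rows " " nf}' "$1"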

I think this task can be solved better and faster with a dedicated counting utility like [tt]wc[/tt]:
Code:
[gray]# lines[/gray]
wc -l < "$1"
[gray]# columns[/gray]
head -1 "$1" | tr '\t' '\n' | wc -l

Feherke.
 
Thanks feherke,

I'm running the awk that's supplied with Mac OS X 10.4.10. I think there was some hidden junk at the end of my file or something, because when I re-generated the input file, the awk script worked.

I agree a dedicated counter would be better than my crummy slow code!
wc works, but I receive the following error for the head command:

% head -l largefile.ped | tr '\t' '\n' | wc -l
head: illegal option -- l
Usage: head [-n lines] [file ...]
0
 
That's "head dash one"

[tt]head -1 largefile.ped[/tt]

which shows the first line of the file largefile.ped.


HTH,

p5wizard
 
Doh!

Thanks p5wizard & feherke. The code works much faster.

Toby
 