AWK to categorize records

Status
Not open for further replies.

ruxshin

Programmer
Apr 26, 2001
33
FI
Hi,

I have an input file with thousands of records in this format:

filename projectname owner group month day time age agegroup

It's a modification of the information listed by ls -l. What I want to do is to categorize the records according to project name, and then within each project name, categorize the records according to age group, and get the total number of files.

I'm quite new to AWK and I can't figure out a way to do it. Can anyone help?
 
I made a mistake above. The correction:

I have an input file with thousands of lines like this:

filename projectname owner group filesize month day time age agegroup

It's a modification of the output of the ls -l command. I want to categorize the records by project name, then within each project categorize them by age group, and get the total file size taken up by each project.

I'm quite new to AWK and I can't figure out a way to do it. Can anyone help?
 
Hi ruxshin-

This is not a trivial task in pure awk, so I used
the sort command to prepare the input for the awk
program. No need to write code that already exists!


#!/bin/sh
#
# file: sortgrp
#
# purpose: Sort on fields 2 and 10,
# sum file sizes per project, then count total files.
#
# usage: sortgrp <infile> <outfile> <CR>
#
# Notes: awk used to format the output and count files.
#
#


sort -k 2,2 -k 10,10 "$1" |
awk 'BEGIN {
    printf("\nFile Name\tProject Name\tOwner\t\tGroup\t\tFile Size\tMonth\tDay\tTime\tAge\tAge Group\n\n")
}

# First record: remember the starting project.
NR == 1 {
    prev = $2
}

# Same project as the previous record: add to the running total.
$2 == prev {
    filesize += $5
    printf("%-10s\t%-15s\t%-10s\t%-10s\t%-10s\t%-3s\t%-8s%-6s\t%-3s\t%-15s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10)
}

# New project: report the finished project, then start a new total.
$2 != prev {
    print ""
    print ""
    print "Project " prev " has a total file size of: " filesize
    print ""
    print ""
    printf("%-10s\t%-15s\t%-10s\t%-10s\t%-10s\t%-3s\t%-8s%-6s\t%-3s\t%-15s\n",$1,$2,$3,$4,$5,$6,$7,$8,$9,$10)
    prev = $2
    filesize = $5
}

END {
    # Report the last project; every record has already been printed above.
    print ""
    print "Project " prev " has a total file size of: " filesize
    print ""
    print ""
    print "Total number of files: " NR "\n"
}' > "$2"


Hope this helps you!


flogrr
flogr@yahoo.com
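
The sort step used in the script above can be tried on its own to see how it prepares the input. This is a minimal sketch: the helper name sample_records and the five records it prints are made up for illustration and only follow the field layout described in the question (project name in field 2, age group in field 10).

```shell
#!/bin/sh
# Hypothetical sample records (made up for illustration). Fields:
# filename projectname owner group filesize month day time age agegroup
sample_records() {
cat <<'EOF'
b.c smallred bob dev 200 Apr 20 10:15 6 1
a.c bigblue ann dev 100 Apr 25 09:00 1 1
c.c bigblue ann dev 300 Jan 02 12:30 114 5
d.c smallred bob dev 50 Feb 11 08:45 74 3
e.c bigblue cat dev 25 Apr 24 17:20 2 1
EOF
}

# Sort by project name (field 2), then by age group (field 10),
# so that records of the same project and age group end up adjacent.
sample_records | sort -k 2,2 -k 10,10
```

Once the records are grouped this way, the awk program only has to notice when field 2 (or field 10) changes from one line to the next. Note that the age group key is compared as text here; if age groups can reach two digits, -k 10,10n would compare them numerically instead.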

 
Hi Flogrr,

Thanks for your help.

Your script managed to categorize all the records by project name, but what I actually wanted is to categorize by project name and then categorize the records within the project name according to age groups.

e.g.

PROJECT NAME   AGE GROUP   NUM. OF FILES   SIZE
=================================================
bigblue        1           5               12345
               2           7               655
               5           3               5688
SUBTOTAL:                  15              18688

smallred       1           3               2233
               3           12              48222
SUBTOTAL:                  15              50555
=================================================
TOTAL:                     30              69243

Anyway, I've managed to do it and print out an HTML report, but the script is long (276 lines)!

But still, your sort command is very useful and thanks again.

Ruxshin
 
Hi ruxshin-

Sorry I didn't get the specifications right the first time!

Here is the corrected script to do what you need
that uses about 200 fewer lines.

sort -k 2,2 -k 10,10 "$1" |
awk 'BEGIN {
    printf("\nProject Name\tAge Group\tNumber of Files\tSize\n")
    printf("========================================================\n\n")
}

# First record: remember the starting project and age group.
NR == 1 {
    project = $2
    agegroup = $10
    flag = 1    # 1 until the project name has been printed
}

# Same project and age group: keep accumulating.
$2 == project && $10 == agegroup {
    ++count
    ++countsubtotal
    ++countgrandtotal
    filesize += $5
    subtotal += $5
    grandtotal += $5
}

# Same project, new age group: print the finished group and start the next.
$2 == project && $10 != agegroup {
    printf("%-10s\t%-10s\t%-5s\t\t%-10s\n", (flag ? project : ""), agegroup, count, filesize)
    flag = 0
    count = 1
    ++countsubtotal
    ++countgrandtotal
    agegroup = $10
    filesize = $5
    subtotal += $5
    grandtotal += $5
}

# New project: print the last group and the subtotal, then reset.
$2 != project {
    printf("%-10s\t%-10s\t%-5s\t\t%-10s\n", (flag ? project : ""), agegroup, count, filesize)
    print ""
    print ""
    printf (" SUBTOTAL: %-10s\t%-10s\n", countsubtotal, subtotal)
    print ""
    print ""
    count = 1
    countsubtotal = 1
    ++countgrandtotal
    project = $2
    agegroup = $10
    filesize = $5
    subtotal = $5
    grandtotal += $5
    flag = 1
}

END {
    # Print the last group, the last subtotal, and the grand total.
    printf("%-10s\t%-10s\t%-5s\t\t%-10s\n", (flag ? project : ""), agegroup, count, filesize)
    print ""
    printf (" SUBTOTAL: %-10s\t%-10s\n", countsubtotal, subtotal)
    print ""
    print ""
    print "========================================================"
    printf (" TOTAL: %-10s\t%-10s\n", countgrandtotal, grandtotal)
}' | tee "$2"

I am providing this in case you or someone else in
this forum has a use for it in the future.



flogrr
flogr@yahoo.com
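
For comparison, the per-group counting can also be sketched with awk associative arrays, which removes the need to track the previous record. This is an alternative technique, not the approach used in the scripts above. The function name summarize and the demo records are hypothetical; the field positions (2 = project, 5 = file size, 10 = age group) are taken from the question. It prints one tab-separated line per project and age group (project, age group, number of files, total size) and leaves out the subtotal and total lines.

```shell
#!/bin/sh
# summarize: count files and sum sizes per (project, age group).
# Reads records on stdin; field 2 = project, 5 = file size, 10 = age group.
summarize() {
    awk '
    {
        key = $2 SUBSEP $10          # one array slot per project/age group pair
        count[key]++
        size[key] += $5
    }
    END {
        # Array traversal order is unspecified, so the output is sorted afterwards.
        for (key in count) {
            split(key, part, SUBSEP)
            printf("%s\t%s\t%d\t%d\n", part[1], part[2], count[key], size[key])
        }
    }' | sort -k 1,1 -k 2,2n
}

# Demo with made-up records; the input does not need to be sorted first.
printf '%s\n' \
    'a.c bigblue ann dev 100 Apr 25 09:00 1 1' \
    'b.c smallred bob dev 200 Apr 20 10:15 6 1' \
    'c.c bigblue ann dev 300 Jan 02 12:30 114 5' \
    'e.c bigblue cat dev 25 Apr 24 17:20 2 1' | summarize
```

With a real input file this would be run as summarize < infile. Subtotals per project could be accumulated the same way in a second array keyed on $2 alone.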

 