Report formatting query

ranjit · Dec 21, 2002

I have an input file (see the extract below) which I am
trying to convert into the output file set out below.

Is this a suitable task for awk? I have tried an approach
which searches for the expression "pool" and reads the
subsequent TAPE lines, counting each occurrence under the
respective pool heading.

I've run into problems and do not get the desired result. In particular, the pools with zero tapes are proving difficult to work around.

Can anyone help? Any help would be appreciated.

Thanks in advance.

=====================================
INPUT DATA FILE:

TAPE SILO MEDIA INVENTORY
---------------------------------

SALES_BACKUP pool

TAPE0050
TAPE0059
TAPE0072

DATABASE_LOG pool

FINANCE pool

TAPE0426
TAPE0021

SCRATCH pool

TAPE0153
TAPE0155
TAPE0159
TAPE0162

DEVELOPMENT pool

TEST pool

====================================

DESIRED OUTPUT FILE:

SALES_BACKUP pool: 3
DATABASE_LOG pool: 0
FINANCE pool: 2
SCRATCH pool: 4
DEVELOPMENT pool: 0
TEST pool: 0

====================================

menski · Dec 21, 2002

I haven't tested this - and I have to confess that it is unusual for my scripts to run straight out of the box - but something like this ought to work (where "input.file" is the name of your data file):

BEGIN {
    tcount = 0
    while ((getline < "input.file") > 0) {
        if ($0 ~ /pool$/) {    # lines ending with 'pool'
            if (pcount != "") {
                # print the count for the previous pool
                printf("%6s", tcount "\n")
                tcount = 0
            }
            pcount++
            printf("%-20s", $0)
        }
        else if ($0 != "") {
            tcount++
        }
    }
    close("input.file")
    # print the count for the last pool, which has no following heading
    if (pcount != "") printf("%6s", tcount "\n")
} # --------------------close BEGIN

You will have to do something about the title lines - they will confuse the program, because every non-blank line that does not end in "pool" is counted as a tape. You may also want to tidy up the printf statement. If you need any help, I can thoroughly recommend the excellent Gawk info file, which can be found all over the net, but you might like to look at:

http://www.greenbush.com/cgi-bin/info2www?(gawk)

a very comprehensive document with lots of worked examples.
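
One possible way to deal with the title lines, sketched here under a few assumptions: tape labels always look like TAPE followed by digits, pool headings always end in " pool", and the data sits in a temporary file created just for this demonstration. The variable names (infile, line, pool, tcount) are illustrative and not taken from the script above. Only lines that look like tape labels are counted, so the title, the rule line, and blank lines are ignored.

```shell
#!/bin/sh
# Hypothetical sample file holding the data from the original question.
infile=$(mktemp)
cat > "$infile" <<'EOF'
TAPE SILO MEDIA INVENTORY
---------------------------------

SALES_BACKUP pool

TAPE0050
TAPE0059
TAPE0072

DATABASE_LOG pool

FINANCE pool

TAPE0426
TAPE0021

SCRATCH pool

TAPE0153
TAPE0155
TAPE0159
TAPE0162

DEVELOPMENT pool

TEST pool
EOF

report=$(awk -v infile="$infile" '
BEGIN {
    while ((getline line < infile) > 0) {
        if (line ~ / pool$/) {
            # A new pool heading: report the previous pool, if any.
            if (pool != "") printf("%s: %d\n", pool, tcount)
            pool = line
            tcount = 0
        } else if (line ~ /^TAPE[0-9]+$/) {
            # Count tape labels only; the title line does not match.
            tcount++
        }
    }
    close(infile)
    # The last pool has no following heading, so report it here.
    if (pool != "") printf("%s: %d\n", pool, tcount)
}')
rm -f "$infile"
printf '%s\n' "$report"
```

Because tcount is reset to 0 at every heading and printed with %d, pools without tapes come out as 0 rather than blank.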

marsd · Dec 21, 2002

Some tips I have learned on this forum and CLA.

1) If you are going to use a BEGIN action for this
and load the file without invoking gawk's powerful
built-in main() behavior, then it is best to load the
file into an array and do your processing there:

function processIt(arr, c, max,    u, rec) {
    if (arr[c] ~ /.*pool/) {
        u = c + 1
        # scan forward to the next pool heading, counting TAPE lines
        while (arr[u] !~ /.*pool/ && u <= max) {
            if (arr[u] ~ /TAPE.*/) {
                rec++
            }
            u++
        }
        return "Matches for " arr[c] "=" rec
    }
    return "NOMATCH"
}

BEGIN {
    fname = ARGV[1]
    cnt = 1
    # load the whole file into array[1..cnt-1]
    while ((getline < fname) > 0) {
        array[cnt++] = $0
    }
    close(fname)

    for (i = 1; i <= cnt; i++) {
        print processIt(array, i, cnt)
    }
}

OUTPUT:
Matches for SALES_BACKUP pool=3
Matches for DATABASE_LOG pool=
Matches for FINANCE pool=2
Matches for SCRATCH pool=4
Matches for DEVELOPMENT pool=
Matches for TEST pool=

real 0m0.025s
user 0m0.000s
sys 0m0.010s
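
The empty counts in the output above follow from how awk treats a variable that has never been assigned: it is the empty string when concatenated and 0 when used as a number. A minimal sketch of the difference (rec here is just an unassigned variable, as in the function above; the shell variable names blank and zero are illustrative):

```shell
#!/bin/sh
# Unassigned variable used in a string context: prints nothing after "=".
blank=$(awk 'BEGIN { print "pool=" rec }')
# Adding 0 forces a numeric context, so the count prints as 0.
zero=$(awk 'BEGIN { print "pool=" (rec + 0) }')
printf '%s\n%s\n' "$blank" "$zero"
```

Returning rec + 0 from the function, or initialising rec to 0 before the loop, would therefore be one way to get the zeros the original question asks for.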

2) OTOH there are literally half a hundred examples
of sequential searches through a file using gawk/nawk's
built-in main() behavior in situations like this if you
search the forum archives. The only time you really need
to use the BEGIN action for a search is when you need
the data searched in a non-sequential manner.
Search for Cakiwi's posts especially, as his examples
on this topic are numerous and functional.
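
For comparison, a rough sketch of the sequential approach using awk's ordinary pattern-action rules instead of a BEGIN loop. It is not taken from the archives mentioned above; the helper name emit and the variables pool and count are made up for illustration, and it assumes tape labels look like TAPE followed by digits. The sample is an excerpt of the data from the original question.

```shell
#!/bin/sh
# Excerpt of the inventory from the original question.
sample='TAPE SILO MEDIA INVENTORY
---------------------------------

SALES_BACKUP pool

TAPE0050
TAPE0059
TAPE0072

DATABASE_LOG pool

FINANCE pool

TAPE0426
TAPE0021'

report=$(printf '%s\n' "$sample" | awk '
# Print the pool seen so far; does nothing before the first heading.
function emit() { if (pool != "") printf("%s: %d\n", pool, count) }

/ pool$/       { emit(); pool = $0; count = 0; next }
/^TAPE[0-9]+$/ { count++ }
END            { emit() }
')
printf '%s\n' "$report"
```

Each heading closes out the previous pool and the END rule closes out the last one, so awk's own record loop does all the reading and no array is needed.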
