Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Add values in file - resulting in uniqueness 2

Status
Not open for further replies.

dickiebird

Programmer
Feb 14, 2002
758
GB
This is not going to plan - guess my brain died!
I have a sorted flat file, a sample of which :-
JOBURG,10072003,FG8,19,0,500,1,32000
JOBURG,10072003,FG9,10,0,50,1,2000
JOBURG,10072003,FG9,10,0,100,1,4000
JOBURG,10072003,FG9,11,0,100,1,6510
JOBURG,10072003,FG9,11,0,100,5,100
JOBURG,10072003,FG9,11,0,100,5,200
JOBURG,10072003,FG9,11,0,20,1,3180
JOBURG,10072003,FG9,11,0,20,5,100
JOBURG,10072003,FG9,11,0,20,5,50
JOBURG,10072003,FG9,11,0,20,5,50
JOBURG,10072003,FG9,11,0,200,1,7890
JOBURG,10072003,FG9,11,0,200,5,100
JOBURG,10072003,FG9,11,0,200,5,150
JOBURG,10072003,FG9,11,0,200,5,400
JOBURG,10072003,FG9,11,0,200,5,50
JOBURG,10072003,FG9,11,0,50,1,5230
JOBURG,10072003,FG9,11,0,50,5,100
JOBURG,10072003,FG9,11,0,50,5,50

I want to combine(ie sum values) all lines where cols 1-6 are the same
(cols 1-2 are always JOBURG,10072003 throughout the file)
Col 7 holds types 1,3,5 or 7. Type 1 is always greatest value.
Type 3 is added on, type 5 is subtracted, type 7 is dependant on the leading sign ( + or - )
So I'd have a result of

JOBURG,10072003,FG8,19,0,500,1,32000 -no change
JOBURG,10072003,FG9,10,0,50,1,2000 -no change
JOBURG,10072003,FG9,10,0,100,1,4000 -no change
JOBURG,10072003,FG9,11,0,100,1,6210 -6510 less 100 less 200
JOBURG,10072003,FG9,11,0,20,1,2980 - 3180 less100 less 50 less 50
JOBURG,10072003,FG9,11,0,200,1,7190 -7890 less100 less 150 less 400 less 50
JOBURG,10072003,FG9,11,0,50,1,5080 -5230 less 100 less 50


This is what I was trying - but I'm getting bogged down
I'm sure there's an easy way - isn't there ?????

#!/bin/ksh
infile=$1

outfile1=${1}.1

awk -F, ' BEGIN {r1=" "}
{
if(s1=="JOBURG") /*first time round s1 is blank*/
{
if($3==s3 && $4==s4 && $5==s5 && $6==s6) /* are critical cols the same ??*/
{
if($7=="1" || $7=="3")
s8=s8+$8
if($7=="7" && substr($8,1,1)=="+") /* Do adjustments */
s8=s8+substr($8,2,9)
if($7=="5")
s8=s8-$8
if($7=="7" && substr($8,1,1)=="-")
s8=s8-substr($8,2,9)
r1=s1 /*Save to another set of variables*/
r2=s2
r3=s3
r4=s4
r5=s5
r6=s6
r7=s7
r8=s8
}
else
print s1","s2","s3","s4","s5","s6","s7","s8 /* Otherwise print */
}
s1=$1 /Save parsed columns */
s2=$2
s3=$3
s4=$4
s5=$5
s6=$6
s7=$7
s8=$8

if(r1!=" ") /* There's a change - print saved stuff */
{
if($3!=r3 || $4!=r4 || $5!=r5 || $6!=r6)
{
print r1","r2","r3","r4","r5","r6","r7","r8
r1=" "
}
}

}
' $infile > $outfile1

Any contributions gratefully received.
TIA

Dickie Bird (:)-)))
 
If the order doesn't matter.... something like that:

nawk -f dickie.awk dickie.txt

#-------------- dickie.awk
BEGIN {
FS=OFS=","
COLS_diff="6"
}

{
for (i=1; i <= COLS_diff; i++)
idx= (i == 1) ? $i : (idx OFS $i);

if ( $(COLS_diff + 1) == &quot;1&quot;)
arr[idx] = $(COLS_diff + 2);

if ( $(COLS_diff + 1) == &quot;3&quot; || $(COLS_diff + 1) == &quot;7&quot;)
arr[idx] += $(COLS_diff + 2);

if ( $(COLS_diff + 1) == &quot;5&quot;)
arr[idx] -= $(COLS_diff + 2);

}

END {
for ( i in arr)
printf(&quot;%s%s%d%s%s\n&quot;, i, OFS, 1, OFS,arr);

}


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
actually it can be simplified as:


#------------------- dickie.awk
BEGIN {
FS=OFS=&quot;,&quot;
COLS_diff=&quot;6&quot;
}

{
for (i=1; i <= COLS_diff; i++)
idx= (i == 1) ? $i : (idx OFS $i);

if ( $(COLS_diff + 1) == &quot;1&quot; || $(COLS_diff + 1) == &quot;3&quot; || $(COLS_diff + 1) == &quot;7&quot;)
arr[idx] += $(COLS_diff + 2);

if ( $(COLS_diff + 1) == &quot;5&quot;)
arr[idx] -= $(COLS_diff + 2);

}

END {
for ( i in arr)
printf(&quot;%s%s%d%s%s\n&quot;, i, OFS, 1, OFS,arr);

}


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Or maybe this

BEGIN {
OFS = FS = &quot;,&quot;
}
{
match($0,/[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,[^,]*,/)
a = substr($0,1,RLENGTH)
if (a != hlda) {
if (NR > 1) {
c = 0
for (j=1;j<=n;j++) c += b[j]
print hlda &quot;1,&quot; c
n = 0
}
hlda = a
}
b[++n] = $8
if ($7 == 5) b[n] = -b[n]
}
END {
c = 0
for (j=1;j<=n;j++) c += b[j]
print hlda &quot;1,&quot; c
}

CaKiwi

&quot;I love mankind, it's people I can't stand&quot; - Linus Van Pelt
 
Both Brilliant !!!!!!
Vlad's use of associative arrays still bewilders me - I'll crack it one day.
Cakiwi's answer doesn't require another sort afterwards.
Two (actually 3) great answers - One soothed brain !
Thanks Guys

Dickie Bird (:)-)))
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top