
awk monks, I am back with question 2

Status
Not open for further replies.

demis001

Programmer
Aug 18, 2008
94
US
I want to format the following data:

1 177921 N
1 123822 G
1 116756 P
1 131946 m
2 1779 N
2 1238 G
2 1167 P
2 1319 m
3 177921 N
3 123822 G
3 116756 m

I want:
header1 N G P M
1 177921 123822 116756 131946
2 1779 1238 1167 1319
3 177921 123822 116756

Thank you in Advance as always

Dereje


 
Hi

Code:
awk 'BEGIN{print"header1 N G P M"}l!=$1{if(l)print l,a["N"],a["G"],a["P"],a["m"];l=$1;delete a}{a[$3]=$2}' /input/file
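Code (same logic, but clearing the array element by element, for awk versions that do not support deleting a whole array at once):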
awk 'BEGIN{print"header1 N G P M"}l!=$1{if(l)print l,a["N"],a["G"],a["P"],a["m"];l=$1;for(i in a)delete a[i]}{a[$3]=$2}' /input/file

Feherke.
 
The $1 field runs to about 1 million entries, and the values are not simply 1, 2, 3 and so on; I used those only as an example. I have tried both scripts, and both miss the last row: they print only groups 1 and 2, not 3.

Output

header1 N G P M
1 177921 123822 116756 131946
2 1779 1238 1167 1319

It also doesn't work when the $1 values look like the following:

ab
ab
ab
ab
Af
Af
AF
DK
DK
DK
Dk

Thanks

 
Hi

Oops. I forgot it.
Code:
awk 'BEGIN{print"header1 N G P M"}l!=toupper($1){p();l=toupper($1);for(i in a)delete a[i]}{a[$3]=$2}END{p()}function p(){if(l)print l,a["N"],a["G"],a["P"],a["m"]}' /input/file

Feherke.
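For readers following along, here is a minimal sketch of why the END block matters, written out in long form. It is only an illustration of the pattern in the one-liner above, not a replacement for it: the names pivot, emit and last are hypothetical, and it assumes all rows for one code sit next to each other in the input.

```shell
# pivot is a hypothetical helper name; input rows are "code value label",
# with all rows for one code adjacent to each other.
pivot() {
  awk '
    function emit(   k) {              # k is a local variable
      if (last != "")                  # skip the call made before any data
        print last, a["N"], a["G"], a["P"], a["m"]
      for (k in a) delete a[k]         # clear the array for the next code
    }
    BEGIN { print "header1 N G P M" }
    toupper($1) != last {              # the code changed: finish the old group
      emit()
      last = toupper($1)
    }
    { a[$3] = $2 }                     # remember the value under its label
    END { emit() }                     # flush the final group at end of input
  ' "$@"
}

# Sample data from the first post; group 3 is printed only because of END.
pivot <<'EOF'
1 177921 N
1 123822 G
1 116756 P
1 131946 m
2 1779 N
2 1238 G
2 1167 P
2 1319 m
3 177921 N
3 123822 G
3 116756 m
EOF
```

A group is printed only when the next code shows up, so the last group has nothing to trigger it unless END does. Note that toupper() also folds codes such as Af and AF into one group.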
 
It is a lot closer to what I need, but there is an error in the output. I hope you can refine it for me.

Here is the problem: if all 4 values exist for a header, it gives the correct result. But if one is missing, the remaining values shift into the wrong columns.
e.g.
ak 1615 N
ak 166218 G
ak 226887 m

It gives the following:
header N G P m
ak 1615 166218 226887

The correct output should be
header N G P m
ak 1615 166218 empty 226887

Thanks a lot!!





 
Hi

You want the word "empty" ?
Code:
awk 'BEGIN{print"header1 N G P M";p()}l!=toupper($1){p();l=toupper($1)}{a[$3]=$2}END{p()}function p(){if(l)print l,a["N"],a["G"],a["P"],a["m"];a["N"]=a["G"]=a["P"]=a["m"]="empty"}' /input/file

Feherke.
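A small sketch of the placeholder idea, assuming the same grouped input as before. The helper names pivot_empty, reset and emit are made up for this illustration. Every slot is set back to "empty" after each printed row, so a label that never appears keeps its placeholder and the columns stay aligned.

```shell
# pivot_empty is a hypothetical helper name.
pivot_empty() {
  awk '
    function reset() {                 # every column starts as "empty"
      a["N"] = a["G"] = a["P"] = a["m"] = "empty"
    }
    function emit() {
      if (last != "") print last, a["N"], a["G"], a["P"], a["m"]
      reset()
    }
    BEGIN { print "header N G P m"; reset() }
    toupper($1) != last { emit(); last = toupper($1) }
    { a[$3] = $2 }                     # overwrite the placeholder
    END { emit() }
  ' "$@"
}

printf 'ak 1615 N\nak 166218 G\nak 226887 m\n' | pivot_empty
# prints:
# header N G P m
# AK 1615 166218 empty 226887
```

The code comes out as AK rather than ak because the comparison key is built with toupper(), as in the one-liner above.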
 
It works much, much better.

I have a few errors in my data where the entries look like this:
vi 48095 N
vi 11 N
vi 23605 G

For this it prints the following:
header N G P m
vi 11 23605 empty empty

It would be better if I got something like this
header N G P m
vi 48095 empty empty empty
vi 11 23605 empty empty


Anyway, thank you Feherke, as always. This code is quite abstract to me and I don't fully understand it, but it works with only a minor error. In fact, the problem is with my data.



 
Sorry, if there are two values for the same header, I want to add the numbers in column $2.

In the above case
header N G P m
vi 11"+"48095 23605 empty empty

Is it possible to modify the above code?

Dereje
 
Hi

You mean to add, like calculate their sum ?
Code:
awk 'BEGIN{print"header1 N G P M";p()}l!=toupper($1){p();l=toupper($1)}{a[$3]+=$2}END{p()}function p(){if(l)print l,a["N"],a["G"],a["P"],a["m"];a["N"]=a["G"]=a["P"]=a["m"]="empty"}' /input/file
Or to concatenate with "+" between the values ?
Code:
awk 'BEGIN{print"header1 N G P M";p()}l!=toupper($1){p();l=toupper($1)}{a[$3]=$2(a[$3]!="empty"?"\"+\""a[$3]:"")}END{p()}function p(){if(l)print l,a["N"],a["G"],a["P"],a["m"];a["N"]=a["G"]=a["P"]=a["m"]="empty"}' /input/file

Feherke.
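To compare the two readings side by side, here is a hedged sketch with a mode switch. The helper name pivot_dupes and the mode argument are inventions for this example. Unlike the one-liners above, it tests for the placeholder explicitly instead of relying on "empty" counting as 0, and in join mode it keeps the values in input order with a bare + between them.

```shell
# pivot_dupes is a hypothetical helper; usage: pivot_dupes sum|join [file...]
pivot_dupes() {
  mode=$1; shift
  awk -v mode="$mode" '
    function reset() { a["N"] = a["G"] = a["P"] = a["m"] = "empty" }
    function emit() {
      if (last != "") print last, a["N"], a["G"], a["P"], a["m"]
      reset()
    }
    BEGIN { print "header N G P m"; reset() }
    toupper($1) != last { emit(); last = toupper($1) }
    {
      if (a[$3] == "empty")   a[$3] = $2             # first value for this label
      else if (mode == "sum") a[$3] += $2            # repeated label: add
      else                    a[$3] = a[$3] "+" $2   # repeated label: join
    }
    END { emit() }
  ' "$@"
}

printf 'vi 48095 N\nvi 11 N\nvi 23605 G\n' | pivot_dupes sum
# last line: VI 48106 23605 empty empty
printf 'vi 48095 N\nvi 11 N\nvi 23605 G\n' | pivot_dupes join
# last line: VI 48095+11 23605 empty empty
```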
 
Would you comment these lines, please? I want to understand what you did here.
awk 'BEGIN{print"header1 N G P M";p()} # what is p()? Is it a subroutine?
l!=toupper($1){p(); # what do you mean when you say 1?
l=toupper($1)}{a[$3]+=$2}END{p()}function p() # what do you want to achieve with {p()}?
{if(l)print l,a["N"],a["GC"],a["P"],a["mem"];
a["N"]=a["GC"]=a["P"]=a["mem"]="empty"}' # what do you mean by this line?
$*
 
Code:
awk '
function p() {           #define a function
  if(l)                  #if we have read data then print
    print l,a["N"],a["GC"],a["P"],a["mem"]
  #initialize the array
  a["N"]=a["GC"]=a["P"]=a["mem"]="empty"
}
BEGIN{
  print"header1 N G P M" #print the header line
  p()                    #initialize array
}
l!=toupper($1){          #if new code
  p()                    #  print the data for previous code and initialize array
  l=toupper($1)          #  store the new code
}
{ a[$3]+=$2 }            #collect the data
END{                     #at EOF
  p()                    #  print the data for last code
}
'

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
One more thing,

In awk, is an array initialized both as a[] and as a()?
What do you mean when you write
a["N"]=a["GC"]=a["P"]=a["mem"]="empty"

Does that mean you assign a["GC"] to a["N"]? I don't understand the chain of assignment operators used here.
a["N"];
a["GC"];
 
Where did you see a()?

In awk, assignment works right to left, so "empty" is put into a["mem"] first, then into a["P"], and then ...

Hope This Helps, PH.
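A tiny sketch of that right-to-left rule, which can be run on its own. The wrapper name chain_demo is hypothetical. It also shows a side effect the summing version depends on: a string that does not look like a number counts as 0 in arithmetic.

```shell
# chain_demo is a hypothetical wrapper so the result is easy to capture.
chain_demo() {
  awk 'BEGIN {
    # Parsed as a["N"] = (a["G"] = (a["P"] = "empty"))
    a["N"] = a["G"] = a["P"] = "empty"
    print a["N"], a["G"], a["P"]       # empty empty empty

    a["N"] += 5                        # "empty" is treated as 0 here
    print a["N"]                       # 5
  }'
}

chain_demo
```

Each assignment returns the value it stored, which is what lets the next assignment to the left reuse it.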
 
p isn't an array but a function.
Anyway; man awk

Hope This Helps, PH.
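For anyone else mixing up the two notations, a short sketch: round brackets call a function, square brackets index an array. The wrapper name fn_vs_array and the strings are made up for the example.

```shell
# fn_vs_array is a hypothetical wrapper name.
fn_vs_array() {
  awk '
    function p(x) { return "called with " x }   # p is a function
    BEGIN {
      a["k"] = "stored"                          # a is an array
      print p("arg")                             # function call: p( ... )
      print a["k"]                               # array lookup: a[ ... ]
    }'
}

fn_vs_array
# prints:
# called with arg
# stored
```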
 