
Adding last line of multiple data files with a comma in it

Status
Not open for further replies.

octar

Technical User
Oct 21, 2002
28
AU
Hi,

I have multiple files of the following format

FILE1:
Date Size
20041201 15,000
20041202 15,120

FILE2:
Date Size
20041201 1,195,040
20041202 2,334,140

ETC...

What I need to do is merge them into one file, adding up the Size values for each matching date.

eg.

MASTERFILE:
Date Size
20041201 1,210,040
20041202 2,349,260


It would be OK if the commas were not in the files. Is there a way to remove the commas and add up the results?

thanks
 
man awk

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
 

nawk -f octar.awk FILE* | sort -n

Here's the octar.awk file:

Code:
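NR==1 { header=$0 }      # remember the header line from the first file
FNR > 1 {                # every non-header line of every file
  gsub(",","", $2)       # strip the thousands separators from the size
  arr[$1]+=$2            # accumulate the size per date
}
END {
  print header
  for (i in arr)         # iteration order is unspecified, hence the sort -n
    print i, arr[i]
}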

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
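
For reference, here is a small self-contained run of the date-keyed approach above against the two sample files from the question. It is only a sketch: the temporary directory, the use of plain awk rather than nawk, and printing the header with head instead of relying on sort -n to keep it first are all assumptions made for the demo.

```shell
# Demo only: recreate the two sample files from the question in a temp dir.
tmp=$(mktemp -d)
printf 'Date Size\n20041201 15,000\n20041202 15,120\n' > "$tmp/FILE1"
printf 'Date Size\n20041201 1,195,040\n20041202 2,334,140\n' > "$tmp/FILE2"

master=$(
  # Header once, taken from the first file.
  head -n 1 "$tmp/FILE1"
  # Sum the Size column per date, stripping the commas first.
  awk '
  FNR > 1 {
      gsub(/,/, "", $2)      # "1,195,040" -> "1195040"
      total[$1] += $2        # key on the date, not on the line number
  }
  END { for (d in total) print d, total[d] }
  ' "$tmp"/FILE* | sort -n
)

echo "$master"
rm -rf "$tmp"
```

The totals match the MASTERFILE in the question, minus the commas: 1210040 for 20041201 and 2349260 for 20041202.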
 
If the files aren't too large, the following will work. If they are too large, the program will run out of RAM and another way will have to be used.

Code:
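# Save and skip header.
/^Date/ { header = $0; startline = FNR + 1; next }

# Data lines are summed by line number, so this assumes every file
# holds the same dates on the same lines.
NF == 2 {
  date[FNR] = $1
  gsub( /,/, "" )        # strips commas from $0 and re-splits the fields
  sum[FNR] += $2
}

END {
  if ( header ) print header
  for (i = startline; i <= FNR; i++)   # FNR is the last file's line count
    if ( i in date )
      print date[i], sum[i]
}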

Save as "addmulti.awk". If all filenames begin with the same string, for example "data", run with

awk -f addmulti.awk data* >outfile

Otherwise run with

awk -f addmulti.awk fname1 fname2 fname3 ... >outfile

Let me know whether or not this helps.

If you have nawk, use it instead of awk because on some systems awk is very old and lacks many useful features. For an introduction to Awk, see faq271-5564.

 
While I was composing my solution, you two fellows posted yours. Why don't you get a life? [smile] (Or maybe I should.)
 
Thanks for your quick response. We are almost there; here is the output for the last few dates I have using the awk script:

20041220 106698573
20041221 106953288

The problem is when I do the following:

for i in $(ls *.data);do tail -2 $i;done | sort -n

and import into excel I get

20041220 155692116
20041221 156084270

I know the files are of different lengths; would this have an impact?

eg.

file1:

20021201
...
...
20041221

file2:

20031101
...
...
20041221

file3:

20041201
...
...
20041221


Do I have to do a grep on each date, perhaps?
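
File length only matters if rows are matched by line number. A minimal sketch, using made-up file names and values, of summing by date while also counting how many rows contributed to each date, which makes gaps visible without a grep per date:

```shell
tmp=$(mktemp -d)
# Hypothetical data: long.data starts a day earlier than short.data.
printf 'Date Size\n20041219 1,000\n20041220 2,000\n20041221 3,000\n' > "$tmp/long.data"
printf 'Date Size\n20041220 10\n20041221 20\n' > "$tmp/short.data"

report=$(awk '
FNR > 1 {
    gsub(/,/, "", $2)
    total[$1] += $2      # rows are matched by date, so file length is irrelevant
    rows[$1]++           # rows seen for this date (one per file if dates are unique per file)
}
END { for (d in total) print d, total[d], rows[d] }
' "$tmp"/*.data | sort -n)

echo "$report"
rm -rf "$tmp"
```

Here 20041219 shows a count of 1 because only one file covers it, while the two later dates show 2.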
 
NOTE:

I have discovered that three lines in the data files are missing their final digit.

eg.

27,234,234,89
01,123,435,32

instead of

27,234,234,890
01,123,435,320

Is it possible to check through the files and add the extra digit if we find only two digits to the right of the last comma?

PS: this is getting more complicated, but it is the last part I need for it to work.
 
gsub(/,..$/,"&0")

Hope This Helps, PH.
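
The substitution has to run before the commas are stripped, since it relies on the last comma to spot a short group. A sketch of how it might slot into a date-keyed sum; the dates and the intact second row are invented, and it assumes the only damage is a final group of exactly two characters at the end of the line:

```shell
fixed=$(printf '20041220 27,234,234,89\n20041221 01,123,435,320\n' | awk '
{
    gsub(/,..$/, "&0")     # ",89" at end of line -> ",890"; three-digit groups do not match
    gsub(/,/, "")          # now drop the separators; $0 is re-split into fields
    total[$1] += $2
}
# %.0f is used because some older awks may print large sums in exponent notation.
END { for (d in total) printf "%s %.0f\n", d, total[d] }
' | sort -n)

echo "$fixed"
```

The first row is padded to 27234234890 and the second, already complete, comes through as 1123435320.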
 
Brilliant...

Thank you all for your help.
 