
Average column 2

Status
Not open for further replies.

fcolassie

Technical User
Dec 12, 2003
12
NL
Hi awk users,

Is there a simple awk script that lets me average only parts of a column? Suppose I have 100 time points and I want, for example, to average the first ten time points, then the following twenty, and then the subsequent 5. In short, I am looking for an awk script that lets me be as flexible as possible in averaging parts of just one column.

Any help will be appreciated.

 
I have something like this:
{if ($1 <= 20) (av += $2/NR)
if (($1 >= 21) && ($1 <= 50)) (av1 += $2/NR)}

 
Not tested,

{if ($1 <= 20) {av += $2/NR}
if ($1 >= 21 && $1 <= 50) {av1 += $2/NR}}


tikual
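
A note on the snippets above: NR is the current line number, so dividing each value by NR does not give a mean. A minimal sketch of the same idea keeps a separate sum and count per range and divides once in END. The function name range_avg is made up for illustration, and the 1-20 and 21-50 boundaries are simply taken from the attempt above.

```shell
# Hypothetical helper: average column 2 for rows whose column 1 is
# at most 20, and for rows whose column 1 is between 21 and 50.
# Reads the files given as arguments, or stdin if there are none.
range_avg() {
  awk '
    $1 <= 20             { sum1 += $2; n1++ }   # first range
    $1 >= 21 && $1 <= 50 { sum2 += $2; n2++ }   # second range
    END {
      # Divide once at the end; skip ranges that matched no rows.
      if (n1) print "1-20:", sum1 / n1
      if (n2) print "21-50:", sum2 / n2
    }
  ' "$@"
}
```

It can be called as range_avg points.dat, or fed from a pipe.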
 
Hi fcolassie,

Try and adapt the script Avg.awk (source at bottom).

Input file
[tt]
1 12
2 22
3 32
4 42
5 10
7 10
8 10
9 20
10 30
11 10
12 5
13 5
14 12
15 5
[/tt]

Example of execution
Compute the column 2 average for lines where column 1 is less than or equal to 5, between 6 and 10, and greater than or equal to 20.
The list of slices is specified as: -5,6-10,20-
[tt]
/home/jp> Avg.awk -- -5,6-10,20- points.dat
Slice: - 5 Average: 23,600000
Slice: 6 - 10 Average: 17,500000
Slice: 20 - Average: No data
/home/jp>
[/tt]

Awk program
[tt][ignore]
gssjgu:/g/g00k00/gssjgu/TMP> cat Avg.awk
#!/usr/bin/awk -f
#
# File: Avg.awk
# Usage: Avg.awk -- list [file ...]
# List = list of slices of values ($1) to average, delimited by ","
# slice = x or x-y or x- or -y
# Input: $1 = selective value
#        $2 = value to average
# Example: Average values less than or equal to 10, and from 20 to 30
#          Avg.awk -- -10,20-30 input_file
#

BEGIN {
    if (ARGC == 1) exit 1;
    list = ARGV[1];
    ARGV[1] = "";
    if (ARGC == 2) ARGV[ARGC++] = "-";

    # Parse the list into Slices[n,"start"] and Slices[n,"end"];
    # an empty bound means the slice is open on that side.
    if (list == "") list = "-";
    n = split(list, sall, ",");
    for (nbs=1; nbs<=n; nbs++) {
        pos = match(sall[nbs], "-");
        if (pos == 0) {
            Slices[nbs,"start"] = sall[nbs];
            Slices[nbs,"end"  ] = sall[nbs];
        } else {
            val = substr(sall[nbs],1,pos-1);
            Slices[nbs,"start"] = val;
            val = substr(sall[nbs],pos+1);
            Slices[nbs,"end"  ] = val;
        }
    }
    Slices[0] = nbs-1;
}

# Return the index of the first slice containing val, or 0 if none.
# The "+0" forces a numeric comparison, because substr() returns strings.
function SliceIndex(val,   is) {
    for (is=1; is<=Slices[0]; is++) {
        if (Slices[is,"start"] != "" && val+0 < Slices[is,"start"]+0) continue;
        if (Slices[is,"end"  ] != "" && val+0 > Slices[is,"end"  ]+0) continue;
        return is;
    }
    return 0;
}

# Accumulate count and sum of $2 for the slice that $1 falls into.
{
    s = SliceIndex($1);
    if (s != 0) {
        Slices[s,"cnt"]++;
        Slices[s,"sum"] += $2;
    }
}

END {
    for (s=1; s<=Slices[0]; s++) {
        if (Slices[s,"cnt"] != 0) {
            avg = Slices[s,"sum"] / Slices[s,"cnt"];
            printf "Slice: %5s - %5s Average: %f\n", Slices[s,"start"], Slices[s,"end"], avg;
        } else {
            printf "Slice: %5s - %5s Average: No data\n", Slices[s,"start"], Slices[s,"end"];
        }
    }
}
[/ignore][/tt]

Jean Pierre.
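
One portability detail is worth checking when adapting a script like this. In POSIX awk the result of substr() is a plain string, so a comparison against a field can be done as text (where "7" sorts after "10") rather than as numbers, depending on the awk implementation. Adding 0 to both sides forces a numeric comparison. The sketch below shows only that idea; the function name in_slice is made up for illustration and is not part of Avg.awk, and it assumes a closed slice written as lo-hi.

```shell
# Hypothetical helper: print the values from stdin that fall inside a
# slice written as "lo-hi". Both bounds come from substr(), so they are
# coerced to numbers with +0 before comparing.
in_slice() {
  awk -v slice="$1" '
    BEGIN {
      pos = index(slice, "-")
      lo = substr(slice, 1, pos - 1) + 0   # +0 turns the string into a number
      hi = substr(slice, pos + 1) + 0
    }
    $1 + 0 >= lo && $1 + 0 <= hi { print $1 }
  '
}
```

For example, printf '5\n7\n10\n11\n' | in_slice 6-10 prints 7 and 10.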
 
I thought the question was something else. This script just averages the first ten rows, then the following twenty rows and then the subsequent 5 rows.

BEGIN{
  split("10 20 5",arr)
  idx=1
}
{
  sum+=$1
  cnt++
  if (cnt==arr[idx]) {
    print "average of " cnt " rows is " sum / cnt
    sum=0
    cnt=0
    idx++
  }
}

Tested...

average of 10 rows is 5.5
average of 20 rows is 20.5
average of 5 rows is 33
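
To make the chunk sizes and the column easy to change, they can be passed in with -v instead of being hard-coded. This is only a sketch of that variation; the function name chunk_avg and the variable names sizes and col are assumptions introduced here, not part of the script above.

```shell
# Hypothetical wrapper around the chunk-averaging idea.
# $1 = chunk sizes separated by spaces, $2 = column to average,
# remaining arguments = input files (stdin if none).
chunk_avg() {
  sizes=$1; col=$2; shift 2
  awk -v sizes="$sizes" -v col="$col" '
    BEGIN { split(sizes, arr, " "); idx = 1 }
    {
      sum += $col
      cnt++
      if (cnt == arr[idx]) {          # current chunk is complete
        print "average of " cnt " rows is " sum / cnt
        sum = 0; cnt = 0; idx++
      }
    }
  ' "$@"
}
```

With the numbers 1 to 35 on stdin, chunk_avg "10 20 5" 1 prints the same three lines as the test above.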
 
Dear Ygor,


Thank you very much for your script. The split("10 20 5",arr) approach in particular was very useful and gave me flexibility. However, is there a way to print the average of the first column (time series) next to the average of the second column (dependent variable)? For example:

average of 10 rows (column 1) is 5 and average of 10 rows
(column 2) is 5.5. So there will be two columns. For example:
Average column 1 average column 2
5 5.5
10 20.5

Second question: can you recommend a book from which I can quickly learn all these commands?

Best regards

 
From Ygor's code:
Code:
BEGIN{
  split("10 20 5",arr)
  idx=1
}
{
  s1+=$1;s2+=$2
  cnt++
  if (cnt==arr[idx]) {
      print s1/cnt,s2/cnt
      s1=s2=0
      cnt=0
      idx++
  }
}

Hope this helps
PH.
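
One thing to keep in mind with this approach: rows left over after the last listed chunk are accumulated but never printed. If that matters for your data, a possible extension reports them in END. The sketch below shows this; the function name two_col_avg is made up, and treating the leftover rows as one final chunk is an assumption about what is wanted.

```shell
# Hypothetical wrapper: average columns 1 and 2 over consecutive chunks.
# $1 = chunk sizes separated by spaces, remaining arguments = input files.
two_col_avg() {
  sizes=$1; shift
  awk -v sizes="$sizes" '
    BEGIN { split(sizes, arr, " "); idx = 1 }
    {
      s1 += $1; s2 += $2; cnt++
      if (cnt == arr[idx]) {          # current chunk is complete
        print s1 / cnt, s2 / cnt
        s1 = s2 = cnt = 0
        idx++
      }
    }
    # Rows after the last listed chunk are reported as one final chunk.
    END { if (cnt > 0) print s1 / cnt, s2 / cnt }
  ' "$@"
}
```

For example, two_col_avg "10 20 5" points.dat prints one line per chunk, plus a final line for any remaining rows.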
 