Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

2d arrays help

Status
Not open for further replies.

userand

Technical User
Apr 12, 2011
28
CH
Hi all,

I have a problem, and cannot see how to solve it, so I need some help.

I have a data file like this:

1 4 8 1 3 9
1 1 9 2 1 24
2 4 14 2 2 54
2 3 4 2 4 39
2 1 2 2 1 103
4 2 4 3 2 78
4 4 9 3 1 1

I would like to make a 2d histogram from this data in the following way:

- the first and 4th columns are just row numbers which have been generated by another program, so if $1==$4 we should handle the remaining $2, $3 and $5,$6 columns together.
Imagine a chess table for example, so the 2nd column is, let's say, the number side (from 1 to 8), and the 5th column is the alphabetical side of the table (from a to h, in this sense just numbers from 1 to 8, like column 2).
The 3rd and 6th columns are some kind of values.

In this way I just would like to make a 2 dimensional histogram, regarding the example above,

1 1 9
1 2 0
1 3 9
1 4 8

2 1 129 -<--- (24+2+103)
2 2 54
2 3 4
2 4 53 -<--- (39+14)

3 1 1
3 2 78
3 3 0
3 4 0

4 1 0
4 2 4
4 3 0
4 4 9

Any idea?
 
Simply create a 2-dimensional array and add up the values in it, e.g. a[$1,$2]+=$3 and a[$4,$5]+=$6, then at the end loop through your array indices and print out the results.

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]
 
This part is not so trivial for me:

awk '{a[$1,$2]+=$3;a[$4,$5]+=$6;}END{for(i=1;i<=NR;i++)for(i=1;j<=4;j++)print i,j,a[i,j];}'

So far this is not working properly. Importent point is that the numbers in the 1st and 4th columns goes from 1 to arbitrary value, let's sat from 1 to 10000; only the 2nd and 5th columns range are fixed to (1,..,4) interval.

Can you help me a little more?
 
Well, I assume NR is not a reliable value to use as the maximum of that range; so simply define a "max" variable and keep track of the largest number seen?

Are the values in the 1st and 4th columns a contiguous sequence, or are there some gaps? i.e. might it contain 1, 2 ,5, 72? Assuming they are not, this should work:

Code:
awk '
    [blue]$1[/blue]>max { max=[blue]$1[/blue] }
    [blue]$4[/blue]>max { max=[blue]$4[/blue] }
    { a[[blue]$1[/blue],[blue]$2[/blue]]+=[blue]$3[/blue]; a[[blue]$4[/blue],[blue]$5[/blue]]+=[blue]$6[/blue]; }
    [green]END[/green] {
        [olive]for[/olive](i=1;i<=max;i++)
            [olive]for[/olive](j=1;j<=4;j++)
                [b]print[/b] i,j,a[i,j];
    }
'

Note that you had a typo in your code, for ([red]i[/red]=1;j<=4 ....

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]
 
Hi,

I did a mistake in my explanation. Correction:

I have these data file:

1 4 8 1 3 9
1 1 9 2 1 24
2 4 14 2 2 54
2 3 4 2 4 39
2 1 2 2 1 103
4 2 4 3 2 78
4 4 9 3 1 1
5 1 3 6 4 2

And I should get this

1 1 105
1 2 56
1 3 18
1 4 46

2 1 0
2 2 0
2 3 0
2 4 0

3 1 28
3 2 58
3 3 0
3 4 43

4 1 38
4 2 68
4 3 17
4 4 53


So what matters is that if $1==$4 then make 2d histo from $2 and $5 if their values are in $3 and $6 columns, and then you should sum up, as indicated above.

So, can you give me some more hint to solve this?
 
Sorry, just a remark: $1==$4 in the sense, that their values are the same; they should not necessery be in the same row, as you might see for example in the beginning

1 4 8 1 3 9 <----- here $1 == $4
1 1 9 2 1 24 <----- here $1 != $4

but we have have to take into account that there are 3 standalone rows, in which there the first columns are the same...



 
I can't make sense of your requirements.

I can understand how you combine "2 1 2" and "2 1 103" to get "1 1 105", but shouldn't you also include "2 1 24" to get "2 1 129"?

And shouldn't "1 4 46" be "1 4 41" (a combination of "2 1 2" and "2 4 39"?

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]
 
No. You just take into account the $2 and $5 columns as indecies of the 2d array, let's say a[$2,$5]; and you should add $3 and $6 together +=$3 + +=$6 somehow ....
 
Ok, well, let's start again.

I have this data

# 1st col 2nd col 3rd col 4th col 5th col 6th col

1 4 8 1 3 9
1 1 9 2 1 24
2 4 14 2 2 54
2 3 4 2 4 39
2 1 2 2 1 103
4 2 4 3 2 78
4 4 9 3 1 1
5 1 3 6 4 2


- The 1st and the 4th cols numbering go from 1 to let's say 1000, and growing monotonically. It is allowed to contain repeated numbers one after the other, as you can see in the example data.

- let be the 2nd column x, and the 5th column y. Then the 3rd and the last, 6th columns are the values according to x, and y.

- If the values are identical in the 1st and the 4th colums, the do the calculation (They are not necessary to be in identical rows, see the first two rows of the data)

- 2. and 5. cols are go from 1 to 4, so e.g. if col 1 and col 4 values are the same, i.e.

1 4 8 1 3 9
1 1 9 2 1 24

the "1" in col 1 and col 4 occurs at least once, then we have to calculate. So we should see the 2nd and 5th columns and calculate the pairs, so we get

4 3 8+9
1 3 9+9

as result in this case.

Go one step furhter, let us see the first 5 rows

1 4 8 1 3 9
1 1 9 2 1 24
2 4 14 2 2 54
2 3 4 2 4 39
2 1 2 2 1 103

Now, we should see these rows only

1 1 9 2 1 24
2 4 14 2 2 54
2 3 4 2 4 39
2 1 2 2 1 103

In this case the result is:

4 1 14+24+103
4 2 54
4 4 39+14
3 1 4+24+103
3 2 4+54
3 4 4+39
1 1 2+24+103
1 2 2+54
1 4 2+39

but using the existing previous result for col 1 and col 4 equal "1", we have so far this result

4 1 14+24+103
4 2 54
4 3 8+9
4 4 39+14
3 1 4+24+103
3 2 4+54
3 4 4+39
1 1 2+24+103
1 2 2+54
1 3 9+9
1 4 2+39

And so on...

In this way we should fill this table for x-y

1 1 ?
1 2 ?
1 3 ?
1 4 ?

2 1 ?
2 2 ?
2 3 ?
2 4 ?

3 1 ?
3 2 ?
3 3 ?
3 4 ?

4 1 ?
4 2 ?
4 3 ?
4 4 ?
 
Sorry, I haven't been back to the site for a while. Are you still working on this or have you made some progress yourself.

Also, do you need the solution in awk or is perl okay (it has much better multi-dimensional array capabilities).

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]
 
Try this perl solution:

Perl:
[COLOR=#006600]#!/usr/bin/perl -w[/color]
[COLOR=#0000FF]use[/color] strict;

[COLOR=#0000FF]my[/color] (@a,%first,%second,%result);

[COLOR=#006600]# read data into 2 hashes for the first and second groups of columns[/color]
[COLOR=#0000FF]while[/color] (<>) {
        @a=[COLOR=#FF0000]split[/color]();
        $first{$a[[COLOR=#FF0000]0[/color]]}{$a[[COLOR=#FF0000]1[/color]]}.=$a[[COLOR=#FF0000]2[/color]] . [COLOR=#808080]"+"[/color];
        $second{$a[[COLOR=#FF0000]3[/color]]}{$a[[COLOR=#FF0000]4[/color]]}.=$a[[COLOR=#FF0000]5[/color]] . [COLOR=#808080]"+"[/color];
}

[COLOR=#006600]# for each key in the first group[/color]
[COLOR=#0000FF]foreach[/color] [COLOR=#0000FF]my[/color] $key ([COLOR=#FF0000]sort[/color] [COLOR=#FF0000]keys[/color] %{{%first,%second}}) {
        [COLOR=#006600]# if it exists in the second group[/color]
        [COLOR=#0000FF]if[/color] ([COLOR=#FF0000]exists[/color] $first{$key} && [COLOR=#FF0000]exists[/color] $second{$key}) {
                [COLOR=#006600]# for every combination of x and y[/color]
                [COLOR=#0000FF]foreach[/color] [COLOR=#0000FF]my[/color] $[COLOR=#FF8000]x[/color] ([COLOR=#FF0000]keys[/color] %{$first{$key}}) {
                        [COLOR=#0000FF]foreach[/color] [COLOR=#0000FF]my[/color] $[COLOR=#FF8000]y[/color] ([COLOR=#FF0000]keys[/color] %{$second{$key}}) {
                                [COLOR=#006600]# append the data value to the list[/color]
                                $result{$[COLOR=#FF8000]x[/color]}{$[COLOR=#FF8000]y[/color]}.=[COLOR=#808080]"$first{$key}{$x}$second{$key}{$y}"[/color];
                        }
                }
        }
}

[COLOR=#006600]# print out all x and y combinations, the data, and the sum[/color]
[COLOR=#0000FF]foreach[/color] [COLOR=#0000FF]my[/color] $[COLOR=#FF8000]x[/color] ([COLOR=#FF0000]sort[/color] [COLOR=#FF0000]keys[/color] %result) {
        [COLOR=#0000FF]foreach[/color] [COLOR=#0000FF]my[/color] $[COLOR=#FF8000]y[/color] ([COLOR=#FF0000]sort[/color] [COLOR=#FF0000]keys[/color] %{$result{$[COLOR=#FF8000]x[/color]}}) {
                [COLOR=#006600]# strip off trailing +[/color]
                $result{$[COLOR=#FF8000]x[/color]}{$[COLOR=#FF8000]y[/color]} =~ [COLOR=#FF8000]s[/color]/\+$//;
                [COLOR=#006600]# separate result list into numbers and add them up[/color]
                [COLOR=#0000FF]my[/color] $tot=[COLOR=#FF0000]0[/color];
                [COLOR=#0000FF]foreach[/color] ([COLOR=#FF0000]split[/color](/\+/,$result{$[COLOR=#FF8000]x[/color]}{$[COLOR=#FF8000]y[/color]})) { $tot+=$_ };
                [COLOR=#FF0000]print[/color] [COLOR=#808080]"$x $y $result{$x}{$y} $tot\n"[/color];
        }
        [COLOR=#FF0000]print[/color] [COLOR=#808080]"\n"[/color];
}

The output I get for your input data is:

Code:
1 1 2+24+103 129
1 2 2+54 56
1 3 9+9 18
1 4 2+39 41

3 1 4+24+103 131
3 2 4+54 58
3 4 4+39 43

4 1 14+24+103 141
4 2 14+54 68
4 3 8+9 17
4 4 14+39 53

Naturally you may need to modify it to adjust the output format to your requirements.

Annihilannic
[small]tgmlify - code syntax highlighting for your tek-tips posts[/small]
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top