Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations IamaSherpa on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

matching tables

Status
Not open for further replies.

krava

Programmer
Jun 4, 2007
48
YU
hello

I have this bash script to match 2 tables using one column.

function awkcrossmatch() {
if [ $# = 3 ]
then
awk -v col="$1" '{v=FNR==1?"\n":$col} NR==FNR {a[v]=$col; b[v]=$0; next} v in a {print b[v],$0}' $2 $3
elif [ $# = 4 ]
then
awk -v col="$1" '{v=FNR==1?"\n":$col} NR==FNR {a[v]=$col; b[v]=$0; next} v in a {print b[v],$0}' $2 $3 > $4
fi

}


I would like now to match 2 tables using 3 columns. For example to match these tables

table1:
col1, col2, col3, col4 ... coln
12,233,44445 ... 2334
12,233,44445 ... 2334
12,233,44445 ... 2334

table2:
col1, col2, col3, col4 ... coln
12,233,44445 ... 2334
12,233,44445 ... 2334
12,233,44445 ... 2334


using col2, col3 and col5. Could someone help me on this?


thanks
p
 
I guess you just need to index the a[] array using a concatenation of the three index columns (i.e. 'v'), ideally using a separator that does not appear in the input data.

Annihilannic.
 
hmmm ... but I am trying to join 2 tables using 2 columns of tables. Not sure I can figure out what you suggest here :(
 
Something like this:

[tt]awk -F, '{v=$2 FS $3 FS $5} NR==FNR { a[v]=$0; next} v in a {print a[v] FS $0}' table1 table2[/tt]

Using this test data:

[tt]$ cat table1
col1, col2, col3, col4 , col5
12,233,44445,1,2334
12,233,44445,2,2335
12,233,44446,3,2334
12,233,44447,9,2334
12,233,44448,10,2334
$ cat table2
col1, col2, col3, col4 , col5
12,233,44447,9,2334
12,233,44449,11,2334
12,233,44445,4,2334
12,233,44445,5,2335
12,233,44446,6,2334[/tt]

Gives this result:

[tt]col1, col2, col3, col4 , col5,col1, col2, col3, col4 , col5
12,233,44447,9,2334,12,233,44447,9,2334
12,233,44445,1,2334,12,233,44445,4,2334
12,233,44445,2,2335,12,233,44445,5,2335
12,233,44446,3,2334,12,233,44446,6,2334[/tt]

Annihilannic.
 
Annihilannic,
I'd use 'SUBSEP' instead of the 'FS' for array indexing.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Thanks, I had a feeling there was something more appropriate, but didn't bother to dig...

Annihilannic.
 
it works!!! thanks.

how to keep headers if column names are not the same. on this example:


column1, column2, column3, column4, column5
12,233,44445,1,2334
12,233,44445,2,2335
12,233,44446,3,2334
12,233,44447,9,2334
12,233,44448,10,2334

col1, col2, col3, col4 , col5
12,233,44447,9,2334
12,233,44449,11,2334
12,233,44445,4,2334
12,233,44445,5,2335
12,233,44446,6,2334

to have :

column1, column2, column3, column4, column5,col1, col2, col3, col4 , col5
12,233,44447,9,2334,12,233,44447,9,2334
12,233,44445,1,2334,12,233,44445,4,2334
12,233,44445,2,2335,12,233,44445,5,2335
12,233,44446,3,2334,12,233,44446,6,2334

 
[tt]awk -F, 'NR==1 { h=$0 ; next } {v=$2 SUBSEP $3 SUBSEP $5} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' table1 table2[/tt]

Annihilannic.
 
I am trying to convolve it with bash and to generalize a bit like

###3 awkcrossmatch - cross-match two DAT tables; usage: awkcrossmatch 3 4 5 if1 if2 of
function awkcrossmatch() {
if [ $# = 4 ]
then
awk -v col1="$1", 'NR==1 { h=$0 ; next } {v=col1} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $2 $3 > $4
elif [ $# = 5 ]
then
awk -v col1="$1" -v col2="$2", 'NR==1 { h=$0 ; next } {v=col1 SUBSEP col2} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $3 $4 > $5
elif [ $# = 6 ]
then
awk -v col1="$1" -v col2="$2" -v col3="$3", 'NR==1 { h=$0 ; next } {v=col1 SUBSEP col2 SUBSEP col3} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $4 $5 > $6
fi

}
 


sorry ... did not finish the sentence. So, trying use this awk command in bash and to generalize in the sense to be able to join table using 1, 2 or 3 columns. so I wrote like:

###3 awkcrossmatch - cross-match two DAT tables; usage: awkcrossmatch 3 4 5 if1 if2 of
function awkcrossmatch() {
if [ $# = 4 ]
then
awk -v col1="$1", 'NR==1 { h=$0 ; next } {v=col1} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $2 $3 > $4
elif [ $# = 5 ]
then
awk -v col1="$1" -v col2="$2", 'NR==1 { h=$0 ; next } {v=col1 SUBSEP col2} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $3 $4 > $5
elif [ $# = 6 ]
then
awk -v col1="$1" -v col2="$2" -v col3="$3", 'NR==1 { h=$0 ; next } {v=col1 SUBSEP col2 SUBSEP col3} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $4 $5 > $6
fi

}

This does not work don't know why. what do you think?
 
I catched one mistake ... did not write #col1, $col2 and $col3 inside awk command. but steel does notwork
 
I'm not sure why you have commas after -v col1="$1", and so-on, that might not be helping.

Also you need to refer to $col1, $col2, etc instead of just col1, col2, because you want the contents of that column, not the column number itself.

Annihilannic.
 
hmmmm ... something else is wrong here. I use this peace of code (no commas, col1, col2 and so on changed to $col1 etc.

###3 awkcrossmatch - cross-match two tables; usage: awkcrossmatch 3 4 5 if1 if2 of
function awkcrossmatch() {
if [ $# = 4 ]
then
awk -v col1="$1" 'NR==1 { h=$0 ; next } {v=$col1} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $2 $3 > $4
elif [ $# = 5 ]
then
awk -v col1="$1" -v col2="$2" 'NR==1 { h=$0 ; next } {v=$col1 SUBSEP $col2} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $3 $4 > $5
elif [ $# = 6 ]
then
awk -v col1="$1" -v col2="$2" -v col3="$3" 'NR==1 { h=$0 ; next } {v=$col1 SUBSEP $col2 SUBSEP $col3} NR==FNR { a[v]=$0; next} FNR==1 { print h FS $0 ; next } v in a {print a[v] FS $0}' $4 $5 > $6
fi

}
table1:

catID,plateIDgar,mjdgar,fiberIDgar
12,266,51630,17
29,266,51630,41
35,266,51630,51
44,266,51630,66
78,266,51630,107
............

table2:

specobjid,plate,mjd,fiber
75094094031880192,266,51630,254
75094093637615616,266,51630,160
75375476025065472,267,51608,613
75375476033454080,267,51608,615
75375475966345216,267,51608,599
...............

with command

awkcrossmatch 3 4 cutted1.csv vivSAV1.csv joined1.csv I get

catID,plateIDgar,mjdgar,fiberIDgar specobjid,plate,mjd,fiber
567460,978,52431,591 75094094031880192,266,51630,254
567460,978,52431,591 75094093637615616,266,51630,160
567460,978,52431,591 75375476025065472,267,51608,613
567460,978,52431,591 75375476033454080,267,51608,615
567460,978,52431,591 75375475966345216,267,51608,599
.........


why tables are not separated by commas in output file? "567460,978,52431,591" is the last row of the first table and for some strange reason the code joines only this row with other table. Any idea what is happening?
 
You didn't specify awk -F, to define the field separator to be a comma. By default awk uses white spaces as a separator.

Annihilannic.
 
yes....it works now. thanks for efort puting on this Annihilannic
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top