Help with data lookup table 1

learningawk · Oct 15, 2002

Hi,
I'm new to awk and programming. Is there an easy way to compare 2 files, where file 1 is a lookup table of values and file 2 is checked for matching values, and then grab 2 values from the lookup file and store them in file 2 for further processing?

My first data look up file is in the following format:

XXXX YYYY AA BB 1 1
XXXX YYYY AA BB 1 2
XXXX YYYY AA BB 1 3
XXXX YYYY AA BB 1 4
XXXX YYYY AA BB 1 5
XXXX YYYY DD CC 5 1
XXXX YYYY DD CC 5 2
XXXX YYYY DD CC 5 3
XXXX YYYY DD CC 5 4
XXXX YYYY DD CC 5 5
XXXX YYYY EE FF 4 1
XXXX YYYY EE FF 4 2
XXXX YYYY EE FF 4 3
XXXX YYYY EE FF 4 4
XXXX YYYY EE FF 4 5
XXXX YYYY EE FF 4 6

It consists of 3 groups of data for a specific entity.

My 2nd file looks like:

zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),DD,CC,5,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),EE,FF,4,VAR1,VAR2,VAR3,VAR4

I would like to compare file 2 to file 1 and, once a match is found between the file 2 fields AA,BB,1 and the same values in file 1, retrieve XXXX and YYYY from file 1 and store them in file 2. These values will then be used for further processing.

I would also like the compare process to check whether the lookup table has more than 5 points per group (as in the last records of file 1) and, if so, return some sort of alert that this is an irregular match.

Thank you for helping on my problem.
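
For the alert part of the question, one possible reading is that a "group" is the set of file 1 records sharing fields 3 to 5, and that any group with more than 5 records should be flagged. A minimal sketch under that assumption (the temporary directory, the alerts.txt name and the ALERT wording are made up for illustration):

```sh
#!/bin/sh
# Sketch: flag lookup groups that have more than 5 records.
# Assumption: a group is identified by fields 3-5 of file1.
workdir=$(mktemp -d)

cat > "$workdir/file1" <<'EOF'
XXXX YYYY AA BB 1 1
XXXX YYYY AA BB 1 2
XXXX YYYY AA BB 1 3
XXXX YYYY AA BB 1 4
XXXX YYYY AA BB 1 5
XXXX YYYY DD CC 5 1
XXXX YYYY DD CC 5 2
XXXX YYYY DD CC 5 3
XXXX YYYY DD CC 5 4
XXXX YYYY DD CC 5 5
XXXX YYYY EE FF 4 1
XXXX YYYY EE FF 4 2
XXXX YYYY EE FF 4 3
XXXX YYYY EE FF 4 4
XXXX YYYY EE FF 4 5
XXXX YYYY EE FF 4 6
EOF

awk '
{ count[$3 " " $4 " " $5]++ }     # one counter per group
END {
    for (g in count)
        if (count[g] > 5)
            print "ALERT: group " g " has " count[g] " points"
}
' "$workdir/file1" > "$workdir/alerts.txt"

cat "$workdir/alerts.txt"
```

With the sample file 1 above, only the EE FF 4 group is reported, since it is the only one with 6 records.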

dickiebird · Oct 17, 2002

The following threads should get you pointing in the right direction:
thread271-170922
thread271-163206

HTH ;-) Dickie Bird

Honi soit qui mal y pense

marsd · Oct 17, 2002

If you post what your expected output would look like, given representative data from
both files, I might be able to help, but the
examples as they stand would make me work too hard.

learningawk · Oct 17, 2002

If a record in data file 1, which is loaded into an array, looks like:
XXXX YYYY AA BB 1 1

and a record in data file 2, which is compared against the array data, looks like:
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,1,VAR1,VAR2,VAR3,VAR4 (the fields that contain "AA,BB,1,1" are the static fields $10,$11,$12,$13)

then the comparison of the two data sets is a match if
AA, BB, 1 and 1 all match. If a match is identified, the script would retrieve the values XXXX and YYYY and store them in the file 2 record.

End result would look like:
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,1,VAR1,VAR2,VAR3,VAR4,XXXX,YYYY

Thank You for checking this out.

vgersh99 · Oct 17, 2002

Something like this should get you started (no 'alerts' yet):

nawk -f a.awk file2

#--------- file1
XXXX YYYY AA BB 1 1
XXXX YYYY AA BB 1 2
XXXX YYYY AA BB 1 3
XXXX YYYY AA BB 1 4
XXXX YYYY AA BB 1 5
XXXX YYYY DD CC 5 1
XXXX YYYY DD CC 5 2
XXXX YYYY DD CC 5 3
XXXX YYYY DD CC 5 4
XXXX YYYY DD CC 5 5
XXXX YYYY EE FF 4 1
XXXX YYYY EE FF 4 2
XXXX YYYY EE FF 4 3
XXXX YYYY EE FF 4 4
XXXX YYYY EE FF 4 5
XXXX YYYY EE FF 4 6

#--------- file2
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,1,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),DD,CC,5,2,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),EE,FF,4,6,VAR1,VAR2,VAR3,VAR4

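#---------- a.awk
BEGIN {
    FS1 = " "          # field separator of the lookup file (file1)
    FS2 = ","          # field separator of the data file (file2)

    file1 = "file1"    # name of the lookup file

    # load the lookup table: key = fields 3-6, value = "XXXX,YYYY"
    FS = FS1
    while (getline < file1 > 0) {
        idx = $3 $4 $5 $6
        file1ARR[idx] = $1 FS2 $2
        # printf("DEBUG: file1ARR[%s]->[%s]\n", idx, file1ARR[idx])
    }
    FS = FS2           # file2 is comma separated
}

{
    # the four key fields sit just before the last four fields (VAR1..VAR4)
    idx = $(NF-7) $(NF-6) $(NF-5) $(NF-4)
    # printf(" DEBUG: file1ARR[%s]->[%s]\n", idx, file1ARR[idx])

    if (idx in file1ARR)
        print $0 FS2 file1ARR[idx]
}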

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+

marsd · Oct 17, 2002

This is tentative, let's try to work through the bugs.

    while ((getline < fname) > 0) {
        array[x++] = $0
    }
    close(fname)
}
{
    FS = ","
    for (i=1 ; i <= x ; i++) {
        if (i == NR) {
            mlist = $10" "$11" "$12" "$13
            #print mlist
            split(array[i],tmp," ")
            alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
            if (mlist == alist) {
                $0 = $0 tmp[1]" "tmp[2]
            }
        }
    }

A tentative test reveals that this approach works.
awk ' {
    t = "ORA123: error"
    for (i=1 ; i <= 3 ; i++) {
        if (NR == i) {
            mlist = $1" "$2
            if (mlist == t) {
                $0 = $0","t
            }
        }
    } ; print $0
}' scrap.txt

Sat Sep 28
ORA123: error in xxjob,ORA123: error
ORA124: 12345 in xxjob
Sun Sep 29
ORA123: error in xxjob
ORA124: 12346 in xxjob

marsd · Oct 17, 2002

jeez vlad, beat me to the punch..

learningawk · Oct 17, 2002

Thank you for all the assistance on this problem.
Unfortunately I can't get either of the above scripts to work. I am using pc based cygwin to run GAWK.

On Vlad's script I get parse errors.
I am executing it as gawk -f vlad.gawk file2

and on Marsd's script I also get parse errors.
Running it as gawk -f marsd.gawk file2

Sorry, but I am not familiar enough with awk scripting to tell what the parse error means.

I have updated the data files for better checking.
file1:
1111 2222 AA BB 1 1
2222 3333 AA BB 1 2
4444 5555 AA BB 1 3
6666 7777 AA BB 1 4
8888 9999 AA BB 1 5
1010 1111 DD CC 1 1
1212 1313 DD CC 5 2
1414 1515 DD CC 5 3
1616 1717 DD CC 5 4
1818 1919 DD CC 5 5
2020 2121 EE FF 4 1
2222 2323 EE FF 4 2
2424 2525 EE FF 4 3
2626 2727 EE FF 4 4
2828 2929 EE FF 4 5
3030 3131 EE FF 4 6

file2:
1,2,3,4,5,6,7,8,9,AA,BB,1,1
1,2,3,4,5,6,7,8,9,DD,CC,5,2
1,2,3,4,5,6,7,8,9,EE,FF,4,6

The hoped-for output would be:
1,2,3,4,5,6,7,8,9,AA,BB,1,1,1111,2222
1,2,3,4,5,6,7,8,9,DD,CC,5,2,1212,1313
1,2,3,4,5,6,7,8,9,EE,FF,4,6,3030,3131

thanks
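
For reference, here is a minimal sketch of one way to get that output with the sample data above. It is not either of the scripts posted in this thread: it swaps in the common NR == FNR two-file idiom, and it assumes the key is fields 3-6 of file1 and fields 10-13 of file2. The temporary directory, the file2.out name and the "no match" message are made up for illustration.

```sh
#!/bin/sh
# Sketch: append the file1 values (fields 1 and 2) to matching file2 records.
# Assumptions: file1 is space separated, file2 is comma separated,
# and the key is file1 fields 3-6 / file2 fields 10-13.
workdir=$(mktemp -d)

cat > "$workdir/file1" <<'EOF'
1111 2222 AA BB 1 1
2222 3333 AA BB 1 2
4444 5555 AA BB 1 3
6666 7777 AA BB 1 4
8888 9999 AA BB 1 5
1010 1111 DD CC 1 1
1212 1313 DD CC 5 2
1414 1515 DD CC 5 3
1616 1717 DD CC 5 4
1818 1919 DD CC 5 5
2020 2121 EE FF 4 1
2222 2323 EE FF 4 2
2424 2525 EE FF 4 3
2626 2727 EE FF 4 4
2828 2929 EE FF 4 5
3030 3131 EE FF 4 6
EOF

cat > "$workdir/file2" <<'EOF'
1,2,3,4,5,6,7,8,9,AA,BB,1,1
1,2,3,4,5,6,7,8,9,DD,CC,5,2
1,2,3,4,5,6,7,8,9,EE,FF,4,6
EOF

awk '
NR == FNR {                          # first file on the command line: file1
    lut[$3, $4, $5, $6] = $1 "," $2  # comma subscripts are joined with SUBSEP
    next
}
{                                    # second file: file2, read with FS=","
    if (($10, $11, $12, $13) in lut)
        print $0 "," lut[$10, $11, $12, $13]
    else
        print "no match: " $0 | "cat 1>&2"
}
' "$workdir/file1" FS=, "$workdir/file2" > "$workdir/file2.out"

cat "$workdir/file2.out"
```

The FS=, operand between the two file names changes the field separator only once file1 has been read. Joining the key fields with SUBSEP rather than plain concatenation avoids two different keys collapsing into the same string. Note that NR == FNR only identifies file1 reliably when file1 is not empty.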

marsd · Oct 17, 2002

I received no errors on my home machine when
running the program. It may be that I don't
understand the data format well enough.
If you could post the errors you receive that would be
a help.

learningawk · Oct 17, 2002

Thanks, I thought maybe the error was because of no BEGIN { statement, but adding it didn't help either.

this is the script I'm running:

    while ((getline < file1) > 0) {
        array[x++] = $0
    }
    close(file1)
}
{
    FS = ","
    for (i=1 ; i <= x ; i++) {
        if (i == NR) {
            mlist = $10" "$11" "$12" "$13
            #print mlist
            split(array[i],tmp," ")
            alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
            if (mlist == alist) {
                $0 = $0 tmp[1]" "tmp[2]
            }
        }
    }

gawk -f marsd.awk file2

Here's the error:
gawk: marsd.awk:1: while ((getline < file1) > o) {
gawk: marsd.awk:1: ^ parse error
gawk: marsd.awk:5: }
gawk: marsd.awk:5: ^ parse error

marsd · Oct 17, 2002

BEGIN {
    while ((getline < file1) > 0) {
        array[x++] = $0
    }
    close(file1)
}
}

{
    FS = ","
    for (i=1 ; i <= x ; i++) {
        if (i == NR) {
            mlist = $10" "$11" "$12" "$13
            #print mlist
            split(array[i],tmp," ")
            alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
            if (mlist == alist) {
                $0 = $0 tmp[1]" "tmp[2]
            }
        }
    }
}

Try that; you may also have to use the full
path for file1. I am not familiar with cygwin, but I have used gawk on a win98 machine for many similar things and had no trouble that I remember.

marsd · Oct 17, 2002

oops...
One too many brackets in the begin section.

vgersh99 · Oct 18, 2002

hmmmmmmmm.........
I get NO errors running it under Cygwin's gawk.
What are you seeing and how are you running it?
vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+

learningawk · Oct 18, 2002

I use NAWK on the job and now I am getting the following error:
nawk -f marsd.awk file2
nawk: null file name in print or getline
source line number 2

I added the directory path for file name but it still doesn't run.

Here is the marsd.awk I am using:

BEGIN {
    while ((getline < file1) > 0) {
        array[x++] = $0
    }
    close(file1)
}
{
    FS = ","
    for (i=1 ; i <= x ; i++) {
        if (i == NR) {
            mlist = $10" "$11" "$12" "$13
            #print mlist
            split(array[i],tmp," ")
            alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
            if (mlist == alist) {
                $0 = $0 tmp[1]" "tmp[2]
            }
        }
    }
}

marsd · Oct 18, 2002

Try it with vlad's syntax. I use gawk, so this is the second
time in two days I have run into a problem with nawk that doesn't exist with gawk, if this is a syntax issue...

Use the full pathname:
while (getline < "/pathto/file1" > 0) {
Maybe that will work; otherwise try vlad's script, he knows
nawk.

vgersh99 · Oct 18, 2002

heh - don't forget the quotes OR use the '-v file1=pathname' on the command line

while ((getline < "file1") > 0) {

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
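
To illustrate the '-v file1=pathname' suggestion: inside an awk program an unquoted file1 is a variable, and if nothing assigns it, getline sees an empty file name, which is what the nawk "null file name" message reports. A minimal sketch, where lookup.txt, its two records and the messages are hypothetical stand-ins rather than anything from the posts above:

```sh
#!/bin/sh
# Sketch: pass the lookup file name to awk with -v instead of hard-coding it.
# lookup.txt is a hypothetical stand-in for file1.
workdir=$(mktemp -d)
printf '1111 2222 AA BB 1 1\n2222 3333 AA BB 1 2\n' > "$workdir/lookup.txt"

count=$(awk -v file1="$workdir/lookup.txt" '
BEGIN {
    if (file1 == "") {                   # unset variable means empty file name
        print "file1 is not set" | "cat 1>&2"
        exit 1
    }
    while ((getline line < file1) > 0)   # > 0 stops on end of file (0) and on error (-1)
        n++
    close(file1)
    print n + 0
}')

echo "records read: $count"
```

Testing the getline result against 0 matters here: getline returns -1 when the file cannot be opened, so a bare while (getline line < file1) would loop forever on a bad path.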

marsd · Oct 18, 2002

This is from your post way up top, vlad.
I thought you knew something about nawk and
getline that I didn't...
<snip>
BEGIN {
FS1=" "
FS2=","

file1="file1"

FS=FS1
while (getline< file1 > 0) {
idx=$3$4$5$6
</snip>

vgersh99 · Oct 18, 2002

correct, but I had:

file1="file1"
while (getline< file1 > 0) {
etc....

whereas your version didn't

vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+

marsd · Oct 18, 2002

Ah, That's it..
Knew I missed something.

Have a nice weekend.

learningawk · Oct 18, 2002

vlad,

Would you be so kind as to repeat here on the forum the same script that was working for you on GAWK, and the exact command line, so I can check it against my GAWK?

Thanks
