
Help with data lookup table

Status
Not open for further replies.

learningawk (Technical User, US)
Oct 15, 2002
Hi,
I'm new to awk and to programming. Is there an easy way to compare two files, where the first is a lookup table of values and the second contains similar values, so that when a match is found the script grabs two values from the lookup file and stores them in the second file for further processing?

My first data lookup file has the following format:

XXXX YYYY AA BB 1 1
XXXX YYYY AA BB 1 2
XXXX YYYY AA BB 1 3
XXXX YYYY AA BB 1 4
XXXX YYYY AA BB 1 5
XXXX YYYY DD CC 5 1
XXXX YYYY DD CC 5 2
XXXX YYYY DD CC 5 3
XXXX YYYY DD CC 5 4
XXXX YYYY DD CC 5 5
XXXX YYYY EE FF 4 1
XXXX YYYY EE FF 4 2
XXXX YYYY EE FF 4 3
XXXX YYYY EE FF 4 4
XXXX YYYY EE FF 4 5
XXXX YYYY EE FF 4 6

It consists of 3 groups of data for a specific entity.

My 2nd file looks like:

zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),DD,CC,5,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),EE,FF,4,VAR1,VAR2,VAR3,VAR4

I would like to compare file 2 to file 1, and once a match is found between the file 2 fields AA,BB,1 and the same values in file 1, retrieve XXXX and YYYY from file 1 and store them in file 2. These values will then be used for further processing.

I would also like to check, during the compare process, whether the lookup table has more than 5 points in a group (as in the last records of file 1) and, if so, return some sort of alert that this is an irregular match.

Thank you for helping on my problem.
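The "more than 5 points per group" alert is never answered later in the thread, so here is a minimal sketch that checks the lookup table in its own pass rather than during the comparison. It assumes that fields 3-5 (e.g. `EE FF 4`) identify a group and that each row is one point; the function name `check_groups` and the optional limit argument are illustrative, not part of any script in the thread.

```shell
#!/bin/sh
# check_groups FILE [MAX]
# Hypothetical helper: count the rows in each group of a space-separated
# lookup table and report groups with more than MAX rows (default 5).
# Assumes fields 3-5 identify the group; each row counts as one point.
check_groups() {
    awk -v max="${2:-5}" '
    {
        key = $3 " " $4 " " $5          # e.g. "EE FF 4"
        if (!(key in count))
            order[++n] = key            # keep groups in first-seen order
        count[key]++
    }
    END {
        for (i = 1; i <= n; i++) {
            k = order[i]
            if (count[k] > max + 0)     # force a numeric comparison
                printf("ALERT: group %s has %d points (expected at most %d)\n", k, count[k], max)
        }
    }' "$1"
}
```

Run it as `check_groups file1` (or `check_groups file1 6` for a different limit). With the lookup table from the question it should print a single alert, for the `EE FF 4` group, which has six rows.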
 
The following threads should get you pointing in the right direction:
thread271-170922
thread271-163206

HTH ;-) Dickie Bird

Honi soit qui mal y pense (Shamed be he who thinks evil of it)
 
If you post what your expected output would look like, given representative data from both files, I might be able to help, but the examples as they stand would make me work too hard.
;)
 
If a record in data file 1 (loaded into an array) looks like:
XXXX YYYY AA BB 1 1

and a record in data file 2 (compared against the array data) looks like:
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,1,VAR1,VAR2,VAR3,VAR4 (the fields containing "AA,BB,1,1" are always $10,$11,$12,$13)

then the two records match if AA, BB, 1 AND 1 all match exactly. If a match is found, the script should retrieve the values XXXX and YYYY and store them in the file 2 record.

The end result would look like:
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,1,VAR1,VAR2,VAR3,VAR4,XXXX,YYYY


Thank You for checking this out.
 
Something like this should get you started (no 'alerts' yet):

nawk -f a.awk file2

#--------- file1
XXXX YYYY AA BB 1 1
XXXX YYYY AA BB 1 2
XXXX YYYY AA BB 1 3
XXXX YYYY AA BB 1 4
XXXX YYYY AA BB 1 5
XXXX YYYY DD CC 5 1
XXXX YYYY DD CC 5 2
XXXX YYYY DD CC 5 3
XXXX YYYY DD CC 5 4
XXXX YYYY DD CC 5 5
XXXX YYYY EE FF 4 1
XXXX YYYY EE FF 4 2
XXXX YYYY EE FF 4 3
XXXX YYYY EE FF 4 4
XXXX YYYY EE FF 4 5
XXXX YYYY EE FF 4 6

#--------- file2
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),AA,BB,1,1,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),DD,CC,5,2,VAR1,VAR2,VAR3,VAR4
zzzzz,zzzzz,zzzzz,(NUMEROUS FIELDS),EE,FF,4,6,VAR1,VAR2,VAR3,VAR4

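#---------- a.awk
BEGIN {
  FS1 = " "
  FS2 = ","

  file1 = "file1"

  # Load the space-separated lookup table, keyed on fields 3-6
  FS = FS1
  while ((getline < file1) > 0) {
    idx = $3 $4 $5 $6
    file1ARR[idx] = $1 FS2 $2
    # printf("DEBUG: file1ARR[%s]->[%s]\n", idx, file1ARR[idx]);
  }
  close(file1)

  # file2 is comma-separated; set FS before its first record is read
  FS = FS2
}

{
  # The four key fields come just before the last four (VAR1..VAR4)
  idx = $(NF-7) $(NF-6) $(NF-5) $(NF-4)
  # printf(" DEBUG: file1ARR[%s]->[%s]\n", idx, file1ARR[idx]);

  # Append ",XXXX,YYYY" to matching records (unmatched records are not printed)
  if (idx in file1ARR)
    print $0 FS2 file1ARR[idx]
}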
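One thing to watch in the key: the script above builds it by concatenating fields 3-6 with nothing in between, so two different rows can produce the same key once group or point numbers reach two digits. A small demonstration, using two hypothetical rows (`AA BB 1 11` and `AA BB 11 1`); awk's multi-part subscript `arr[$3, $4, $5, $6]` joins the parts with SUBSEP and keeps them apart. The function names are illustrative only.

```shell
#!/bin/sh
# Distinct keys when fields 3-6 are glued together with no separator.
plain_keys() {
    awk '{ k[$3 $4 $5 $6] = 1 } END { n = 0; for (x in k) n++; print n }' "$1"
}
# Distinct keys when fields 3-6 form a multi-part subscript
# (awk joins the parts with SUBSEP, "\034" by default).
subsep_keys() {
    awk '{ k[$3, $4, $5, $6] = 1 } END { n = 0; for (x in k) n++; print n }' "$1"
}

# Two hypothetical rows: group 1 point 11, and group 11 point 1.
printf '%s\n' 'X1 Y1 AA BB 1 11' 'X2 Y2 AA BB 11 1' > collide.txt
plain_keys collide.txt     # prints 1: both rows map to "AABB111"
subsep_keys collide.txt    # prints 2
```

With the single-digit sample data in this thread the two forms behave the same; the difference only shows up with longer values.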


vlad
+---------------------------+
|#include<disclaimer.h> |
+---------------------------+
 
This is tentative; let's try to work through the bugs.

while ((getline < fname) > 0) {
array[x++] = $0
}
close(fname)
}
{
FS = ","
for (i=1 ; i <= x ; i++) {
if (i == NR) {
mlist = $10" "$11" "$12" "$13
#print mlist
split(array,tmp," ")
alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
if (mlist == alist) {
$0 = $0 tmp[1]" "tmp[2]
}
}
}

A quick test on unrelated data (scrap.txt) suggests that the compare-and-append idea works:
awk ' {
t = "ORA123: error"
for (i=1 ; i <= 3 ; i++) {
if (NR == i) {
mlist = $1" "$2
if (mlist == t) {
$0 = $0","t
}
}
} ; print $0
}' scrap.txt

Output:
Sat Sep 28
ORA123: error in xxjob,ORA123: error
ORA124: 12345 in xxjob
Sun Sep 29
ORA123: error in xxjob
ORA124: 12346 in xxjob
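For reference, here is one way the first snippet (the loop over fname) could be made to work. As posted, it has no BEGIN line, fname is never set, FS is assigned in the main block (too late for the first record), split() is given the whole array instead of one element, the loop runs 1..x while the lines were stored at 0..x-1, `i == NR` compares each file2 record only with the same-numbered file1 line, and nothing is printed. The sketch below changes those points and assumes, as described earlier in the thread, that the key is in fields $10-$13 of file2; the function name `lookup_scan` is illustrative. It scans the whole table for every record, which is fine for small files.

```shell
#!/bin/sh
# lookup_scan FILE1 FILE2
# Array-of-lines lookup: FILE1 is read into rows[] in BEGIN, then every
# stored line is compared with fields 10-13 of each comma-separated
# FILE2 record.
lookup_scan() {
    awk -v table="$1" '
    BEGIN {
        while ((getline line < table) > 0)
            rows[++n] = line            # rows[1] .. rows[n]
        close(table)
        FS = ","                        # takes effect for the first FILE2 record
    }
    {
        key = $10 " " $11 " " $12 " " $13
        for (i = 1; i <= n; i++) {       # check every table row, not just row NR
            split(rows[i], t, " ")      # split one stored line, not the whole array
            if (key == t[3] " " t[4] " " t[5] " " t[6]) {
                $0 = $0 "," t[1] "," t[2]
                break                   # first match wins
            }
        }
        print                           # unmatched records pass through unchanged
    }' "$2"
}
```

Usage: `lookup_scan file1 file2 > file2.new`. Records with no match in the table are printed as they were.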

 
Thank you for all the assistance on this problem.
Unfortunately I can't get either of the above scripts to work. I am running gawk under Cygwin on a PC.

With vlad's script I get parse errors.
I am running it as gawk -f vlad.gawk file2

With marsd's script I also get parse errors.
I am running it as gawk -f marsd.gawk file2

Sorry, but I don't know awk well enough to tell what the parse errors are telling me.

I have updated the data files to make the results easier to check.
file1:
1111 2222 AA BB 1 1
2222 3333 AA BB 1 2
4444 5555 AA BB 1 3
6666 7777 AA BB 1 4
8888 9999 AA BB 1 5
1010 1111 DD CC 1 1
1212 1313 DD CC 5 2
1414 1515 DD CC 5 3
1616 1717 DD CC 5 4
1818 1919 DD CC 5 5
2020 2121 EE FF 4 1
2222 2323 EE FF 4 2
2424 2525 EE FF 4 3
2626 2727 EE FF 4 4
2828 2929 EE FF 4 5
3030 3131 EE FF 4 6

file2:
1,2,3,4,5,6,7,8,9,AA,BB,1,1
1,2,3,4,5,6,7,8,9,DD,CC,5,2
1,2,3,4,5,6,7,8,9,EE,FF,4,6

The output I'm hoping for is:
1,2,3,4,5,6,7,8,9,AA,BB,1,1,1111,2222
1,2,3,4,5,6,7,8,9,DD,CC,5,2,1212,1313
1,2,3,4,5,6,7,8,9,EE,FF,4,6,3030,3131

thanks
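With these updated files (13 fields, no VAR columns), the `$(NF-7)` .. `$(NF-4)` positions in vlad's script point at fields 6-9 rather than at the key, so nothing would match. Since the key is stated to be in fixed fields $10-$13, a hash lookup using the common `NR == FNR` two-file idiom avoids both getline and the nested loop. This is a minimal sketch; the function name `lookup_join` and the `NOMATCH` marker for unmatched records are illustrative choices. The `FS=,` operand on the command line is an assignment that awk applies after reading FILE1, so FILE1 is split on blanks and FILE2 on commas.

```shell
#!/bin/sh
# lookup_join FILE1 FILE2
# Load FILE1 into an array keyed on fields 3-6, then look up
# fields 10-13 of each comma-separated FILE2 record.
lookup_join() {
    awk '
    NR == FNR {                         # true only while reading FILE1
        xy[$3, $4, $5, $6] = $1 "," $2  # subscripts are joined with SUBSEP
        next
    }
    {                                   # FILE2 records, split on commas
        k = $10 SUBSEP $11 SUBSEP $12 SUBSEP $13
        if (k in xy)
            print $0 "," xy[k]
        else
            print $0 ",NOMATCH"         # hypothetical marker for records with no match
    }' "$1" FS=, "$2"
}
```

Run it as `lookup_join file1 file2 > file2.new`; with the sample files above it should produce the three hoped-for lines. One known caveat of the `NR == FNR` idiom: if FILE1 is empty, every FILE2 record is treated as table data and nothing is printed.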




 
I received no errors on my home machine when running the program. It may be that I don't understand the data format well enough. If you could post the errors you receive, that would help.
 
Thanks. I thought the error might be caused by the missing BEGIN { statement, but adding it didn't help either.

This is the script I'm running:


while ((getline < file1) > 0) {
array[x++] = $0
}
close(file1)
}
{
FS = ","
for (i=1 ; i <= x ; i++) {
if (i == NR) {
mlist = $10" "$11" "$12" "$13
#print mlist
split(array,tmp," ")
alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
if (mlist == alist) {
$0 = $0 tmp[1]" "tmp[2]
}
}
}

gawk -f marsd.awk file2

Here are the errors:
gawk: marsd.awk:1: while ((getline < file1) > o) {
gawk: marsd.awk:1: ^ parse error
gawk: marsd.awk:5: }
gawk: marsd.awk:5: ^ parse error
 
BEGIN {
while ((getline < file1) > 0) {
array[x++] = $0
}
close(file1)
}
}

{
FS = ","
for (i=1 ; i <= x ; i++) {
if (i == NR) {
mlist = $10" "$11" "$12" "$13
#print mlist
split(array,tmp," ")
alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
if (mlist == alist) {
$0 = $0 tmp[1]" "tmp[2]
}
}
}
}

Try that; you may also have to use a full path for file1. I am not familiar with Cygwin, but I have used gawk on a Win98 machine for many similar tasks and don't remember having any trouble.
 
Oops...
One too many closing braces in the BEGIN section.
 
hmmmmmmmm.........
I get NO errors running it under Cygwin's gawk.
What are you seeing and how are you running it?
vlad
 
At work I use nawk, and now I am getting the following error:
nawk -f marsd.awk file2
nawk: null file name in print or getline
source line number 2

I added the directory path to the file name, but it still doesn't run.

Here is the marsd.awk I am using:


BEGIN {
while ((getline < file1) > 0) {
array[x++] = $0
}
close(file1)
}
{
FS = ","
for (i=1 ; i <= x ; i++) {
if (i == NR) {
mlist = $10" "$11" "$12" "$13
#print mlist
split(array,tmp," ")
alist = tmp[3]" "tmp[4]" "tmp[5]" "tmp[6]
if (mlist == alist) {
$0 = $0 tmp[1]" "tmp[2]
}
}
}
}
 
Try it with vlad's syntax. I use gawk, so if this is a syntax problem, this is the second time in two days I have run into a problem with nawk that doesn't exist with gawk...

Use the full pathname:
while (getline < "/pathto/file1" > 0) {
Maybe that will work; otherwise try vlad's version, he knows nawk.
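The nawk message "null file name in print or getline" appears when the name given to getline is the empty string, which is what an unassigned variable such as file1 evaluates to. A sketch of the `-v` route, using a hypothetical `count_rows` function that only counts the rows it reads so the file-name handling is easy to see:

```shell
#!/bin/sh
# count_rows FILE
# Reads FILE with getline inside BEGIN, as the scripts in this thread do.
# The name is passed with -v, so it is already set when BEGIN runs;
# an empty name is the case nawk reports as "null file name".
count_rows() {
    awk -v file1="$1" '
    BEGIN {
        if (file1 == "") {
            print "count_rows: no file name given" > "/dev/stderr"
            exit 2
        }
        n = 0
        # Parenthesize getline so "> 0" tests its return value:
        # 1 = line read, 0 = end of file, -1 = file could not be opened.
        while ((rc = (getline line < file1)) > 0)
            n++
        close(file1)
        if (rc < 0) {
            print "count_rows: cannot read " file1 > "/dev/stderr"
            exit 2
        }
        print n
    }'
}
```

The same option should work with the scripts above without editing them, e.g. `nawk -v file1=/full/path/to/file1 -f marsd.awk file2`. Values given with `-v` go through escape processing, so on Cygwin forward-slash paths are safer than backslash paths.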
 
heh - don't forget the quotes, OR use '-v file1=pathname' on the command line ;)

while ((getline < "file1") > 0) {
vlad
 
This is from your post way up top, vlad.
I thought you knew something about nawk and
getline that I didn't...
<snip>
BEGIN {
FS1=" "
FS2=","

file1="file1"

FS=FS1
while (getline< file1 > 0) {
idx=$3$4$5$6
</snip>
;)
 
Correct, but I had:

file1="file1"
while (getline< file1 > 0) {
etc....

whereas your version never assigned file1 ;)
vlad
 
Ah, that's it.
Knew I missed something.

Have a nice weekend.
 
vlad,

Would you be so kind as to post here on the forum the same script that worked for you under gawk, along with the exact command line, so I can check it against my gawk?

Thanks
 