AWK array question

Status
Not open for further replies.

kobewins (Programmer, US), Dec 1, 2005
I have the following two files, which have exactly the same format, with pipe "|" as the field delimiter.

Please help me out: what's wrong with my script?
Thanks.

What I want is:

IF ( ($1=="F" in both files) && ($2 of File_1.dat equals $2 of File_2.dat) )
THEN
    print $5 of File_1.dat and $5 of File_2.dat
FI

In my example, I need this output:

FORM 00093545798 00093545799
FORM 00093545798 00093545799
FORM 00093545800 00093545700

File_1.dat
F|FORM|LABEL 1|000710530|00093545700|
F|FORM|LABEL 1|000710530|00093545798|
F|FORM|LABEL 1|122800059|00093545798|
F|FORMWHI|LABEL 2|545693984|00093545798|
F|FORMWHI|LABEL 2|548682665|00093545798|
F|FORMWHI|LABEL 2|552890553|00093545798|
F|FORMWHI|LABEL 3|578664420|00093545798|
F|FORMCVS|LABEL 3|588640859|00093545798|
F|FORMCVS|LABEL 3|680304420|00093545798|
F|FORM8000|LABEL 4|591480012|50458030503|
F|FORM8000|LABEL 4|591480013|50458030503|

File_2.dat
F|FORM|LABEL 1|000710530|00093545799|
F|FORM|LABEL 1|122800059|00093545799|
F|FORMOLDWHI|LABEL 5|545693984|00093545799|
F|FORMOLDWHI|LABEL 5|548682665|00093545799|
F|FORMOLDWHI|LABEL 5|552890553|00093545799|
F|FORMOLDWHI|LABEL 5|578664420|00093545799|
F|FORMOLDWHI|LABEL 5|588640859|00093545799|
F|FORMOLDWHI|LABEL 5|680304420|00093545799|
P|FORM8000|LABEL 6|591480010|50458030512|
P|FORM8000|LABEL 6|591480010|50458030512|

But I got the following output after using the script below:

FORM 00093545798 00093545799
FORM 00093545798 00093545799
FORM 00093545798 00093545700

Code:
#!/bin/awk -f

### Script Name: form.awk
### Usage:	form.awk File_2.dat

BEGIN {
	FS="|"
	# Read File_1.dat; stop at end of file (0) or on a read error (-1)
	while ((getline < "File_1.dat") > 0) {
		FORMS[1]=$2    # I didn't figure out how.
		NDC11[$2]=$5   ## Store field 5 in array NDC11, using field 2 as the subscript
	}
}

# I don't know how to do it, so I just hard-coded $1 == "F" here.
$1 == "F" && length(NDC11[$2])>0 {
	print $2 , NDC11[$2], $5
}
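The repeated 00093545798 comes from the BEGIN loop: NDC11[$2]=$5 replaces the previous value every time the same field 2 appears, so only the last File_1.dat record for each key survives. A minimal sketch to check this, assuming a POSIX shell and awk; `show_ndc11` is a hypothetical helper, not part of form.awk, and it repeats the same assignment while also counting the records per key:

```shell
# Hypothetical helper: replay the NDC11[$2]=$5 assignment from form.awk
# over a file and report, for one key, how many records had that key and
# which field-5 value is left in the array afterwards.
show_ndc11() {   # usage: show_ndc11 FILE KEY
    awk -F'|' -v key="$2" '
        { NDC11[$2] = $5; seen[$2]++ }   # each assignment overwrites the last
        END {
            print key, "records:", seen[key]
            print key, "kept:", NDC11[key]
        }
    ' "$1"
}
```

For example, `show_ndc11 File_1.dat FORM` should report 3 FORM records and keep 00093545798, the value of the third one, which is the File_1.dat value printed on every line of the output above.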
 
Hi All,

Can anyone help with this? Do I need to provide more information or clarify something?

Thanks
David
 
Quote: "...or to clarify something"
Where is 00093545800 coming from?

Hope This Helps, PH.
 
Thanks, PHV, for your response. Sorry, I modified the test data before posting.

Here are the files:

File_1.dat

F|FORM|LABEL 1|000710530|00093545800|
F|FORM|LABEL 1|000710530|00093545798|
F|FORM|LABEL 1|122800059|00093545798|
F|FORMWHI|LABEL 2|545693984|00093545798|
F|FORMWHI|LABEL 2|548682665|00093545798|
F|FORMWHI|LABEL 2|552890553|00093545798|
F|FORMWHI|LABEL 3|578664420|00093545798|
F|FORMCVS|LABEL 3|588640859|00093545798|
F|FORMCVS|LABEL 3|680304420|00093545798|
F|FORM8000|LABEL 4|591480012|50458030503|
F|FORM8000|LABEL 4|591480013|50458030503|

File_2.dat

F|FORM|LABEL 1|000710530|00093545799|
F|FORM|LABEL 1|122800059|00093545799|
F|FORM|LABEL 1|000710530|00093545700|
F|FORMOLDWHI|LABEL 5|545693984|00093545799|
F|FORMOLDWHI|LABEL 5|548682665|00093545799|
F|FORMOLDWHI|LABEL 5|552890553|00093545799|
F|FORMOLDWHI|LABEL 5|578664420|00093545799|
F|FORMOLDWHI|LABEL 5|588640859|00093545799|
F|FORMOLDWHI|LABEL 5|680304420|00093545799|
P|FORM8000|LABEL 6|591480010|50458030512|
P|FORM8000|LABEL 6|591480010|50458030512|
 
Given these two records from File_1.dat:
Code:
F|FORM|LABEL 1|000710530|00093545800|
F|FORM|LABEL 1|000710530|00093545798|

Which field 5 do you need?

vlad
 
Vlad,

Ideally, I'd like to get every field 5 from both File_1.dat and File_2.dat. I noticed that something is wrong with the following code, since it always gives me the same NDC11[$2] value (a field 5 from File_1.dat).
Code:
$1 == "F" && length(NDC11[$2])>0 {
    print $2 , NDC11[$2], $5
}
To answer your question, I need both.
Code:
     File_1.dat  File_2.dat
     ----------  ----------
FORM 00093545798 00093545799
FORM 00093545798 00093545799
FORM 00093545800 00093545700
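To print both values, every field 5 of File_1.dat has to be kept per key instead of one. A sketch under one assumption the thread has not settled: the k-th "F" record in File_2.dat for a given field 2 is paired with the k-th "F" record in File_1.dat for the same field 2. It also uses the common NR == FNR two-file idiom, so the $1 == "F" test applies to both files instead of only to File_2.dat; `pair_form` is a hypothetical name:

```shell
# Hypothetical function: pair_form FILE1 FILE2
# Assumption: within each field-2 key, the k-th "F" record of FILE2 is
# paired with the k-th "F" record of FILE1; extra records are dropped.
pair_form() {
    awk -F'|' '
        # NR == FNR holds only while the first file is read.
        # Keep every field 5, subscripted by (field 2, occurrence number),
        # so repeated keys no longer overwrite each other.
        NR == FNR {
            if ($1 == "F") ndc11[$2, ++cnt[$2]] = $5
            next
        }
        # Second file: each "F" record takes the next unused FILE1 value
        # for its key, if one is left.
        $1 == "F" && used[$2] < cnt[$2] {
            k = ++used[$2]
            print $2, ndc11[$2, k], $5
        }
    ' "$1" "$2"
}
```

Run it as `pair_form File_1.dat File_2.dat`. With the corrected files it prints three FORM lines containing the same values as the desired output, but 00093545800 ends up next to the first 00093545799, because file order is the only pairing rule available. Note that NR == FNR misbehaves if File_1.dat is empty.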
Since I loop over File_2.dat, the number of matching rows in File_1.dat and File_2.dat may differ. For example, if I delete the row:
Code:
F|FORM|LABEL 1|000710530|00093545800|
from File_1.dat, I'd want the result to be:
Code:
FORM 00093545798 00093545799
FORM 00093545798 00093545799
FORM             00093545700

or

FORM             00093545799
FORM 00093545798 00093545799
FORM 00093545798 00093545700

I don't know how to handle the case where File_1.dat has more matching rows than File_2.dat. I think I'll lose some rows from File_1.dat.

Thanks
David
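If rows without a partner should still be printed, as in the layouts above, the values have to be buffered until both files have been read. A sketch, again assuming positional pairing within each field-2 key; the empty column is padded to 11 characters (the width of the field-5 values in the sample data), and `pair_form_outer` is a hypothetical name:

```shell
# Hypothetical function: pair_form_outer FILE1 FILE2
# Assumption: rows are paired by order of appearance within each field-2
# key; when one file has fewer matching rows, its column is left empty.
pair_form_outer() {
    awk -F'|' '
        $1 != "F" { next }                        # only "F" records take part
        !($2 in first) { first[$2] = 1; order[++nkeys] = $2 }  # first-seen key order
        NR == FNR { v1[$2, ++n1[$2]] = $5; next } # FILE1 values, in file order
        { v2[$2, ++n2[$2]] = $5 }                 # FILE2 values, in file order
        END {
            for (i = 1; i <= nkeys; i++) {
                key = order[i]
                max = (n1[key] > n2[key]) ? n1[key] : n2[key]
                for (k = 1; k <= max; k++) {
                    a = ((key, k) in v1) ? v1[key, k] : ""
                    b = ((key, k) in v2) ? v2[key, k] : ""
                    # pad column 2 to 11 characters, then drop trailing blanks
                    line = sprintf("%s %-11s %s", key, a, b)
                    sub(/ +$/, "", line)
                    print line
                }
            }
        }
    ' "$1" "$2"
}
```

Keys are printed in the order they first appear (File_1.dat first), because the order of awk's `for (key in array)` loop is unspecified. With the 00093545800 row deleted from File_1.dat, the FORM lines come out in the first of the two layouts shown above; when File_1.dat has more matching rows, the File_2.dat column is left empty instead of rows being lost.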
 
It's a complicated requirement... I think it needs a better definition. At first I thought you just wanted a join on fields 2, 3 and 4, including the rows that don't have a match. But that isn't the case either, because you have multiple lines in the same file with the same key, e.g.

F|FORM|LABEL 1|000710530|00093545800|
F|FORM|LABEL 1|000710530|00093545798|

So... if you wanted to list every combination, it would be awkward. In your example:

Code:
     File_1.dat  File_2.dat
     ----------  ----------
FORM 00093545798 00093545799
FORM 00093545798 00093545799
FORM 00093545800 00093545700

...what can you use to decide that ...800 is associated with ...700, but not with ...798 or ...799? Or would you want to list all four combinations?

Annihilannic.
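For the "list all combinations" reading, each "F" record in File_2.dat is paired with every "F" record in File_1.dat that has the same field 2, and each distinct (key, value, value) triple is printed once. A sketch; `cross_form` is a hypothetical name, and the de-duplication is an assumption about what is wanted:

```shell
# Hypothetical function: cross_form FILE1 FILE2
# Lists every combination: each "F" record of FILE2 is paired with every
# "F" record of FILE1 that has the same field 2; each distinct
# (field 2, FILE1 value, FILE2 value) triple is printed once.
cross_form() {
    awk -F'|' '
        NR == FNR {                               # first file: keep all values
            if ($1 == "F") ndc11[$2, ++cnt[$2]] = $5
            next
        }
        $1 == "F" {
            for (k = 1; k <= cnt[$2]; k++) {      # no iterations if key unmatched
                v = ndc11[$2, k]
                if (!seen[$2, v, $5]++)           # skip repeated triples
                    print $2, v, $5
            }
        }
    ' "$1" "$2"
}
```

With the corrected files, the FORM key yields the four combinations of 00093545800/00093545798 with 00093545799/00093545700; no other key occurs in both files.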
 