Comparing and extracting from 2 files...

bondtrails · Nov 25, 2002

Hi everyone, I need some help! (again):
I want to compare a set of fields on 2 files and create a third file where they differ. I'm not sure how to do it and I need some help!!

More info:
I wish to compare column locations 32-49 of files A and B. For those records in B that don't exist in A, I wish to output the entire record of B to some file named C. I can do this easily in SQL, but I am trying to stay within the bash unix environment!!

Thank you for any valuable assistance you can provide.

--Bondster!!

CaKiwi · Nov 25, 2002

Try this awk program

awk -v fna="A" -f dif.awk B > C

# ------ dif.awk ------
BEGIN {
if (fna=="") fna = "file1"
while ((getline < fna) > 0) a[substr($0,32,18)] = 1
}
{
if (!a[substr($0,32,18)]) print
}

CaKiwi
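
For anyone who would rather not keep a separate script file, the same idea can be sketched as a one-liner. This is a different idiom from dif.awk, not CaKiwi's code: it relies on NR==FNR being true only while the first file named on the command line is being read. The sample records, the file names under a temporary directory, and the array name seen are all made up for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical sample data: 31 characters of padding, an 18-character key
# in columns 32-49, then one trailing character.
printf '%-31s%-18s%s\n' 'rec-a1' 'KEY-ALPHA' 'x' >  "$workdir/A"
printf '%-31s%-18s%s\n' 'rec-a2' 'KEY-BETA'  'x' >> "$workdir/A"
printf '%-31s%-18s%s\n' 'rec-b1' 'KEY-BETA'  'y' >  "$workdir/B"
printf '%-31s%-18s%s\n' 'rec-b2' 'KEY-GAMMA' 'y' >> "$workdir/B"

# While reading the first file (A), remember each key and skip to the next record.
# While reading the second file (B), print records whose key was never seen.
awk 'NR == FNR { seen[substr($0, 32, 18)] = 1; next }
     !(substr($0, 32, 18) in seen)' "$workdir/A" "$workdir/B" > "$workdir/C"

cat "$workdir/C"
```

Only the KEY-GAMMA record ends up in C. One caveat: if A is empty, NR==FNR is also true for every record of B, so nothing is printed; the getline version does not have that problem.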

bondtrails · Nov 25, 2002

Hey thanks CaKiwi,
I gave it a shot and didn't have much luck--actually you lost me with your awk code (I guess I am more of a newbie than I thought).

Here's the error I get back:
awk: syntax error at source line 2
context is
>>> if <<< (fna=="") fna = "file1"
extra }
awk: bailing out at source line 7

What could be going wrong??

--Bondster!!

CaKiwi · Nov 26, 2002

What system are you on? If you are on Solaris, use nawk instead of awk. Otherwise, post back the code you are using so we can see whether we have a cut and paste problem.

CaKiwi

bondtrails · Nov 26, 2002

Hey CaWiki,
I am using Mac OS X (the underlying *nix is Darwin--derivative of BSD 4.4--and it works fine for other awk programs).

Here is the command I am using:
awk -v fna="A.txt" -f dif.awk B.txt > C.txt

Here is my code (from cut and paste):

# ------ dif.awk ------
BEGIN {
if (fna=="") fna="file1"
while ((getline < fna) > 0) a[substr($0,32,18)]=1
}
{
if (!a[substr($0,32,18)]) print
}

# ------ end ----------
What do you think?

CaKiwi · Nov 26, 2002

Hmmm. Try deleting the line

if (fna=="") fna="file1"

since the error message is pointing to it.
It is only there to set the filename of the first file to a default value if you have not used the -v switch to set it.
CaKiwi

bondtrails · Nov 26, 2002

Ugh!!! Silly Silly foolish me!!!

CaWiki, thanks for all your good help--I found out what was wrong. Your code was impeccable--I was at fault.

Here's what I did wrong: I wrote the dif.awk program using Apple's TextEdit program (which of course, adds RTF control characters). When I examined the file under pico or vi, I noticed some odd control (formatting) characters--I should have known (DUH!)

Thanks again good buddy!! You're a genius!

--Bondster!!
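
A quick way to catch this kind of problem before running a script, sketched on the assumption that the editor saved the file as RTF: such a file begins with the literal characters {\rtf, so looking at the first five bytes is enough. The file names and the is_rtf function are made up for the demonstration.

```shell
#!/bin/sh
workdir=$(mktemp -d)

# Hypothetical stand-ins: one script saved as RTF, one saved as plain text.
printf '%s\n' '{\rtf1\ansi\ansicpg1252' 'BEGIN { print "hi" }' '}' > "$workdir/rich.awk"
printf '%s\n' 'BEGIN { print "hi" }' > "$workdir/plain.awk"

# Succeed if the file starts with the RTF signature "{\rtf".
is_rtf() {
    [ "$(head -c 5 "$1")" = '{\rtf' ]
}

for f in "$workdir/rich.awk" "$workdir/plain.awk"; do
    if is_rtf "$f"; then
        echo "$f: looks like RTF, re-save as plain text"
    else
        echo "$f: no RTF header found"
    fi
done
```

Opening the file in pico or vi, as described above, shows the same thing by eye.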

bondtrails · Nov 26, 2002

By the way, can you walk me through your code? I get the gist of the whole thing, but not at a line-by-line level. In your dif.awk script, I see that you loop through the entire fna file (in my case, my A.txt file) and limit your comparison to the desired column locations (via substr($0,32,18)). But what exactly does the a[] mean? And how (and when) is the comparison to my B.txt file being done?

Thanks for helping out a newbie!!

--Bondster!!

CaKiwi · Nov 27, 2002

Awk has associative arrays, i.e. they are indexed by strings rather than integers. So

a[substr($0,32,18)]=1

sets to 1 the element of array a indexed by the string from A.txt we are interested in. Then when we read file B.txt, any element of array a which has not been set will be null and the test

if (!a[substr($0,32,18)])

will return true and cause the line to be printed.

I hope this is clear. Feel free to ask more questions if it is not.

CaKiwi
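
To watch the array at work, here is a small sketch with made-up keys in place of the fixed-width records. It also shows a side effect of the !a[key] test that is worth knowing: merely referencing an element creates it with an empty value, whereas the in operator tests membership without creating anything. For this job the difference is harmless, it only costs a little memory.

```shell
#!/bin/sh
result=$(awk 'BEGIN {
    # Pretend these keys came from columns 32-49 of A.txt.
    a["KEY-ALPHA"] = 1
    a["KEY-BETA"]  = 1

    # A key that was set tests as true; an unset one is empty, so !a[...] is true.
    if (a["KEY-ALPHA"])  print "KEY-ALPHA is in A"
    if (!a["KEY-GAMMA"]) print "KEY-GAMMA is not in A"

    # The reference above quietly created a["KEY-GAMMA"] with an empty value.
    n = 0
    for (k in a) n++
    print "elements after !a[...] test: " n

    # The in operator checks membership without creating an element.
    if (!("KEY-DELTA" in a)) print "KEY-DELTA is not in A"
    n = 0
    for (k in a) n++
    print "elements after in test: " n
}')
echo "$result"
```

Both counts come out as 3: the two keys that were set, plus the KEY-GAMMA element created by the first test.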

bondtrails · Nov 27, 2002

Hey CaWiki,
thanks for the explanation. So does this mean that the expression in a[] has to evaluate to consecutive characters in the record?

Suppose I wanted to do my file comparison based on not just column locations 32-49, but instead, locations 32-38, 41, 47, 49.

How to do??

--Bondster!!
P.S. Happy Thanksgiving!

CaKiwi · Nov 27, 2002

You could just concatenate the substrings together to make the index

a[substr($0,32,7) substr($0,41,1) substr($0,47,1) substr($0,49,1)]

P.S. Thanks, same to you.

P.P.S. CaWiki?

CaKiwi
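
Putting that together, here is a sketch with made-up records. The key() function, the line variable, and the dif2.awk name are hypothetical additions, not part of the original script; user-defined functions need nawk, gawk, or another POSIX awk. The same key expression has to be used both when loading A.txt and when testing B.txt, which is why it is wrapped in one function.

```shell
#!/bin/sh
workdir=$(mktemp -d)
pad=$(printf '%31s' '')          # columns 1-31, not part of the key

printf '%s%s\n' "$pad" 'ABCDEFG00X00000Y0Z' > "$workdir/A.txt"
{
    printf '%s%s\n' "$pad" 'ABCDEFG99X11111Y2Z'   # differs only in ignored columns
    printf '%s%s\n' "$pad" 'ABCDEFG00Q00000Y0Z'   # differs in column 41
} > "$workdir/B.txt"

cat > "$workdir/dif2.awk" <<'EOF'
# key() is a hypothetical helper: columns 32-38, 41, 47 and 49 joined together.
function key(rec) {
    return substr(rec, 32, 7) substr(rec, 41, 1) substr(rec, 47, 1) substr(rec, 49, 1)
}
BEGIN {
    # Load the keys of the first file, as in the original dif.awk.
    while ((getline line < fna) > 0) a[key(line)] = 1
    close(fna)
}
!(key($0) in a)    # print records of B.txt whose key is not in A.txt
EOF

awk -v fna="$workdir/A.txt" -f "$workdir/dif2.awk" "$workdir/B.txt" > "$workdir/C.txt"
cat "$workdir/C.txt"
```

Only the second record of B.txt is written to C.txt, because the first one differs from A.txt only in columns that are left out of the key.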
