Tek-Tips Forums

Comparing and extracting from 2 files...

Status
Not open for further replies.

bondtrails (Technical User, US), Nov 19, 2002
Hi everyone, I need some help! (again):
I want to compare a set of fields in 2 files and create a third file containing the records where they differ. I'm not sure how to do it and I need some help!!

More info:
I wish to compare column positions 32-49 of files A and B. For those records in B whose columns 32-49 don't match any record in A, I wish to output the entire record of B to a file named C. I can do this easily in SQL, but I am trying to stay within the bash/Unix environment!!

Thank you for any valuable assistance you can provide.

--Bondster!!
 
Try this awk program

awk -v fna="A" -f dif.awk B > C

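# ------ dif.awk ------
BEGIN {
    # Default name for the first file if -v fna=... was not given
    if (fna == "") fna = "file1"
    # Remember columns 32-49 (18 characters) of every record in the first file
    while ((getline < fna) > 0) a[substr($0,32,18)] = 1
}
{
    # Runs once for each record of the file named on the command line (B):
    # print it if its columns 32-49 were never seen in the first file
    if (!a[substr($0,32,18)]) print
}
CaKiwi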
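For anyone who wants to try the idea end to end, here is a minimal sketch that wraps the same awk logic in a bash function. The function name `diff_by_columns` is hypothetical, not something from the thread. Columns 32 through 49 inclusive are 49 - 32 + 1 = 18 characters, which is why the key is `substr($0,32,18)`. The extra `close(fna)` is an addition that simply releases the first file once it has been read.

```shell
# Hypothetical wrapper around the dif.awk logic.
# Usage: diff_by_columns FILE_A FILE_B > FILE_C
# Prints every record of FILE_B whose columns 32-49 match no record of FILE_A.
diff_by_columns() {
    awk -v fna="$1" '
        BEGIN {
            # Index the 18-character key (columns 32-49) of every line of FILE_A
            while ((getline < fna) > 0) a[substr($0, 32, 18)] = 1
            close(fna)
        }
        # Main block: runs once per line of FILE_B
        { if (!a[substr($0, 32, 18)]) print }
    ' "$2"
}

# Example: diff_by_columns A.txt B.txt > C.txt
```

Note that if FILE_A cannot be opened, getline returns -1, the loop ends immediately, and every record of FILE_B is printed.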
 
Hey thanks CaKiwi,
I gave it a shot and didn't have much luck--actually, you lost me with your awk code (I guess I am more of a newbie than I thought).

Here's the error I get back:
awk: syntax error at source line 2
context is
>>> if <<< (fna=="") fna = "file1"
extra }
awk: bailing out at source line 7

What could be going wrong??

--Bondster!!
 
What system are you on? If you are on Solaris, use nawk instead of awk. Otherwise, post the code you are using so we can check for a cut-and-paste problem.
CaKiwi
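If the same script has to run on several systems, one hedged way to follow the nawk advice is to prefer nawk whenever it is installed and fall back to awk otherwise. This is only a sketch: the helper name `pick_awk` is hypothetical, and it assumes that any nawk found on the PATH is the better choice, as it is on Solaris.

```shell
# Hypothetical helper: print the name of the awk to use.
# Prefers nawk (the "new awk" recommended on Solaris) when it is on the PATH.
pick_awk() {
    if command -v nawk >/dev/null 2>&1; then
        echo nawk
    else
        echo awk
    fi
}

# Usage: "$(pick_awk)" -v fna="A.txt" -f dif.awk B.txt > C.txt
```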
 
Hey CaWiki,
I am using Mac OS X (the underlying *nix is Darwin--derivative of BSD 4.4--and it works fine for other awk programs).

Here is the command I am using:
awk -v fna="A.txt" -f dif.awk B.txt > C.txt

Here is my code (from cut and paste):

# ------ dif.awk ------
BEGIN {
if (fna=="") fna="file1"
while ((getline < fna) > 0) a[substr($0,32,18)]=1
}
{
if (!a[substr($0,32,18)]) print
}

# ------ end ----------
What do you think?
 
Hmmm. Try deleting the line

if (fna=="") fna="file1"

since the error message is pointing to it.
It only sets the filename of the first file to a default value if you have not used the -v switch to set it.
CaKiwi
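The default only applies when -v is omitted because awk performs -v assignments before the BEGIN block runs. A quick way to see this is a BEGIN-only program, which awk runs without reading any input. The function name `show_fna` below is hypothetical.

```shell
# Hypothetical demo: print the value fna has inside BEGIN,
# with or without a -v assignment.
show_fna() {
    if [ "$#" -gt 0 ]; then
        awk -v fna="$1" 'BEGIN { if (fna == "") fna = "file1"; print fna }'
    else
        awk 'BEGIN { if (fna == "") fna = "file1"; print fna }'
    fi
}

show_fna          # prints: file1
show_fna A.txt    # prints: A.txt
```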
 
Ugh!!! Silly Silly foolish me!!!

CaWiki, thanks for all your good help--I found out what was wrong. Your code was impeccable--I was at fault.

Here's what I did wrong: I wrote the dif.awk program using Apple's TextEdit program (which, of course, adds RTF control characters). When I examined the file under pico or vi, I noticed some odd control (formatting) characters--I should have known (DUH!)

Thanks again good buddy!! You're a genius!

--Bondster!!
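A file saved as rich text begins with the RTF header `{\rtf`, so one quick check before blaming the awk code is to look at the first characters of the script. A minimal sketch, using awk itself (the function name `looks_like_rtf` is hypothetical):

```shell
# Hypothetical check: succeed (exit status 0) if FILE starts with an RTF header.
looks_like_rtf() {
    awk 'NR == 1 { found = (substr($0, 1, 5) == "{\\rtf"); exit }
         END     { exit !found }' "$1"
}

# Usage:
#   if looks_like_rtf dif.awk; then echo "dif.awk is RTF, not plain text"; fi
```

If it reports RTF, re-save the script as plain text, or edit it in vi or pico as described above.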
 
By the way, can you walk me through your code? I get the gist of the whole thing, but not at a line-by-line level. In your dif.awk script, I see that you loop through the entire fna file (in my case, my A.txt file) and limit your comparison to the desired column positions (via substr($0,32,18)). But what exactly does the a[] mean? And how (and when) is the comparison to my B.txt file being done?

Thanks for helping out a newbie!!

--Bondster!!
 

Awk has associative arrays, i.e. they are indexed by strings rather than integers. So

a[substr($0,32,18)]=1

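sets to 1 the element of array a indexed by the string from A.txt we are interested in (columns 32-49). All of that happens in the BEGIN block, before any regular input is read. Awk then runs the main block once for each line of B.txt, the file named on the command line. For each of those lines, any element of array a which has not been set will be empty (uninitialized) and the test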

if (!a[substr($0,32,18)])

will return true and cause the line to be printed.

I hope this is clear. Feel free to ask more questions if it is not. CaKiwi
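Two other ways to write the same filter, sketched for comparison (function names hypothetical). The first keeps the getline loop but tests membership with awk's `in` operator; unlike referencing `a[key]`, `(key in a)` does not create an empty element for every key looked up in B. The second is the common `NR == FNR` two-file idiom, a different technique: awk reads A.txt and then B.txt as ordinary input, and `NR == FNR` is true only while the first file is being read.

```shell
# Variant 1: getline loop, membership tested with "in".
filter_in() {
    awk -v fna="$1" '
        BEGIN {
            while ((getline < fna) > 0) seen[substr($0, 32, 18)] = 1
            close(fna)
        }
        # Pattern with no action: print lines whose key is not in seen
        !(substr($0, 32, 18) in seen)
    ' "$2"
}

# Variant 2: NR == FNR idiom. Caveat: if FILE_A is empty, NR == FNR also
# holds for every line of FILE_B, so nothing is printed.
filter_nr_fnr() {
    awk '
        NR == FNR { seen[substr($0, 32, 18)]; next }  # first file: record keys
        !(substr($0, 32, 18) in seen)                 # second file: print unmatched
    ' "$1" "$2"
}
```

Either can stand in for the dif.awk command, e.g. `filter_nr_fnr A.txt B.txt > C.txt`.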
 
Hey CaWiki,
thanks for the explanation. So does this mean that the expression in a[] has to evaluate to consecutive characters in the record?

Suppose I wanted to do my file comparison based on not just column locations 32-49, but instead, locations 32-38, 41, 47, 49.

How to do??

--Bondster!!
P.S. Happy Thanksgiving!
 
You could just concatenate the substrings together to make the index

a[substr($0,32,7) substr($0,41,1) substr($0,47,1) substr($0,49,1)]

P.S. Thanks, same to you.

P.P.S. CaWiki?
CaKiwi
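Building on the concatenation idea, here is a sketch that puts the key expression in a user-defined awk function (hypothetical name `key`) so that file A and file B are guaranteed to use exactly the same column positions. Columns 32-38 are 38 - 32 + 1 = 7 characters, hence `substr(line, 32, 7)`. Because every piece has a fixed width, the combined key is unambiguous as long as each record is at least 49 characters long. User-defined functions need nawk or a POSIX awk.

```shell
# Hypothetical sketch: print records of FILE_B whose columns 32-38, 41, 47
# and 49 (taken together) match no record of FILE_A.
# Usage: multi_key_filter FILE_A FILE_B > FILE_C
multi_key_filter() {
    awk -v fna="$1" '
        # Build the lookup key by concatenating the fixed-width pieces
        function key(line) {
            return substr(line, 32, 7) substr(line, 41, 1) substr(line, 47, 1) substr(line, 49, 1)
        }
        BEGIN {
            while ((getline < fna) > 0) seen[key($0)] = 1
            close(fna)
        }
        !(key($0) in seen)
    ' "$2"
}
```

Characters outside those positions (for example columns 39-40) are ignored, so two records that differ only there count as a match.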
 