Join two files (sort of)

mwesticle · Aug 20, 2004

I have an easy to semi-difficult problem here, and I was wondering if any of the good people at Tek-Tips could help me out... Here goes:

I have two files (File A and File B), each with 100-byte records. Each record has a 12-digit "key" in positions 1-12. File A is sorted on this key. File A looks like this:

100000000000AAAAAAAAAAAAAAAAA....and so on to position 100
200000000000AAAAAAAAAAAAAAAAA....and so on to position 100
300000000000AAAAAAAAAAAAAAAAA....and so on to position 100

File B is NOT sorted on this key. File B looks like this:

200000000000BBBBBBBBBBBBBBBBB....and so on to position 100
100000000000BBBBBBBBBBBBBBBBB....and so on to position 100
111111111111BBBBBBBBBBBBBBBBB....and so on to position 100

What I want to do is join the two files on this 12-digit key and group the records by it, throwing out all records from File B that don't have a match in File A.

So, the resulting output file should look like this:

100000000000ABBSNDNNDJFJFFJFJ....and so on to position 100
200000000000lhjbdfllfblhdflhf....and so on to position 100
200000000000KJFSKJSFkfjsakjbn....and so on to position 100
200000000000KJFSKJSFkfjsakjbn....and so on to position 100
300000000000mfd3434MFMFN2323n....and so on to position 100

So, I want to keep the original order of File A and follow each File A record with every File B record that has the same key. I want to keep all records from File A, even those with no match in File B, but I want to throw out all records in File B that don't have a match in File A.

Anyone out there know how I can achieve this? Any help would be greatly appreciated! Thanks!
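A minimal sketch of the requested behavior (not from the thread), assuming every key sits in positions 1-12 and that all of File B fits in memory: awk reads File B first and buffers its records per key in File B's own order, then streams File A and prints each record followed by its buffered matches. File B records whose key never appears in File A are simply never printed. The function name `join_a_with_b` is hypothetical.

```shell
# Hash join sketch (hypothetical helper, not from the thread).
# Assumptions: the key is in positions 1-12 and all of File B fits in memory.
join_a_with_b() {
    # $1 = File A (drives output order), $2 = File B (lookup side)
    awk '
        # While reading the first file argument (File B), buffer each record
        # under its 12-character key, keeping File B order within a key.
        FILENAME == ARGV[1] {
            key = substr($0, 1, 12)
            matches[key] = matches[key] $0 "\n"
            next
        }
        # For each File A record: print it, then its buffered File B matches.
        # File B keys that never occur in File A are never printed.
        {
            print
            key = substr($0, 1, 12)
            if (key in matches)
                printf "%s", matches[key]
        }
    ' "$2" "$1"
}
```

Usage would be `join_a_with_b fileA fileB > fileC`. Testing `FILENAME == ARGV[1]` instead of the common `NR == FNR` idiom keeps the script correct when File B is empty, and a key that occurs more than once in File A gets the same File B matches after each occurrence.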

PHV · Aug 20, 2004

Brute force method:
awk '{print;system("grep \"^"substr($0,1,12)"\" fileB")}' fileA

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
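The brute-force one-liner starts a shell and a `grep` for every File A record, and each `grep` rescans all of fileB, so the work grows with the product of the two file sizes. Since the question says File A is already sorted on the key, a sort-merge join is another option. The sketch below is an alternative technique, not PHV's method, and rests on several assumptions: File A is sorted in byte order on positions 1-12, the keys contain no blanks, and `sort -s` (a GNU/BSD extension giving a stable sort, so File B keeps its original order within a key) and `mktemp` are available. The function name `merge_join_a_with_b` is hypothetical.

```shell
# Sort-merge join sketch (hypothetical helper, not from the thread).
# Assumptions: File A is already sorted on positions 1-12 in byte (C locale)
# order, keys contain no blanks, and sort -s (stable sort) and mktemp exist.
merge_join_a_with_b() {
    # $1 = File A (sorted, drives output order), $2 = File B (any order)
    sorted_b=$(mktemp) || return 1
    # Stable sort on characters 1-12 so File B keeps its order within a key.
    LC_ALL=C sort -s -k1.1,1.12 "$2" > "$sorted_b" || { rm -f "$sorted_b"; return 1; }
    LC_ALL=C awk -v bfile="$sorted_b" '
        # Read the next sorted File B record; returns 1 on success, 0 at EOF.
        function next_b() {
            if ((getline b_line < bfile) > 0) {
                b_key = substr(b_line, 1, 12)
                return 1
            }
            return 0
        }
        BEGIN { have_b = next_b(); group_key = "" }
        {
            print
            key = substr($0, 1, 12)
            if (key != group_key) {
                # New File A key: skip smaller File B keys (no match in A),
                # then collect every File B record with exactly this key.
                group = ""
                group_key = key
                while (have_b && b_key < key) have_b = next_b()
                while (have_b && b_key == key) {
                    group = group b_line "\n"
                    have_b = next_b()
                }
            }
            printf "%s", group
        }
    ' "$1"
    rc=$?
    rm -f "$sorted_b"
    return $rc
}
```

Usage would be `merge_join_a_with_b fileA fileB > fileC`. Only one key's worth of File B records is held in memory at a time, at the cost of sorting File B into a temporary file first.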

mwesticle · Aug 20, 2004

Thanks for the suggestion, but this just seems to spit out File A. What I mean is, the resulting file (File C) is EXACTLY the same (byte-for-byte) as File A. I'm using this command:

awk '{print;system("grep \"^"substr($0,1,12)"\" fileB")}' fileA > fileC

What I'm looking for here is an output file that contains all records from File A plus all of their matches from File B, grouped in the original order of File A. Does that make sense? I'm not sure; maybe I'm doing something wrong. Any other ideas?

KenCunningham · Aug 20, 2004

Have you tried comm?

PHV · Aug 22, 2004

mwesticle, works for me:
> cat fileA
100000000000AAAAAAAAAAAAAAAAA....and so on to position 100
200000000000AAAAAAAAAAAAAAAAA....and so on to position 100
300000000000AAAAAAAAAAAAAAAAA....and so on to position 100
> cat fileB
200000000000BBBBBBBBBBBBBBBBB....and so on to position 100
100000000000BBBBBBBBBBBBBBBBB....and so on to position 100
111111111111BBBBBBBBBBBBBBBBB....and so on to position 100
> awk '{print;system("grep \"^"substr($0,1,12)"\" fileB")}' fileA > fileC
> cat fileC
100000000000AAAAAAAAAAAAAAAAA....and so on to position 100
100000000000BBBBBBBBBBBBBBBBB....and so on to position 100
200000000000AAAAAAAAAAAAAAAAA....and so on to position 100
200000000000BBBBBBBBBBBBBBBBB....and so on to position 100
300000000000AAAAAAAAAAAAAAAAA....and so on to position 100
>

Hope This Helps, PH.