Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Compare two CSVs based on unique ID

Status
Not open for further replies.

stu78

Programmer
May 29, 2002
121
0
0
GB
Hi,

I've written the following code to compare 2 csvs,

awk 'NR==FNR{_[NR]=$0}
NR!=FNR{
if($0 != _[FNR])
{
print $0
}
}' export1.csv export2.csv

But I need to extend it (as it really only offers diff capabilities) to compare based on some unique ID e.g. userid.

File 1
userid, fname, sname, dob, license number

1, diar, cart, 12/12/12, 13324

17, mickey, mouse, 2/2/22, 22222
14, donald1, duck, 1/1/11, 11111

File 2

userid, fname, sname, dob, license number

1, diar, cart, 12/12/12, 1332x

22, tom, thumb, 4/4/44, 44444
14, donald1, duck, 1/1/11, 11111
17, mickey, mouse, 2/2/22, 22222


 
Thanks feherke,

However, in File 2 if I move e.g.
1, diar, cart, 12/12/12, 13324

to the bottom of the file; this will still flag as a difference (even though it is an identical entry - but in a different place in the file).
 
Did you use the -F, option ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
-F as in the file seperator? How would this help?
 
Imeant this:
awk [!]-F,[/!] 'NR==FNR ...

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886
 
Hi both,

Thanks so much for your responses... it's mostly working; however; I've done some tests around creating deltas between files as follows;

File 1 ->
userid, fname, sname, dob, license number

1, diar, cart, 12/12/12, 13324

17, mickey, mouse, 2/2/22, 22222
14, donald1, duck, 1/1/11, 11111

1. Copy file 1 to File 2
2. Add a line to file 2 (the awk you've given me finds the add)
3. Modify a line in file 2 (the awk you've given me finds the mod)
4. Delete a line in file 2 (the awk you've given me does not find the delete)

How can I correct issue 4?
 
A lazy way would be to run it twice, once with the parameters file1 file2 and then with file2 file1.

I don't fully understand your requirements though... do you only want to check that the list of unique UIDs is present in both files, ignoring all other fields?

Annihilannic.
 
Hi

No idea what happened, but I am quite sure that I replied to this.

If you are sure that userid is unique in both files, this is enough, otherwise an additional array has to be used :
Code:
awk 'NR==FNR{_[$1]=$0;next}$0!=_[$1];{delete _[$1]}END{for(i in _)print _[i]}' export1.csv export2.csv
Tested with [tt]gawk[/tt] and [tt]mawk[/tt].

Feherke.
 
This issue with result posted is that it is giving me unstructured output. i.e. a mixture of the old and new data. I only want the adds, the modifys, and the deletes - not the old data in the output... Is this possible?
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top