Compare two CSVs based on unique ID

stu78 · Jan 25, 2010

Hi,

I've written the following code to compare 2 csvs,

awk 'NR==FNR{_[NR]=$0}
NR!=FNR{
if($0 != _[FNR])
{
print $0
}
}' export1.csv export2.csv

But I need to extend it (as it really only offers diff capabilities) to compare based on some unique ID e.g. userid.

File 1
userid, fname, sname, dob, license number

1, diar, cart, 12/12/12, 13324

17, mickey, mouse, 2/2/22, 22222
14, donald1, duck, 1/1/11, 11111

File 2

userid, fname, sname, dob, license number

1, diar, cart, 12/12/12, 1332x

22, tom, thumb, 4/4/44, 44444
14, donald1, duck, 1/1/11, 11111
17, mickey, mouse, 2/2/22, 22222

feherke · Jan 25, 2010

Hi

Like this ?

Code:

awk 'NR==FNR{_[[highlight]$1[/highlight]]=$0}
NR!=FNR{
        if($0 != _[[highlight]$1[/highlight]])
        {
                print $0
        }
}' export1.csv export2.csv

Feherke.

http://free.rootshell.be/~feherke/

stu78 · Jan 25, 2010

Thanks feherke,

However, in File 2 if I move e.g.
1, diar, cart, 12/12/12, 13324

to the bottom of the file; this will still flag as a difference (even though it is an identical entry - but in a different place in the file).

PHV · Jan 25, 2010

Did you use the -F, option ?

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

stu78 · Jan 25, 2010

-F as in the file seperator? How would this help?

feherke · Jan 25, 2010

Hi

stu78 said:
even though it is an identical entry

Well, in the sample data you posted, the lines with userid 1 are not identical. The modification I suggested works for your sample data, wherever the line with userid 1 is.

Feherke.

http://free.rootshell.be/~feherke/

PHV · Jan 26, 2010

Imeant this:
awk [!]-F,[/!] 'NR==FNR ...

Hope This Helps, PH.
FAQ219-2884
FAQ181-2886

stu78 · Jan 27, 2010

Hi both,

Thanks so much for your responses... it's mostly working; however; I've done some tests around creating deltas between files as follows;

File 1 ->
userid, fname, sname, dob, license number

1, diar, cart, 12/12/12, 13324

17, mickey, mouse, 2/2/22, 22222
14, donald1, duck, 1/1/11, 11111

1. Copy file 1 to File 2
2. Add a line to file 2 (the awk you've given me finds the add)
3. Modify a line in file 2 (the awk you've given me finds the mod)
4. Delete a line in file 2 (the awk you've given me does not find the delete)

How can I correct issue 4?

Annihilannic · Jan 31, 2010

A lazy way would be to run it twice, once with the parameters file1 file2 and then with file2 file1.

I don't fully understand your requirements though... do you only want to check that the list of unique UIDs is present in both files, ignoring all other fields?

Annihilannic.

feherke · Jan 31, 2010

Hi

No idea what happened, but I am quite sure that I replied to this.

If you are sure that userid is unique in both files, this is enough, otherwise an additional array has to be used :

Code:

awk 'NR==FNR{_[$1]=$0;next}$0!=_[$1];{delete _[$1]}END{for(i in _)print _[i]}' export1.csv export2.csv

Tested with [tt]gawk[/tt] and [tt]mawk[/tt].

Feherke.

http://free.rootshell.be/~feherke/

stu78 · Feb 7, 2010

Thank you feherke!

stu78 · Feb 12, 2010

This issue with result posted is that it is giving me unstructured output. i.e. a mixture of the old and new data. I only want the adds, the modifys, and the deletes - not the old data in the output... Is this possible?

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Compare two CSVs based on unique ID

stu78

Programmer

feherke

Programmer

stu78

Programmer

PHV

MIS

stu78

Programmer

feherke

Programmer

PHV

MIS

stu78

Programmer

Annihilannic

MIS

feherke

Programmer

stu78

Programmer

stu78

Programmer

Similar threads

Part and Inventory Search

Sponsor