Dedupe a file... 2

mwesticle · Feb 6, 2004

I have a small issue. Within a Korn shell script, I want to dedupe a file. I want two outputs: the deduped results (written to a flat file), and all the duplicate records that were thrown out during deduplication (written to a second flat file). How would I go about this? I know deduping a file is very easy; I just don't know how to capture the "discarded" records in a file. Anyone?

mwesticle · Feb 6, 2004

One thing I should mention is that the records to be deduped are not identical across the whole record. They need to be deduped based on a key, which is in positions 1-11 (it is a fixed-width file). Sorry, I should have mentioned that before.

aigles · Feb 6, 2004

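You can do something like this:
[tt]
awk '
{
    key = substr($0, 1, 11)              # dedupe key: positions 1-11
    if (key == prv)
        print $0 >> "duplicates.dat"     # same key as previous record: duplicate
    else {
        print $0                         # first record seen for this key
        prv = key
    }
}' input_file > uniq.dat
[/tt]
Duplicate records are stored in the file duplicates.dat. Because the script appends to that file (>>), remove it before rerunning. Each key is compared only with the key of the previous record, so records with the same key must be adjacent in the input.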

Jean Pierre.
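
To make the behaviour of the script above concrete, here is a small worked run. The sample records and their contents are made up for illustration (an 11-character key followed by a payload); only the awk logic and the file names input_file, uniq.dat and duplicates.dat come from the post. It assumes a POSIX shell (ksh works) and that the sample is already grouped by key.

```shell
# Hypothetical sample data: 11-character key, then a payload.
printf '%s\n' \
  'AAAAAAAAAAA first' \
  'AAAAAAAAAAA second' \
  'BBBBBBBBBBB only' \
  'CCCCCCCCCCC first' \
  'CCCCCCCCCCC second' > input_file

# The awk script appends to duplicates.dat, so start from a clean file.
rm -f duplicates.dat

awk '
{
    key = substr($0, 1, 11)
    if (key == prv)
        print $0 >> "duplicates.dat"
    else {
        print $0
        prv = key
    }
}' input_file > uniq.dat
```

After this run, uniq.dat should hold the first record for each key (three lines here) and duplicates.dat the two "second" records.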

PHV · Feb 7, 2004

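In case your file is not already sorted, sort it first and pipe the result into Jean-Pierre's awk script:

Code:

sort input_file | awk '
{
    key = substr($0, 1, 11)
    if (key == prv)
        print $0 >> "duplicates.dat"
    else {
        print $0
        prv = key
    }
}' >uniq.dat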

Hope this helps.
PH.
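
If the original record order has to be preserved, a different technique from the one in the posts above may be worth considering: remember every key already seen in an awk array instead of sorting. This is only a sketch, not something suggested in the thread. It assumes a POSIX awk (for the "in" operator) and that the distinct keys fit in memory; the file names unsorted_input, uniq_unsorted.dat and duplicates_unsorted.dat and the sample records are hypothetical.

```shell
# Hypothetical unsorted sample: duplicates are not adjacent.
printf '%s\n' \
  'CCCCCCCCCCC first' \
  'AAAAAAAAAAA first' \
  'CCCCCCCCCCC second' \
  'BBBBBBBBBBB only' \
  'AAAAAAAAAAA second' > unsorted_input

awk '
{
    key = substr($0, 1, 11)                    # dedupe key: positions 1-11
    if (key in seen)
        print $0 > "duplicates_unsorted.dat"   # key already seen: duplicate
    else {
        seen[key] = 1                          # remember the key
        print $0                               # first record for this key
    }
}' unsorted_input > uniq_unsorted.dat
```

Here the first occurrence of each key is kept in input order and later occurrences go to the duplicates file. Inside awk, ">" truncates the file once when it is first opened and then keeps writing to it, so reruns do not accumulate old duplicates; note that the file is not touched at all when no duplicate is found.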
