Dedupe a file...

Status
Not open for further replies.

mwesticle

Programmer
Nov 19, 2003
51
US
I have a small issue. Within a Korn shell script, I want to dedupe a file and get two outputs: the deduped results (written to a flat file), and all the duplicate records that were thrown out during deduplication (written to a second flat file). How would I go about this? I know deduping a file is very easy; I just don't know how to capture the "discarded" records in a file... Anyone?
 
One thing I should mention is that the records that need to be deduped are not identical throughout the whole record. They need to be deduped based on a key, which is in positions 1-11 (it is a fixed-width file). Sorry, I should have mentioned that before...
 
You can do something like this:
[tt]
awk '
{
    key = substr($0, 1, 11)              # dedupe key: positions 1-11
    if (key == prv)
        print $0 >> "duplicates.dat"     # same key as the previous record: duplicate
    else {
        print $0                         # first record with this key
        prv = key
    }
}' input_file > uniq.dat
[/tt]
Duplicate records are stored in the file duplicates.dat.



Jean Pierre.
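
The script above compares each key only with the key of the previous record, so it relies on records with the same key being adjacent. As a rough sketch for unsorted input, assuming the set of distinct keys fits in memory, an awk array can remember the keys already seen; the first record for each key is then kept in its original order. The file sample.dat and its contents are made up for illustration, and the output names simply follow the post above:

```shell
# Made-up fixed-width sample: the key is in columns 1-11.
cat > sample.dat <<'EOF'
ALPHA000001 first
BRAVO000001 second
ALPHA000001 third
CHARLIE0001 fourth
BRAVO000001 fifth
EOF

# Create/empty the duplicates file so it exists even when nothing is discarded.
: > duplicates.dat

awk '
{
    key = substr($0, 1, 11)          # dedupe key: columns 1-11
    if (key in seen)
        print > "duplicates.dat"     # key already seen: write to the second file
    else {
        seen[key] = 1                # remember the key
        print                        # first record for this key goes to stdout
    }
}' sample.dat > uniq.dat
```

With this sample, uniq.dat should hold the "first", "second" and "fourth" records, and duplicates.dat the "third" and "fifth" ones. The trade-off is memory: one array entry per distinct key, in exchange for not having to sort.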
 
In case your file is not already sorted, do this:
Code:
sort input_file | awk '
{
    # body of the awk script from Jean-Pierre goes here, unchanged
}' > uniq.dat

Hope This Helps
PH.
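
A quick sanity check for the sorted pipeline, sketched under the assumption that the output files are named as in the posts above: every input record should land in exactly one of the two files, so the line counts must add up. Because the awk script appends with >>, duplicates.dat is emptied first; otherwise records from an earlier run would still be in it. Note also that after sorting, the record kept for each key is the first one in sort order, which need not be the first one in the original file. The file sample.dat and its contents are made up for illustration:

```shell
# Made-up, unsorted fixed-width sample: the key is in columns 1-11.
cat > sample.dat <<'EOF'
BRAVO000001 second
ALPHA000001 first
BRAVO000001 fifth
CHARLIE0001 fourth
ALPHA000001 third
EOF

# The awk script appends, so start from an empty duplicates file.
: > duplicates.dat

sort sample.dat | awk '
{
    key = substr($0, 1, 11)
    if (key == prv)
        print $0 >> "duplicates.dat"
    else {
        print $0
        prv = key
    }
}' > uniq.dat

# Every record must be in exactly one output file.
total=$(wc -l < sample.dat)
kept=$(wc -l < uniq.dat)
dropped=$(wc -l < duplicates.dat)
if [ $((kept + dropped)) -eq $((total)) ]; then
    echo "OK: all $((total)) records accounted for"
else
    echo "MISMATCH: $((kept)) kept + $((dropped)) dropped != $((total))" >&2
fi
```

Here "BRAVO000001 fifth" is kept rather than "BRAVO000001 second", because plain sort orders whole lines. If the original order within a key matters, the sort step would need to be adjusted or avoided.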
 