
merge question

schocku (Programmer)
I have a text file with contents as follows that will be loaded into a database table:

600193439^_600076830^_600193439^_2^^
600193430^_600076827^_600193430^_6^^
600192222^_600076830^_600191112^_2^^
600333333^_600076830^_600193111^_2^^
...
...

The table has columns 2 and 4 as its primary key, so I want to merge duplicate entries on columns 2 and 4 in the text file, keeping only the last entry.

In the above example the output file should be:
600333333^_600076830^_600193111^_2^^
600193430^_600076827^_600193430^_6^^

so that the database loading of the file does not break.

Also I want to write all the duplicate entries into a separate file to research later.

In the above example the research file should be:

600193439^_600076830^_600193439^_2^^
600192222^_600076830^_600191112^_2^^
600333333^_600076830^_600193111^_2^^

I sorted the file using the sort command, but I am stuck after that:
sort -t'^_' -k 2,4 $INPUTFILE
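A note on that command: sort's -t option takes a single-character delimiter, so GNU sort will reject -t'^_' outright, and -k 2,4 builds one key from field 2 through field 4, field 3 included. Assuming the records can be split on the '^' alone (the _ then simply becomes part of each field, consistently), a sketch is:
Code:
sort -t'^' -k2,2 -k4,4 $INPUTFILE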

Any help is highly appreciated.
 
You can try something like this:
Code:
sort -t'^' -k 2,4 $INPUTFILE | awk -F'^' '
BEGIN{uniq="/path/to/uniq.txt";dups="/path/to/dups.txt"
{pk=$2""$4;last[pk]=$0;++nb[pk];all[pk","nb[pk]]=$0}
END{for(k in last)print last[k]>>uniq;close(uniq)
 for(k in nb)for(i=1;i<nb[k];++i)print all[k","i]>>dups
 close(dups)
}'

Hope This Helps
PH.
 
PHV,

I tried your solution and ran into a few problems:
* the uniq file has only the last row of the input file, and
* the dups file has all the rows, not just the duplicates.

Can you please explain your script?
Thanks for your help.
 
To get unique records:

sort -t '^' -k 2,2 -k 4,4 -u file.txt
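One caveat, assuming GNU coreutils: sort -u keeps the first record of each equal-key run, and the goal here is to keep the last one. A sketch that reverses the file first, so the surviving record is the last occurrence:
Code:
tac file.txt | sort -t '^' -k 2,2 -k 4,4 -u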

To get the dups... I have to think about it (a one-liner without awk).

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Here's how to get the dups. Not a one-liner, but...

nawk -f schocku.awk file.txt

#--------------- schocku.awk
BEGIN {
  FS = "^"
  dupSUF = "_dups"
}
FNR == 1 { dupsF = FILENAME dupSUF }   # dups go to <inputfile>_dups
{
  if (($2 SUBSEP $4) in dupsA) {
    # key already seen: on the first duplicate, flush the original too
    if (dupsA[$2,$4] == 1)
      print first[$2,$4] >> dupsF
    print $0 >> dupsF
  }
  else {
    first[$2,$4] = $0   # remember the first record with this key
  }
  dupsA[$2,$4]++
}
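For what it's worth, running this against the unsorted sample input should leave file.txt_dups holding exactly the three lines schocku listed for the research file:
Code:
600193439^_600076830^_600193439^_2^^
600192222^_600076830^_600191112^_2^^
600333333^_600076830^_600193111^_2^^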


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Sorry for the typo on the BEGIN line (the closing } was missing).
Tested with your example:
Code:
sort -t'^' -k 2,4 $INPUTFILE | awk -F'^' '
BEGIN{uniq="/path/to/uniq.txt";dups="/path/to/dups.txt"}
{pk=$2""$4;last[pk]=$0;++nb[pk];all[pk","nb[pk]]=$0}
END{for(k in last)print last[k]>uniq
 for(k in nb)if(nb[k]>1)
  for(i=1;i<=nb[k];++i)print all[k","i]>dups
}'
This awk program stores, for each primary key pk, the last entry in last, the number of occurrences in nb, and all the sorted entries in all.
Try to copy and paste the code, to avoid typos.
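One thing to watch: because the input is sorted first, last[pk] ends up holding the last entry in sort order, not in original file order. The expected output above keeps the 600333333 line (the last one in the input), so if "last" must mean last in the file, a sketch that simply drops the sort and lets awk see the records in their original order:
Code:
awk -F'^' '
BEGIN{uniq="/path/to/uniq.txt";dups="/path/to/dups.txt"}
{pk=$2""$4;last[pk]=$0;++nb[pk];all[pk","nb[pk]]=$0}
END{for(k in last)print last[k]>uniq
 for(k in nb)if(nb[k]>1)
  for(i=1;i<=nb[k];++i)print all[k","i]>dups
}' $INPUTFILE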


Hope This Helps
PH.
 
Sorry for the late reply. I will try it and let you know. Thanks for your responses.
 
Try this:

awk -F'^' '{
  keys = $2","$4
  if (keys in keyarr) print >> "dups"
  else {
    keyarr[keys]      # mark this key as seen
    print >> "uniq"
  }
}' ttfile
 
OK, I missed the last part. To get the last record for each key, try this:

awk -F'^' '{
  keys = $2","$4
  if (keys in keyarr)
    print keyarr[keys] > "dups"
  keyarr[keys] = $0
}
END {
  for (i in keyarr)
    print keyarr > "uniq"
}'
 
This posting process insists on editing the last two lines. They should read:

for (i in keyarr)
  print keyarr[i] > "uniq"

(i.e. with i enclosed in square brackets after keyarr).
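For anyone copying this later, here is the same script with that fix applied, in one piece (assuming the same ttfile input as before):
Code:
awk -F'^' '{
  keys = $2","$4
  if (keys in keyarr)
    print keyarr[keys] > "dups"    # the record being replaced is a dup
  keyarr[keys] = $0                # keep the latest record per key
}
END {
  for (i in keyarr)                # awk walks arrays in unspecified order
    print keyarr[i] > "uniq"
}' ttfile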
 
Imdougal,

please repost withOUT 'Process TGML' being checked.

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
 
Imdougal, to avoid your problem, you can post your code like this:
Code:
Your code here
 