How do I remove duplicate records?

Status
Not open for further replies.

tpbjr

MIS
Oct 8, 2004
120
US
I have a situation where I need to remove voucher records when a voucher is created and then voided within the same file.
I need both of them removed because they cancel each other out within the same file.

Data example:
001PVB00011/04/2004 RECP100Item1 +000000010.0000Eac+00003000.00INV300098765 Z100038 SPCA11/04/2004
001PVB00011/04/2004 RECP900paper2 +000000010.0000Rol+00001500.00INV3000789 Z100054 SPCA11/04/2004
001PVB00011/04/2004 RECP900paper2 +000000010.0000Rol+00001000.00INV3000789 Z100054cSPCA11/04/2004
001PVB00011/04/2004 RECP987timber +000000100.0000Eac+00001200.00INV0054 Z100054 SPCA11/04/2004
001PVB00011/05/2004 RECP120TEST +000000010.0000Eac+00000300.00INV345678 Z100038 SPCA11/05/2004
001PVB00011/05/2004 RECP983test2 +000000010.0000Eac+00000300.00INV345678 Z100038 SPCA11/05/2004


The key is from position 1 to 51
The cancel voucher flag is at position 108

Notice the record with the C in position 108: both it and the record before it need to be removed.
This is just one example of a duplicated key; there could be more than two records with the same key within a given file.

Is there a way to do this in Unix?

Unfortunately the columns in the data don't line up on the web, but hopefully you understand what I am after.

Thank you for all your help

Tom
 
There's no 'C' in position 108 in the sample above.
The following characters appear in position 108 for the sample lines:
8
4
4
4
8
8

Please clarify.

Also, wouldn't it be easier to deal with 'fields' than with 'character positions'?


vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+
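A quick way to settle which column holds the flag, without counting by hand, is to have awk print the character found at a given column on every line. This is a minimal sketch in POSIX awk; the function name `col_char` and the file name in the usage comment are hypothetical, and columns are 1-based, as in awk's `substr`.

```shell
#!/bin/sh
# col_char FILE COL (hypothetical helper)
# Print each line number with the single character at 1-based column COL.
# The brackets make spaces visible, which matters in fixed-width records;
# a line shorter than COL prints as "[]".
col_char() {
  awk -v col="$2" '{ printf "%d: [%s]\n", NR, substr($0, col, 1) }' "$1"
}

# Example (file name is hypothetical):
# col_char vouchers.txt 108
# col_char vouchers.txt 109
```

Running it once for 108 and once for 109 shows directly which column carries the 'c'.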
 
I thought the C right before SPCA was in position 108. How do I deal with fields rather than positions if there is no delimiter? The file is positional.

Thank you for all your help

Tom
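Since the records are fixed-width, one way to get 'fields' is to cut them out by position with `substr` and re-emit them with a delimiter; after that, ordinary field-based awk applies. This is a sketch assuming the layout stated in this thread (key in columns 1-51, cancel flag in column 109); the `|` delimiter and the function name `to_fields` are arbitrary choices for illustration.

```shell
#!/bin/sh
# to_fields FILE (hypothetical helper)
# Rewrite each fixed-width record as "key|flag" so later steps can use
# awk -F'|' with $1/$2 instead of character positions.
# Column positions are the ones given in this thread; adjust as needed.
to_fields() {
  awk -v klen=51 -v fcol=109 '{
    key  = substr($0, 1, klen)     # voucher key: columns 1..51
    flag = substr($0, fcol, 1)     # cancel flag: column 109
    print key "|" flag
  }' "$1"
}

# Example (hypothetical file name): list only the cancel records
# to_fields vouchers.txt | awk -F'|' '$2 == "c"'
```

The key keeps its trailing spaces, so two records compare equal only if columns 1-51 match exactly.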
 
If you put the cursor at the left edge of the record (before the first character) and count up to 108, the cursor ends up at position 108, followed by a c. This refers to the second record.

Thank you for all your help

Tom
 
1. This is not 'C', it's 'c'.
2. It is not in position 108 but in position 109.
3. Assuming your 'cancel' record comes AFTER the 'good' one, you can try the following:

nawk -f tp.awk myFile.txt

Code:
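BEGIN {
  FLD_cancel    = 109   # column holding the cancel flag (1-based)
  FLD_cancelVal = "c"   # flag value that marks a voided voucher
}

{
   idx = $1 SUBSEP $2                          # key: first two fields
   c   = tolower(substr($0, FLD_cancel, 1))
   if (c == FLD_cancelVal && cnt[idx] > 0) {
      dead[stk[idx, cnt[idx]--]] = 1           # drop the latest open record with this key
      next                                     # and drop the cancel record itself
   }
   rec[NR] = $0                                # keep every other record in input order
   stk[idx, ++cnt[idx]] = NR                   # remember it as open for this key
}

END {
  for (i = 1; i <= NR; i++)                    # print survivors in original order
    if ((i in rec) && !(i in dead))
      print rec[i]
}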

vlad
 
Assuming the following:
The key is from position 1 to 51
The cancel voucher flag is at position 109
You may try something like this:
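awk '
# remember every record; count cancel flags ("c" in column 109) per 51-char key
{k=substr($0,1,51);t[NR]=$0;if(substr($0,109,1)=="c")++c[k]}
# print, in input order, only records whose key was never cancelled
END{for(i=1;i<=NR;++i)if(0+c[substr(t[i],1,51)]==0)print t[i]}
' /path/to/input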

Hope This Helps, PH.
 