How do I remove duplicate records?

tpbjr · Nov 10, 2004

I have a situation where I need to remove voucher records when a voucher is created and then voided within the same file.
Both records need to be removed because they cancel each other out.

Data example:
001PVB00011/04/2004 RECP100Item1 +000000010.0000Eac+00003000.00INV300098765 Z100038 SPCA11/04/2004
001PVB00011/04/2004 RECP900paper2 +000000010.0000Rol+00001500.00INV3000789 Z100054 SPCA11/04/2004
001PVB00011/04/2004 RECP900paper2 +000000010.0000Rol+00001000.00INV3000789 Z100054cSPCA11/04/2004
001PVB00011/04/2004 RECP987timber +000000100.0000Eac+00001200.00INV0054 Z100054 SPCA11/04/2004
001PVB00011/05/2004 RECP120TEST +000000010.0000Eac+00000300.00INV345678 Z100038 SPCA11/05/2004
001PVB00011/05/2004 RECP983test2 +000000010.0000Eac+00000300.00INV345678 Z100038 SPCA11/05/2004

The key is from position 1 to 51
The cancel voucher flag is at position 108

Notice that the record with the C in position 108 and the record before it need to be removed.
This is just one example of a duplicated key; there could be more than two records with the same key within a given file.

Is there a way to do this in unix?

Unfortunately the columns in the data do not line up on the web, but hopefully you understand what I am after.

Thank you for all your help

http://www.besware.com

Tom

vgersh99 · Nov 10, 2004

there's no 'C' in position 108 in the above sample.
The following characters appear in position 108 for the sample lines:
8
4
4
4
8
8

Pls clarify.
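
A quick way to settle which column a flag sits in is to print the character found at a given position on every line. The following is only a sketch: the function name char_at and the file name myFile.txt are made up for illustration.

```shell
# char_at POS FILE: print each line number and the single character at column POS.
# awk's substr() counts from 1, so POS=109 means the 109th character of the line.
char_at() {
  awk -v pos="$1" '{ printf "%d: [%s]\n", NR, substr($0, pos, 1) }' "$2"
}

# Example call (hypothetical file name):
#   char_at 109 myFile.txt
```

The brackets make a blank visible, so a space in that column shows up as "[ ]" rather than as nothing.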

Also, wouldn't it be easier to deal with 'fields' than with 'character positions'?

vlad
+----------------------------+
| #include<disclaimer.h> |
+----------------------------+

tpbjr · Nov 10, 2004

I thought the C which is right before the SPCA was in pos 108. How do I deal with fields and not positions if there is no delimiter? It is positional.

Thanks
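
Positional data with no delimiter can still be treated as fields in awk by slicing each record with substr(). A minimal sketch, assuming the column numbers discussed in this thread (key in columns 1 to 51, flag in column 109); the function name split_fixed is hypothetical.

```shell
# split_fixed FILE: print the key and the flag of each record, separated by "|".
# The column numbers are assumptions taken from this thread; adjust them to
# the real record layout.
split_fixed() {
  awk '{
    key  = substr($0, 1, 51)    # start at column 1, take 51 characters
    flag = substr($0, 109, 1)   # the single character at column 109
    print key "|" flag
  }' "$1"
}

# Example call (hypothetical file name):
#   split_fixed myFile.txt
```

GNU awk also has a FIELDWIDTHS variable for fixed-width input, but substr() works in any awk.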

tpbjr · Nov 10, 2004

If you put the cursor to the left of the first character of the record and count up to 108, the cursor lands at position 108, followed by a c. This refers to the second record.

Thanks
Tom

vgersh99 · Nov 10, 2004

1. this is not 'C' - it's 'c'
2. it is not in position 108, but position 109
3. assuming your 'cancel' comes AFTER the 'good' one, you can try the following:

nawk -f tp.awk myFile.txt

Code:

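BEGIN {
  FLD_cancel="109"       # column of the cancel flag
  FLD_cancelVal="c"      # flag value, compared case-insensitively
}

{
   # key on the first two whitespace-separated fields
   idx= $1 SUBSEP $2;
   c=tolower(substr($0, FLD_cancel,1));
   # a cancel record for a key already stored: drop both records
   if ( (idx in arr) && (c == FLD_cancelVal) ) {
      delete arr[idx];
      next;
   }
   arr[idx]=$0
}

END {
  # note: the order of a for-in loop is unspecified, so output order may change
  for( i in arr)
    print arr[i]
}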

vlad

PHV · Nov 11, 2004

Assuming the following:
The key is from position 1 to 51
The cancel voucher flag is at position 109
You may try something like this:
awk '
{k=substr($0,1,51);t[NR]=$0;if(substr($0,109,1)=="c")++c[k]}
END{for(i=1;i<=NR;++i)if(0+c[substr(t[i],1,51)]==0)print t[i]}
' /path/to/input
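
The same idea can be written as two passes over the file, so that no records are held in memory. This is only a sketch under the same assumptions (key in columns 1 to 51, flag in column 109); the function name drop_cancelled is hypothetical. Unlike the one-liner above, it accepts the flag in either case.

```shell
# drop_cancelled FILE: the first pass collects keys that have a cancel record,
# the second pass prints only records whose key was never cancelled.
# FILE must be a regular file because it is read twice.
drop_cancelled() {
  awk '
    NR == FNR {                                  # first pass over the file
      if (tolower(substr($0, 109, 1)) == "c")    # cancel flag at column 109
        cancelled[substr($0, 1, 51)] = 1         # remember the 51-character key
      next
    }
    !(substr($0, 1, 51) in cancelled)            # second pass: print the rest
  ' "$1" "$1"
}

# Example call (hypothetical file names):
#   drop_cancelled myFile.txt > cleaned.txt
```

Records come out in their original order, and every record sharing a cancelled key is dropped, not just the one directly before the cancel record.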

Hope This Helps, PH.
Want to get great answers to your Tek-Tips questions? Have a look at FAQ219-2884 or FAQ222-2244
