Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations TouchToneTommy on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Identifying Duplicates 1

Status
Not open for further replies.

IanWaterman

Programmer
Jun 26, 2002
3,511
GB
I am trying to reconcile data going into a Datawarehouse from two different sources

Premium is posted and charged to Credit card and policy receipted with correct amount.

Data is transmitted to Credit Card provider which hold policy number and amount plus other info. So the only link between policy data and Credit card data is policy no, amount and date. This is all fine and I can relate the two data sets on those fields.

Problem arises when Credit card payment generator has a senior moment and posts reciept multiple times. So in my Policy data set I have one record and in credit card set say 3 records.

Is there a quick way I can identify the fact that Policy data is being duplicated when it matches with CC data, so that I can add a field which shows Duplicated Y or N. This is all going into a table for reconciliation. If I do not identify the dupes table will reconcile as Policy data will be duplicated to match!

Thank you

Ian
 
Trust this is sufficient, not transaction numbers in the duplicated data are different, all else is the same except posting times. To link I convert to dates.

Code:
CREATE TABLE #TMP_INSIGHT_R 
    (
		  TrxNo INT
		, MAIN_INSIGHT_POLICYNO NVARCHAR(20)
		, PAT_CARD_NO NVARCHAR(20)
		, RecAmt Decimal(9,2)
		, PostingDate DATE
		 )
    
    CREATE TABLE #TMP_CI_DETAILS 
    (
		 PolicyCommIdea_PK Numeric(19,0)
		, PAN NVARCHAR(20)
		, POLICY_REFERENCE NVARCHAR(20)
		, CI_AMOUNT Decimal(9,2)
		, TRANSACTIONDATE DATE
		
		
    )


INSERT INTO #TMP_INSIGHT_R
Values (
(2613354, 'REF000000323168', '************8867', -326,'2012-09-17 10:00:25.000'), 
(2613480, 'REF000000167222', '************5114', -202.2, '2012-09-17 10:12:54.000'),
(2613341, 'REF000000593059', '************6214', -83.4,	'2012-09-17 09:55:06.000')

INSERT INTO #TMP_CI_DETAILS
Values (
(4513354, 'REF000000323168', '************8867', -326,'2012-09-17 11:00:25.000'), 
(4513480, 'REF000000167222', '************5114', -202.2, '2012-09-17 10:13:54.000'),
(4513341, 'REF000000593059', '************6214', -83.4,	'2012-09-17 09:56:06.000'),
(4513342, 'REF000000593059', '************6214', -83.4,	'2012-09-17 09:57:06.000'),
(4513343, 'REF000000593059', '************6214', -83.4,	'2012-09-17 09:58:06.000')

Ian
 
with cte as
(
select PAN, POLICY_REFERENCE,CI_AMOUNT, count(Pan) AS CNT
from TMP_CI_DETAILS
group by PAN, POLICY_REFERENCE,CI_AMOUNT
having count(pan)>1
)
Update TMP_CI_DETAILS
Set Duplicated = 1
from cte
join TMP_CI_DETAILS CI
on ci.pan=cte.pan
 
Simi
Thanks for this.
I assume this means I need to add an extra column to TMP_CI_DETAILS called Duplicated.

Ian
 
yes,

then you could run this to remove one of the duplicates

with cte as
(
select PAN, POLICY_REFERENCE,CI_AMOUNT, count(Pan) AS CNT,
min(PolicyCommIdea_PK) as minpk
from TMP_CI_DETAILS
group by PAN, POLICY_REFERENCE,CI_AMOUNT
having count(pan)>1
)
Update TMP_CI_DETAILS
Set Duplicated = 0
from cte
join TMP_CI_DETAILS CI
on ci.PolicyCommIdea_PK=cte.minpk


then process all where duplates = 0

Simi

 
Scratch that... This one will replace both of those with one.

with cte as
(
select PAN, POLICY_REFERENCE,CI_AMOUNT, count(Pan) AS CNT,
min(PolicyCommIdea_PK) as minpk
from TMP_CI_DETAILS
group by PAN, POLICY_REFERENCE,CI_AMOUNT
having count(pan)>1
)
Update TMP_CI_DETAILS
Set Duplicated = 1
from cte
join TMP_CI_DETAILS CI
on ci.pan=cte.pan
and PolicyCommIdea_PK > minpk

Updates Duplicated with 1.

Then just process where Duplicated = 0

Simi

 
Can I do that all in one hit using something like case statement

with cte as
(
select PAN, POLICY_REFERENCE,CI_AMOUNT, count(Pan) AS CNT,
min(PolicyCommIdea_PK) as minpk
from TMP_CI_DETAILS
group by PAN, POLICY_REFERENCE,CI_AMOUNT
having count(pan)>1
)
Update TMP_CI_DETAILS
Set (Case when ci.PolicyCommIdea_PK=cte.minpk then Duplicated = 0 else Duplicated = 1 end)
from cte
join TMP_CI_DETAILS CI
on on ci.pan=cte.pan

Ian
 
you should not have to set it to 0. it should already be 0 or null.

but yes that would work

Simi
 
Of course set the default to 0 when first populated. Obvious really.

Thank you for your help.

Ian
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top