Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Chris Miller on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Cross table insert duplicate key issue 1

Status
Not open for further replies.

craigward

Programmer
Nov 13, 2007
230
GB
Hi,

I have 2 tables, wce_contact and PC_WiredOrder.

I want to copy data from PC_WiredOrder into wce_contact, that is not difficult. The issue comes when the PC_WiredOrder table has duplicates in it, it then conflicts with the unique primary key in the wce_contact table. That is because i'm using the customerid from PC_WiredOrderand for the uniqueid in wce_contact. So the duplicate rows customerid '333' in the example below, can't insert into the uniqueid of wce_contact table it attempts the first instance then the second and conflicts.

Here is the error:

Msg 2627, Level 14, State 1, Line 1
Violation of PRIMARY KEY constraint 'PK__wce_cont__AA522ACA7F60ED59'. Cannot insert duplicate key in object 'dbo.wce_contact'.
The statement has been terminated.

Below is some script to create the tables, insert some sample data and the attempt to insert script. This replicates the issue.

If anyone can help me to get round this I would be grateful, I have tried loads of options and can't get it! Thanks for your help

Code:
CREATE DATABASE temp3

GO

USE temp3

CREATE TABLE wce_contact
(
uniqueid VARCHAR(50) NOT NULL PRIMARY KEY,
emailaddress VARCHAR(150) NOT NULL,
)
 
INSERT INTO wce_contact VALUES (1, '1@1.com'), (2, '2@2.com');

CREATE TABLE PC_WiredOrder
(
id VARCHAR(50) NOT NULL PRIMARY KEY,
customerid VARCHAR(150) NOT NULL,
billingemail VARCHAR(150)  NULL,
)
 
INSERT INTO PC_WiredOrder VALUES ('a', 333, '3@3.com'), ('b', 333, '3@3.com'), ('c', 444, '4@4.com');

Script causing the issue.

Code:
INSERT INTO wce_contact
  (UNIQUEID, EMAILADDRESS)
SELECT t1.customerid,
       t1.billingemail
  FROM PC_WiredOrder t1
 WHERE NOT EXISTS(SELECT EMAILADDRESS
                    FROM wce_contact t2
                   WHERE t2.EMAILADDRESS = t1.BillingEmail)
 
Code:
INSERT INTO wce_contact
  (UNIQUEID, EMAILADDRESS)
SELECT distinct
       t1.customerid
     , t1.billingemail
  FROM PC_WiredOrder t1
 WHERE NOT EXISTS(SELECT EMAILADDRESS
                    FROM wce_contact t2
                   WHERE t2.EMAILADDRESS = t1.BillingEmail)

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Thank you fredericofonseca, does Distinct not look at all columns in the row to make sure each value is distinct? I need to add another 10 columns into the query that are not sample code.

Thanks again.
 
It's the continuation of thread183-1790851, right?

Well, as said you should group by customerid. In this example using a numeric id for a varchar(50) field is questionable, you don't get far when relying on implicit type conversions.

You can stay with MERGE, your Source just has to be a GROUP BY query.

Code:
MERGE INTO wce_contact AS Target
using (select customerid, min(field2), min(field3),... from pc_wiredorder [highlight #FCE94F]group by customerid[/highlight] where addedtowired = '0') AS Source ON Target.emailaddress = Source.BillingEmail
WHEN NOT MATCHED THEN
    INSERT ([uniqueid], [createuser], [edituser], [recordmanager], [createtime], [edittime], [COMPANY], [contact], [FirstName], [lastname], [emailaddress], [Phone], [Address1], [Address2], [Address3], [City], 
    [County], [Postalcode], [Customer_Type], [customerpipeline], [account_manager], [Referred_By], record_type, OP_Gone_Ahead) 
    VALUES (source.[customerid], 'ykn3t293i7ckyat7', 'ykn3t293i7ckyat7', 'ykn3t293i7ckyat7', getdate(), getdate(), source.[FirstName]+' '+ source.[Surname], source.[FirstName]+ ' ' + source.[Surname], source.[FirstName], source.[Surname], source.[BillingEmail], source.[Phone], 
    source.[AddressLine1], source.[AddressLine2], source.[AddressLine3], source.[TownCity], source.[County], source.[Postcode], 
    source.[CustomerType], source.[CustomerPipeline], source.[AccountManager], 'Web Integration', 'Individual', 'Confirmed');

The usage of MIN() for all other fields aside of customerid should be ok, as all those fields should have the same value for the same customerid. GROUP BY still needs an aggregation of all fields aside of customerid. MAX() would work too, and stating all fields from a customer would also work and not create smaller groups. If you list all of them DISTINCT would also work, just not with *. Anytime you use * outside of simple single table you must ask yourself, if that's a sane choice, here it's not, as you don't use all fields.

All this points out your database accepts orders with customerid that don't exist. The normal order must be creating customers first and then orders. So I assume you DO have these customer records too.

So pull the data from there:
Code:
MERGE INTO wce_contact AS Target
using ([highlight #FCE94F]select * from customers where customerid in (Select distinct customerid from pc_wiredorder where addedtowired = '0')[/highlight]) AS Source ON Target.emailaddress = Source.BillingEmail
WHEN NOT MATCHED THEN
    INSERT ([uniqueid], [createuser], [edituser], [recordmanager], [createtime], [edittime], [COMPANY], [contact], [FirstName], [lastname], [emailaddress], [Phone], [Address1], [Address2], [Address3], [City], 
    [County], [Postalcode], [Customer_Type], [customerpipeline], [account_manager], [Referred_By], record_type, OP_Gone_Ahead) 
    VALUES (source.[customerid], 'ykn3t293i7ckyat7', 'ykn3t293i7ckyat7', 'ykn3t293i7ckyat7', getdate(), getdate(), source.[FirstName]+' '+ source.[Surname], source.[FirstName]+ ' ' + source.[Surname], source.[FirstName], source.[Surname], source.[BillingEmail], source.[Phone], 
    source.[AddressLine1], source.[AddressLine2], source.[AddressLine3], source.[TownCity], source.[County], source.[Postcode], 
    source.[CustomerType], source.[CustomerPipeline], source.[AccountManager], 'Web Integration', 'Individual', 'Confirmed');

Of course, that relies on all the fields you need for the merge target to be in a customers table.

Overall it's questionable you have both a foreign key and copy over data. I can guess this is because customer data can change over time and you want the data as it was when ordering. There are other ways to do that and still only needing the customerid and datetime of order to lookup the customerdata as it was. Then storing customers in a temporal table allowing queries like SELECT * FROM customers where customerid=X [highlight #FCE94F]AS OF[/highlight] orderdatetime.

Temporal tables were introduced in 2016 and are ideal to prevent such redundant copying and spreading of record information across all database just for that historic approach, the history of data is kept at the single place it happens, your customer table extends with updates and keeps the older state, too, you can easily select just the current states/records a normal table would have but also can get older states, AS OF is just on possibility.

Even if I guess wrong about this in detail, the fact you use a foreign key AND detail data shows me, you need some more training and consultation to fix design flaws of your database.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Thank you Olaf, that was a great explanation. I understand all you said there it makes good sense. The numeric id for a varchar(50) was just in the test table in the live system this field in numeric, not sure why I didn't correct this in the test example above.


I will run some tests and this has really freshened my knowledge it's been a while since I worked with advanced SQL.


Thanks again.
 
Could I ask, do you rate this method of selecting the distinct records? I was playing with it but will use the method using the merge but curious to get your opinion. Thanks.


Code:
  SELECT customerid, billingemail, rk
  FROM (
     SELECT
       customerid, kk.billingemail,
          ROW_NUMBER() OVER (PARTITION BY kk.billingemail
                        ORDER BY kk.customerid DESC) AS rk
     FROM PC_WiredOrder kk
     WHERE  kk.billingemail IS NOT NULL) t
  WHERE rk = 1
 
Thank you fredericofonseca, does Distinct not look at all columns in the row to make sure each value is distinct? I need to add another 10 columns into the query that are not sample code.

Thanks again.
you got a solution for the question you posted - next time put the full code, not just a snippet.

group by or row_number method are the other options - row_number normally faster

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Indeed picking out the first row after ROW_NUMBERing will help to cope without all the MIN()/MAX() or other aggregations. With just a few field I'd be pragmatic and use DISTINCT, but I also can see why picking the rk=1 record could be fastest, there is only the need to find a first row. Since you have multiple rows per customerid, thar will just not really give an order making sense, i.e. the record picked per kk.customerid is random. I don't know if Fredericos assertion this is faster would hold true.

I'd still prefer to only take the customerid out and join in the data you need via this ID, especially if all records of PC_WiredOrder kk with same customerid have the same values in billingemail, the SQL engine can't knnow this and will need to work out MIN()/MAX() distinct or any row numbering. It's easier to pick out the distinct customer ids and join data from where this is the primary key. You have no way to tell SQL Server to pick any row of PC_WiredOrder and are sure any record with same customerid also has same billingemail, so it doesn't help that you only query from one source table and avoid a join.

Indeed PC_WiredOrder looks like joined data and you should hook into whatever process generates it and merge new mailadresses from there instead of doing this as aftermath. You see, you first cause multiple rows in PC_WiredOrder and now want to condense them, you have a stage in the process where this already was condensed, was only single rows per customerid.

Bye, Olaf.

Olaf Doschke Software Engineering
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top