
Duplicate record lookup fails

Status
Not open for further replies.

cellgfx

Programmer
Aug 13, 2004
I have a Transform Data Task that copies a lot of data from my source system. Unfortunately, I cannot use DISTINCT in the SQL against the source system, due to a very poor ODBC driver!
So, when creating my primary key, I do a lookup on the PK column before inserting each record, to see if it already exists. If it does, I skip the row. The lookup references the target database of the task.

The problem is that the lookup doesn't find duplicates that have already been loaded into the database. It lets them through, and the database throws a primary key violation.

Has anyone experienced this, or think they know what I'm doing wrong?
 
If you want to weed out duplicates and your original data source doesn't support SELECT DISTINCT, why not make this a multi-step job with workflows?

Step 1 - Bring your data across to SQL Server, into a staging table that has no constraints and allows duplicates.
Step 2 - On successful completion, have the workflow either create an exceptions report or do the insert with a DISTINCT from the staging table.
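A minimal sketch of step 2, assuming a staging table called Staging_t1, a target table t1, and a key column c1 (all names are placeholders for your own schema):

Code:
-- Move de-duplicated rows from staging into the target table.
-- Staging_t1, t1 and c1 are illustrative names; substitute your own.
-- Assumes duplicate rows are identical, so DISTINCT collapses them.
INSERT INTO t1 (c1, c2)
SELECT DISTINCT s.c1, s.c2
FROM Staging_t1 s
WHERE NOT EXISTS (SELECT 1 FROM t1 t WHERE t.c1 = s.c1)  -- skip keys already loaded

The NOT EXISTS guard also protects you if the target already contains rows from a previous run.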


To get a list of duplicate key values, use the HAVING clause to perform a count on the key columns.

i.e. if you had a table called t1 with a key column called c1, you could find the duplicates with the following query:
Code:
select c1 from t1 group by c1 having count(c1) > 1
 
I had thought of that, but the problem is that this task has been written so that any number of instances can be run in parallel.

That would mean using my unique process identifier to mark the incoming records in the staging table and then doing a distinct select, all of which seems like a lot of hassle just because the SQL Server lookup doesn't work properly!

Still, unless anyone else has an alternative suggestion, this might have to be the route I take...
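For what it's worth, the parallel-safe variant described above could look something like this. ProcessID is a hypothetical column added to the staging table to tag each instance's rows; all table and column names are placeholders:

Code:
-- Each parallel instance tags its staging rows with its own identifier,
-- then de-duplicates only within that batch when loading the target.
-- Staging_t1, t1, c1, ProcessID and @ProcID are illustrative names.
DECLARE @ProcID int
SET @ProcID = 42  -- this instance's unique process identifier

INSERT INTO t1 (c1)
SELECT DISTINCT s.c1
FROM Staging_t1 s
WHERE s.ProcessID = @ProcID
  AND NOT EXISTS (SELECT 1 FROM t1 t WHERE t.c1 = s.c1)

DELETE FROM Staging_t1 WHERE ProcessID = @ProcID  -- clean up this batch

Because each instance only reads and deletes its own ProcessID, the instances don't interfere with each other in the staging table.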
 
The code is:

Code:
SELECT field1
FROM [Table Name]
WHERE tableid = ?

I then pass the tableid into the lookup and check whether it returns the field I chose. Apologies for the dummy names, but some of the structure could be company-sensitive!
 
I have solved the problem by adding a UNIQUE index with IGNORE_DUP_KEY to my primary key field. This means I can use Fast Load without having to check the field first!

However, any duplicates then set the TDT to a failed status, which I don't want. How can I check for a specific error in the TDT?
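A sketch of the index described above, using SQL Server 2000 syntax (table, column and index names are placeholders):

Code:
-- UNIQUE index with IGNORE_DUP_KEY: duplicate inserts are discarded with
-- a warning instead of failing the whole batch. Names are illustrative.
CREATE UNIQUE INDEX IX_t1_c1
ON t1 (c1)
WITH IGNORE_DUP_KEY

Note that IGNORE_DUP_KEY silently drops the duplicate rows rather than rejecting the load, which is what makes Fast Load viable here.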
 
