Update or insert 1

micang · Aug 2, 2008

SQL 2005
Managment Studio

Hi All,

I have a 2 tables as such:

Table1
uniqueID,CC,MakeCode,ModelCode,Body,Date,MatchID,CompleteCode (First 6 columns have data, last 2 are null)

Records in Table1, around 45million

Table2
CC,MakeCode,ModelCode,Body,Date,CompleteCode (All columns have data)

Records in Table2, around 3million

What I need to do:
1.)Link by the first 5 fields between the tables, update Table1 - MatchID (with a int - this integer I assign,1 = matched on all 5 fileds), CompleteCode from Table2
2.) Continue doing the above, but each time removing one of the fields, i.e. linking on 4 then 3 etc (and the matchId is 2 then 3 etc), but ignoring the records in Table1 that have a MatchID populated (so only do nulls).

I have done this in my novice SQL way as so;

Code:

Update table1
set    Table1.MatchID = 1, Table1.CompleteCode = table2.CompleteCode
FROM         Table1 INNER JOIN
                      table2 ON Table1.CC = table2.CC AND Table1.MakeCode = table2.MakeCode AND Table1.ModelCode = table2.ModelCode AND 
                      Table1.Body = table2.Body AND Table1.Date = table2.Date
where table1.matchid is null

I have indexed all 5 columns on each table to hopefully speed up the query.

This works, but one of the problems is that some updates have 1.6 million records, so the update query takes rather a long time.

With my limited experience of sql knowledge, "my" logic has led me to this path, but am wondering if there is a much more efficient way of doing this? Even if it means rather inserting the results into a new table, which I have tried and works much quicker, but I am stuck at how do I ignore the ones from table1 that have already been assigned a MatchID?

This is a once off for cleansing.

As usual, I appreciate any info of assistance.

Michael

dGrouse · Aug 3, 2008

Well, given that this is a one-off, what I'd do, after making a backup copy of table1, is the following:

Code:

select
  t1.uniqueID
  ,t1.makecode
  ,t1.modelcode
  ,t1.body
  ,t1.date
  ,1 as matchID
  ,t2.completecode
into
  matches
from
  table1 t1 inner join table2 t2
  on t1.cc = t2.cc
    and t1.makecode = t2.makecode
    and t1.modelcode = t2.modelcode
    and t1.body = t2.body
    and t1.date = t2.date

Then, I'd delete the records I'd already dealt with:

Code:

delete table1 (with tablock)
where uniqueID in
(select uniqueID from matches)

I added a table lock hint, assuming that nobody else will need to be getting to your table1; might speed things up a wee bit. That'll give you fewer records to deal with, in table1, and you won't have to do any filtering.

Next step:

Code:

insert matches
select
  t1.uniqueID
  ,t1.makecode
  ,t1.modelcode
  ,t1.body
  ,t1.date
  ,2 as matchID
  ,t2.completecode
from
  table1 t1 inner join table2 t2
  on t1.cc = t2.cc
    and t1.makecode = t2.makecode
    and t1.modelcode = t2.modelcode
    and t1.body = t2.body

Then, again, delete the records you've now dealt with:

Code:

delete table1 (with tablock)
where uniqueID in
(select uniqueID from matches)

and so on through your remaining matchID values and join conditions. Table1 will get smaller every time, which is good.

I assume that table1.uniqueid is the primary key on table1.

Not sure that would end-up being faster all around, but I think it might be.

Jeff Prenevost
BI Consultant
Ann Arbor, MI

micang · Aug 3, 2008

Jeff,

Thank you very much for your info, amd going to give this a go, defintely sounds like will be a much better solution.

You assumed correct, table1 will not be used or accessed by someone else, and yes, uniqueID is the primary key.

Appreciate your time and info.

Michael

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Update or insert 1

micang

Technical User

dGrouse

MIS

micang

Technical User

Similar threads

Part and Inventory Search

Sponsor