
Is there an alternative to expensive correlated subqueries?


bawtry (Programmer)
Jul 17, 2001
Is there an alternative to the correlated subquery? Running the subquery for each row returned by the outer query must be expensive.
Any help greatly appreciated.
Graham Robbins

The following is an example of a correlated subquery:

/*
Ensure that only the most recently added patient contact of each type is current per patient
*/

UPDATE tblAddressPatient
SET tblAddressPatient.Status = 'Obsolete'
WHERE tblAddressPatient.Id <> (
SELECT TOP 1 tblAP.Id
FROM tblAddressPatient AS tblAP
WHERE tblAP.CIN = tblAddressPatient.CIN
AND tblAP.AddressTypeID = tblAddressPatient.AddressTypeID
ORDER BY tblAP.Id DESC
)
 
You can try this:

UPDATE tblAddressPatient
SET tblAddressPatient.Status = 'Obsolete'
FROM (SELECT CIN, AddressTypeID, MAX(Id) AS MaxId
      FROM tblAddressPatient
      GROUP BY CIN, AddressTypeID) AS tblAP
WHERE tblAP.CIN = tblAddressPatient.CIN
AND tblAP.AddressTypeID = tblAddressPatient.AddressTypeID
AND tblAddressPatient.Id <> tblAP.MaxId


You can further filter the data source in the FROM clause by adding a join with the tblAddressPatient table, so that only the relevant records are touched; see the sketch below.
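A sketch of one reading of that suggestion (the Status <> 'Obsolete' filter is an assumption here about which records count as relevant):

-- Join form of the same update; rows already marked
-- obsolete never reach the update at all.
UPDATE tblAddressPatient
SET Status = 'Obsolete'
FROM tblAddressPatient
INNER JOIN (SELECT CIN, AddressTypeID, MAX(Id) AS MaxId
            FROM tblAddressPatient
            GROUP BY CIN, AddressTypeID) AS tblAP
    ON tblAP.CIN = tblAddressPatient.CIN
    AND tblAP.AddressTypeID = tblAddressPatient.AddressTypeID
WHERE tblAddressPatient.Id <> tblAP.MaxId
AND tblAddressPatient.Status <> 'Obsolete'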

RT
 
Generally, correlated sub-queries are inexpensive if the table is properly indexed. SQL Server doesn't have to scan the entire table, just the correlated records matching the criteria used in the sub-query; see the index sketch below.
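For example, a supporting index like the following (the index name is hypothetical; the column order matches the correlation columns plus Id) lets the subquery in the query below run as an index seek rather than a table scan:

-- Hypothetical supporting index: CIN and AddressTypeID are
-- the correlation columns, Id is the column being compared.
CREATE INDEX IX_tblAddressPatient_CIN_AddressTypeID_Id
ON tblAddressPatient (CIN, AddressTypeID, Id)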

I think the following will be more efficient than the query you posted.

UPDATE tblAddressPatient
SET Status = 'Obsolete'
WHERE Status <> 'Obsolete'
AND EXISTS
(SELECT *
FROM tblAddressPatient AS tblAP
WHERE tblAP.CIN = tblAddressPatient.CIN
AND tblAP.AddressTypeID = tblAddressPatient.AddressTypeID
AND tblAP.Id > tblAddressPatient.Id)

The EXISTS subquery stops processing as soon as the first row matching the condition is found. The added condition - Status <> 'Obsolete' - prevents updating unnecessarily.

Terry L. Broadbent - DBA
Computing Links:
faq183-874 contains "Suggestions for Getting Quick and Appropriate Answers" to your questions.
 
Hi Terry,

Your comment triggered some thought, so I set up a sample query comparison on my database as follows.

1. Query along the lines of what I had written:

update cntr_audit_stts
set stts = 'Obsolete'
from (select cntr_cd , max(prcss_tm) as imprt_dt from rplctn_imprt_hstry
where dscrptn like 'End%' group by cntr_cd) as b
where cntr_audit_stts.cntr_cd = b.cntr_cd
and b.imprt_dt >= isnull(lst_ordr_rcpt_dt,'1960/01/01')

2. Query using the EXISTS check you suggested:

update cntr_audit_stts
set stts = 'Obsolete'
where stts <> 'Obsolete'
and exists (select * from rplctn_imprt_hstry b
where b.cntr_cd = cntr_audit_stts.cntr_cd
and dscrptn like 'End%'
and b.prcss_tm >= isnull(cntr_audit_stts.lst_ordr_rcpt_dt,'1960/01/01'))


- cntr_audit_stts has a clustered primary key on cntr_cd (1113 records)
- rplctn_imprt_hstry has a clustered primary key on (cntr_cd, prcss_tm) (40776 records - avg. 36 records per center, median 18)
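For context, the structures would look roughly like this (hypothetical DDL; only the keys come from the description above, the column types are assumptions):

-- Audit status per center; cntr_cd is the clustered primary key
CREATE TABLE cntr_audit_stts (
    cntr_cd int NOT NULL PRIMARY KEY CLUSTERED,
    stts varchar(20) NULL,          -- type is an assumption
    lst_ordr_rcpt_dt datetime NULL  -- type is an assumption
)

-- Import history; clustered primary key on (cntr_cd, prcss_tm)
CREATE TABLE rplctn_imprt_hstry (
    cntr_cd int NOT NULL,
    prcss_tm datetime NOT NULL,
    dscrptn varchar(100) NULL,      -- type is an assumption
    CONSTRAINT PK_rplctn_imprt_hstry
        PRIMARY KEY CLUSTERED (cntr_cd, prcss_tm)
)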

So you can see there are no indexing issues, and the setup is quite similar to what was posted in the original problem. I discovered the following:

1. If the records likely to be affected in the table being updated are less than 50% of the total, the second query performs significantly better.
2. If the records likely to be affected are more than 80%, the first query performs better (it finishes in half the time of the second when all records are affected).
3. In the remaining cases, the timings are comparable.
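For anyone reproducing the comparison, SQL Server's statistics output reports elapsed time and logical reads per statement; a minimal sketch:

SET STATISTICS TIME ON
SET STATISTICS IO ON

-- run query 1, then query 2, and compare the elapsed times
-- and logical reads reported for each in the Messages output

SET STATISTICS TIME OFF
SET STATISTICS IO OFF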

I wondered about the case 2 results and thought the reason may be as follows:
- In option 1, I will be doing 40776 ops (the number of records to be scanned).
- In option 2, each lookup on the center index costs approx. 11 binary seeks (2^10 = 1024 and 2^11 = 2048). Assuming the EXISTS query stops halfway through a center's rows on average, that is approx. 9 (median/2) such lookups per center, so each correlated subquery costs approx. 11 x 9 = 99 ops. Across all 1113 centers that comes to roughly 110K ops, more than twice as many as option 1.
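Spelled out as plain arithmetic (just a sanity check of the estimate, runnable as a plain SELECT):

-- option 2 estimate: seeks per lookup x rows checked per center x centers
SELECT 11 * 9        AS OpsPerCenter,  -- approx. 99 ops per center
       1113 * 11 * 9 AS TotalOps       -- 110187, i.e. roughly 110K ops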

So what I could figure out is that performance depends on the actual index statistics of the table and on how frequently the query runs. For example, if the patient records are to be updated every hour, then the correlated query will definitely be better; but if it is run once a month or quarter, one can expect most of the records to be affected, and the pre-aggregated scan-and-join will be better.

I would appreciate it if you could tell me whether I am right in my understanding and assumptions, or whether I am missing some important aspect and a correlated query will always be faster than a full-table-scan query.

Thanks

RT
 