Which query is more efficient?

Silverstone · Apr 26, 2005

Hi all,

I recently submitted the below query, the goal of which was to get copies of duplicate records on my table:
SELECT *
FROM credpol.q7rl_legal_entity a
INNER JOIN
( SELECT prdid , COUNT ( * )
FROM credpol.q7rl_legal_entity
GROUP BY prdid ) AS tmp ( prdid,cnt )
ON a.prdid = tmp.prdid
WHERE tmp.cnt > 1 ;

I recieved a rather stroppy message from our Datawarehouse advising it wasn't efficient and to do the following:
SELECT *
FROM credpol.q7rl_legal_entity
where prdid in (SELECT prdid
FROM credpol.q7rl_legal_entity
group by 1
having count(*) > 1);

Can someone confirm this please? I was under the impression that when one does an "IN (SELECT......)" that the subquery runs for every row in the main table, however my query at the top does it in a derived table which i believed only runs once for the entire query, hence mine would be more efficient.

Naturally i could be wrong on this point hence i wanted to confirm it before moaning to our datawarehouse ;-)

On other question relating to the HAVING clause. I recall learning about it but do not recall the pro's & Con's when compared to WHERE. Can someone please give me a quick run down?

Thanks

Silverstone

BillDHS · Apr 26, 2005

You are referring to a Correlated Subquery when the subquery runs once for each row of the main table. The IN subquery coded above IS NOT a correlated subquery and will create a spool table of the whole recordset. Review the explain and it should show a spool file of the subquery results.

You could also add the HAVING clause to your derived table and reduce returned rows.

dnoeth · Apr 26, 2005

Just EXPLAIN both queries.
I wouldn't expect to see major differences, because the optimizer will do joins in both cases.

But if there're just a view duplicates and you're pre-V2R5 then maybe the optimizer materializes the derived table without pushing the cnt > 1 into it. So moving the cnt into the having clause might help.

Btw,
a correlated subquery would be:

SELECT *
FROM credpol.q7rl_legal_entity a
WHERE EXISTS
( SELECT *
FROM credpol.q7rl_legal_entity a2
WHERE a2.prdid = a.prdid
HAVING COUNT(*) > 1
)
;

The Explain should be similar to the other queries, because the optimizer flattens the query to a join similar to your first query.

But a totally different solution is using OLAP-functions:
SELECT *
FROM credpol.q7rl_legal_entity a
QUALIFY
COUNT(*) OVER (PARTITION BY prdid) > 1
;

Dieter

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Which query is more efficient?

Silverstone

Programmer

BillDHS

Programmer

dnoeth

Instructor

Similar threads

Part and Inventory Search

Sponsor