Using Subquery to Get Difference

infomania · Dec 10, 2002

I am trying to select the remaining records after the "Top 100" records have been selected (in a sub query). I have tried NOT EXISTS without success. Here is a short form of my query put into the command box. (using CR 9 & SQL Server 2000)

SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d left outer join customers a
on d.cust_id = a.cust_id
WHERE
d.date>={?Start} and d.date<={?End}
and not exists
/*start subquery*/
(SELECT top 100 cust_id, sum(sales)
FROM detail
WHERE date>={?Start}and date<={?End}
d.cust_id = cust_id
GROUP BY cust_id
ORDER BY sum(sales) desc)
/*end subquery*/

GROUP BY d.cust.id
ORDER BY sum(d.sales) desc

When I run this I get no records returned. I tried the NOT IN syntax without success since I had more than one item in the select statement.

Any, all help will be appreciated.

cbeggan · Dec 10, 2002

Hi,

Two things a) have you got more than 100 unique cust_ids ?

Secondly, try this:

SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d left outer join customers a
on d.cust_id = a.cust_id
WHERE
d.date>={?Start} and d.date<={?End}
and cust_id NOT IN
/*start subquery*/
(SELECT top 100 cust_id -- <select nothing else>
FROM detail
WHERE date>={?Start}and date<={?End}
d.cust_id = cust_id
GROUP BY cust_id
ORDER BY sum(sales) desc)
/*end subquery*/

You don't have to SELECT the result of the aggregate function. I think this should work 4u!

Cheers. ------
Dublin, Ireland.

infomania · Dec 10, 2002

I'll give that a try. And yes, I have more than 100. Actually the count is >5,000.
Thanks... I'll be back with the result. Running these queries is taking 5-10 min.

cbeggan · Dec 10, 2002

Also, I tested this on a DB here without bothering to join the subselect. I don't think there's any advantage in that. [ Someone else might disagree though!! ]

e.g.
select * from invoices
where groupby_field not in
(select top 10 groupby_field
from invoices
group by groupby_field
order by sum(invoice_amount) desc )

...
------
Dublin, Ireland.

tlbroadbent · Dec 10, 2002

--This query uses NOT IN
SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d
LEFT JOIN customers a
ON d.cust_id = a.cust_id

WHERE d.Cust_ID Not In
(SELECT top 100 cust_id
FROM detail
WHERE date>={?Start}and date<={?End}
GROUP BY cust_id
ORDER BY sum(sales) desc)

WHERE d.date>={?Start} and d.date<={?End}
GROUP BY d.cust_id, a.cust_name
ORDER BY sum(d.sales) desc

--Here is another method using a LEFT JOIN
SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d
LEFT JOIN customers a
ON d.cust_id = a.cust_id

LEFT JOIN
(SELECT top 100 cust_id
FROM detail
WHERE date>={?Start}and date<={?End}
GROUP BY cust_id
ORDER BY sum(sales) desc) top100

ON d.cust_id = top100.cust_id

WHERE d.date>={?Start} and d.date<={?End}
AND top100.Cust_ID Is Null

GROUP BY d.cust_id, a.cust_name
ORDER BY sum(d.sales) desc Terry L. Broadbent - DBA
Computing Links:

http://tlbroadbent.home.attbi.com/programmer.htm

faq183-874 contains "Suggestions for Getting Quick and Appropriate Answers" to your questions.

infomania · Dec 10, 2002

Sorry to take so long to get back... I tried the first suggestion and waited, and waited... I aborted, finally. There was no error message. I'll try the other suggestions tonight. This thing is killing me. Terry, I like the LEFT JOIN idea. I'll give that a try next.

Thanks. I'll be back!

tlbroadbent · Dec 10, 2002

The LEFT JOIN should be more efficient. I really should have listed it first as it is my preferred query. Terry L. Broadbent - DBA
Computing Links:

http://tlbroadbent.home.attbi.com/programmer.htm

faq183-874 contains "Suggestions for Getting Quick and Appropriate Answers" to your questions.

infomania · Dec 10, 2002

Okay... trying Terry's Left Join solution, I got CR 9 to accept the SQL now I'm waiting for the result....
SUCCESS!!!

Thank you Terry!

cbeggan · Dec 11, 2002

Just a question for you on this one Terry (if I can post it here!). The SQL background I have is mainly Ingres, with a little Sybase thrown in. I am using SQLServer just these past few weeks. Anyway, question is, why is your LEFT JOIN faster?

I would have thought (from Ingres experience) that the Query Optimiser would essentially rewrite the subselect as a flattened query - and that it would probably end up looking very like the left join. Also, I would have assumed that both queries were resulting in the aggregates being performed twice over the whole table due to the ORDER BY's - so I can't see why it is so much faster !!

Finally though, I assume that the original 'joined subselect' was creating a needless join table which would have slowed it up - particularly on a large table.

As you can guess, I'm just trying to increase my grasp of how the SQLServer optimiser operates. Also, feel free to let me know if this post should be elsewhere. Thanks! ------
Dublin, Ireland.

tlbroadbent · Dec 11, 2002

I'm not very knowledgeable about the inner workings of the query optimizer. I'm often surprised at the query plans created by SQL Server. The optimizer in SQL 2000 is better than in SQL 7, which was very much improved over SQL 6.5. Despite the improvements, the optimizer is imperfect and I find that much of my time as a DBA is spent optimizing queries for the developers.

Query analyzer is an excellent tool for showing query plans and finding bottlenecks in queries. If the reason for a long running query isn't readily apparent by looking at the code, I run the Show Estimated Query Execution Plan process in Query Analyzer. It often reveals that the query plan isn't what was expected and helps determine the changes needed to optimize the query.

In the case of the query we've been discussing here, I would guess that SQL ends up running the sub-query multiple times due to the group by and top 100 clauses. If info mania is willing to experiment, I would guess that the following code would run much faster than the NOT IN sub-query code. Of course, SQL Server may surprise me.

Also, Crystal reports may not accept this code. Code like this should be encapsulated in a SQL stored procedure.

SELECT top 100 cust_id
INTO #temp
FROM detail
WHERE date>={?Start}and date<={?End}
GROUP BY cust_id
ORDER BY sum(sales) desc

Create Unique Clustered Index ix_cust_id on #temp(cust_id)

SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d
LEFT JOIN customers a
ON d.cust_id = a.cust_id
WHERE d.date>={?Start} and d.date<={?End}
AND d.Cust_ID Not In (Select cust_id From #temp)
GROUP BY d.cust_id, a.cust_name
ORDER BY sum(d.sales) desc

Drop table #temp Terry L. Broadbent - DBA
Computing Links:

http://tlbroadbent.home.attbi.com/programmer.htm

faq183-874 contains "Suggestions for Getting Quick and Appropriate Answers" to your questions.

infomania · Dec 11, 2002

I'll give this a try. The LEFT JOIN is working fine but the query takes about 8 minutes to run against a detail table with about 2MM records.

Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

Using Subquery to Get Difference

infomania

Programmer

cbeggan

Programmer

infomania

Programmer

cbeggan

Programmer

tlbroadbent

MIS

infomania

Programmer

tlbroadbent

MIS

infomania

Programmer

cbeggan

Programmer

tlbroadbent

MIS

infomania

Programmer

Similar threads

Part and Inventory Search

Sponsor