Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Using Subquery to Get Difference

Status
Not open for further replies.

infomania

Programmer
Oct 27, 2002
148
0
0
I am trying to select the remaining records after the "Top 100" records have been selected (in a sub query). I have tried NOT EXISTS without success. Here is a short form of my query put into the command box. (using CR 9 & SQL Server 2000)

SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d left outer join customers a
on d.cust_id = a.cust_id
WHERE
d.date>={?Start} and d.date<={?End}
and not exists
/*start subquery*/
(SELECT top 100 cust_id, sum(sales)
FROM detail
WHERE date>={?Start}and date<={?End}
d.cust_id = cust_id
GROUP BY cust_id
ORDER BY sum(sales) desc)
/*end subquery*/

GROUP BY d.cust.id
ORDER BY sum(d.sales) desc

When I run this I get no records returned. I tried the NOT IN syntax without success since I had more than one item in the select statement.

Any, all help will be appreciated.
 
Hi,

Two things a) have you got more than 100 unique cust_ids ?

Secondly, try this:

SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d left outer join customers a
on d.cust_id = a.cust_id
WHERE
d.date>={?Start} and d.date<={?End}
and cust_id NOT IN
/*start subquery*/
(SELECT top 100 cust_id -- <select nothing else>
FROM detail
WHERE date>={?Start}and date<={?End}
d.cust_id = cust_id
GROUP BY cust_id
ORDER BY sum(sales) desc)
/*end subquery*/

You don't have to SELECT the result of the aggregate function. I think this should work 4u!

Cheers. ------
Dublin, Ireland.

 
I'll give that a try. And yes, I have more than 100. Actually the count is >5,000.
Thanks... I'll be back with the result. Running these queries is taking 5-10 min.
 
Also, I tested this on a DB here without bothering to join the subselect. I don't think there's any advantage in that. [ Someone else might disagree though!! ]

e.g.
select * from invoices
where groupby_field not in
(select top 10 groupby_field
from invoices
group by groupby_field
order by sum(invoice_amount) desc )

...
------
Dublin, Ireland.

 
--This query uses NOT IN
SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d
LEFT JOIN customers a
ON d.cust_id = a.cust_id

WHERE d.Cust_ID Not In
(SELECT top 100 cust_id
FROM detail
WHERE date>={?Start}and date<={?End}
GROUP BY cust_id
ORDER BY sum(sales) desc)

WHERE d.date>={?Start} and d.date<={?End}
GROUP BY d.cust_id, a.cust_name
ORDER BY sum(d.sales) desc

--Here is another method using a LEFT JOIN
SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d
LEFT JOIN customers a
ON d.cust_id = a.cust_id

LEFT JOIN
(SELECT top 100 cust_id
FROM detail
WHERE date>={?Start}and date<={?End}
GROUP BY cust_id
ORDER BY sum(sales) desc) top100

ON d.cust_id = top100.cust_id

WHERE d.date>={?Start} and d.date<={?End}
AND top100.Cust_ID Is Null

GROUP BY d.cust_id, a.cust_name
ORDER BY sum(d.sales) desc Terry L. Broadbent - DBA
Computing Links:
faq183-874 contains &quot;Suggestions for Getting Quick and Appropriate Answers&quot; to your questions.
 
Sorry to take so long to get back... I tried the first suggestion and waited, and waited... I aborted, finally. There was no error message. I'll try the other suggestions tonight. This thing is killing me. Terry, I like the LEFT JOIN idea. I'll give that a try next.

Thanks. I'll be back!
 
Okay... trying Terry's Left Join solution, I got CR 9 to accept the SQL now I'm waiting for the result....
SUCCESS!!!

Thank you Terry!
 
Just a question for you on this one Terry (if I can post it here!). The SQL background I have is mainly Ingres, with a little Sybase thrown in. I am using SQLServer just these past few weeks. Anyway, question is, why is your LEFT JOIN faster?

I would have thought (from Ingres experience) that the Query Optimiser would essentially rewrite the subselect as a flattened query - and that it would probably end up looking very like the left join. Also, I would have assumed that both queries were resulting in the aggregates being performed twice over the whole table due to the ORDER BY's - so I can't see why it is so much faster !!

Finally though, I assume that the original 'joined subselect' was creating a needless join table which would have slowed it up - particularly on a large table.

As you can guess, I'm just trying to increase my grasp of how the SQLServer optimiser operates. Also, feel free to let me know if this post should be elsewhere. Thanks! ------
Dublin, Ireland.

 
I'm not very knowledgeable about the inner workings of the query optimizer. I'm often surprised at the query plans created by SQL Server. The optimizer in SQL 2000 is better than in SQL 7, which was very much improved over SQL 6.5. Despite the improvements, the optimizer is imperfect and I find that much of my time as a DBA is spent optimizing queries for the developers.

Query analyzer is an excellent tool for showing query plans and finding bottlenecks in queries. If the reason for a long running query isn't readily apparent by looking at the code, I run the Show Estimated Query Execution Plan process in Query Analyzer. It often reveals that the query plan isn't what was expected and helps determine the changes needed to optimize the query.

In the case of the query we've been discussing here, I would guess that SQL ends up running the sub-query multiple times due to the group by and top 100 clauses. If info mania is willing to experiment, I would guess that the following code would run much faster than the NOT IN sub-query code. Of course, SQL Server may surprise me.

Also, Crystal reports may not accept this code. Code like this should be encapsulated in a SQL stored procedure.

SELECT top 100 cust_id
INTO #temp
FROM detail
WHERE date>={?Start}and date<={?End}
GROUP BY cust_id
ORDER BY sum(sales) desc

Create Unique Clustered Index ix_cust_id on #temp(cust_id)

SELECT d.cust_id, a.cust_name, sum(d.sales)
FROM detail d
LEFT JOIN customers a
ON d.cust_id = a.cust_id
WHERE d.date>={?Start} and d.date<={?End}
AND d.Cust_ID Not In (Select cust_id From #temp)
GROUP BY d.cust_id, a.cust_name
ORDER BY sum(d.sales) desc

Drop table #temp Terry L. Broadbent - DBA
Computing Links:
faq183-874 contains &quot;Suggestions for Getting Quick and Appropriate Answers&quot; to your questions.
 
I'll give this a try. The LEFT JOIN is working fine but the query takes about 8 minutes to run against a detail table with about 2MM records.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top