Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

DISTINCT + Order By together = slow query 1

Status
Not open for further replies.

jpo245

Programmer
Jan 16, 2005
45
0
0
US
Hi everyone,
I have a table that has about 500000 rows.
Ok so there is a ITEM table and a TAG table, and an ITEM-TAG mapping table.
Tags and Items have an M:N relationship.


My Problem right now:
When i do a select item.id, item.name, item.owner, item.datecreated, from the tables, it is possible that I get multiple ITEMID because each item can have more than one tag mapped to it.

As such, in order to get a list of distinct items, my query is like this:

SELECT DISTINCT(item.id), name, owner, datecreated.
from item
LEFT JOIN itemtagmap ON (item.id = itemtagmap.itemid)
WHERE 1
(AND item.owner=???? AND itemtagmap.tagid IN (???,????) )

ORDER BY item.datecreated

The problem is now that because of the distinct, the query is darned slow.
Things speed up dramatically if i drop the distinct requirement. But that will totally screw up my pagination.


I have a multi column index in this order:
item.datecreated, item.owner, item.id,
It works absolutely fine as long as i don't have the DISTINCT.

The reason why i need to left join the ITEM to ITEMTAGMAP in the first place is because i have some code that allows me to append on additional WHERE itemtagmap.tagid = XXX clauses, so there has to be a join.


Would appreciate any help.... Many thanks...
 
The reason DISTINCT is so slow is that it examines each result-set record in its entirety to ensure there are no duplicates. This is obviously unnecessary; all that needs to be examined is item.id.

You could try:
[tt]
SELECT id,name,owner,datecreated
FROM item
WHERE
id IN
(
SELECT DISTINCT item.id
FROM
item
LEFT JOIN itemtagmap ON (item.id = itemtagmap.itemid)
WHERE
1
AND item.owner=????
AND itemtagmap.tagid IN (???,????)
)
ORDER BY datecreated
[/tt]
Presumably you also have an index (or primary key) on item.id?
 
Can I expand a bit on this?

I have a need for a similar query, where i have a customer table and a publication list table.
I need to create an output list that would give me one unique customer name, but the maximum number of copies they request.

More info: Stored in the publication table would be the cust id, the publication and the quantity.

I would be making a mailing list, comprised of various publications, but the list should only include the customer once, and the maximum number of copies he requested for any pub on his list. So...how would i use the above query but to include the max copy count?



 
shamrox:

How about something like:
[tt]
SELECT c.name,MAX(p.quantity)
FROM
customers c JOIN publications p ON c.id=p.custid
GROUP BY c.name
[/tt]
 
Thanks tony for opening my eyes to the world of subqueries.
 
i had the same problem that jpo245, it is nearly solved with your solution, thanks :)

but i have to get some colunms of the table itemtagmap in the same query, but it is too slow .... (it uses temporary table) is there any solution to optimize the query ?


SELECT id,name,owner,datecreated,itemtagmap.info1, itemtagmap.info2
FROM item,itemtagmap
WHERE
id IN
(
SELECT DISTINCT item.id
FROM
item
LEFT JOIN itemtagmap ON (item.id = itemtagmap.itemid)
WHERE
1
AND item.owner=????
AND itemtagmap.tagid IN (???,????)
)
ORDER BY datecreated
 
int19sd:

How is it that you have the same problem, with the same table and field names, as jpo245's?
 
Hi everyone, after tonygroves introduced me to the power of subqueries, perhaps its time for me to give back a little.

int19sd: One thing i realized about our type of queries is that a "WHERE IN (XXXX)" clause is extremely costly when the IN(XXXX) part contains MANY MANY values.

A workaround would be to do a INNER/LEFT/RIGHT JOIN on the subquery instead of doing a WHERE (IN), and then doing a count on the fields in question.

I found this to be orders of magnitude faster than the WHERE IN(XXX) method.

Im not sure if the cause of your query slow down is this, but this is something you might to try.

Props againt to TONYGROVES who introduced me to subqueries.

Prior to this i had the assumption that subqueries were merely alternatives to joins, and never botherd to use them.
Now i realize that they are needed not only for some queries impossible with regular joins, but also to speed up some queries that are really slow with joins.
 
TonyGroves : my tables don't have the same names, that was just for example ...
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top