Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations SkipVought on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

OR conditions cannot use indexes? 2

Status
Not open for further replies.

jcaulder

Programmer
Apr 22, 2002
241
0
0
US
Quick and easy question I hope. I have a query that uses something like the following in MS SQL Server 2000:

select t.* from tbl t
where
(t.void = 1 and t.dateStarted = '1/14/2009')
OR
(t.finalized = 1 and t.dateCompleted = '1/14/2009')

There is an index on both dateStarted and dateCompleted. When I run this query, it appears from the execution plan that no index is used and it takes about 33 seconds to run.

When I split this into a UNION making each OR the WHERE clause, it clearly uses the indexes and returns in under 2 seconds.

So question is, should the optimizer be able to split the OR into logical parts and take advantage of the index or is the correct approach to split this into a UNION to force the index? Is there something I can do to force the OR statement to use the relevant indexes for each side of the OR condition?

TIA!

J
 
If it did two separate index lookups, there's a problem because both conditions could apply to the same row, making duplicates. Doing a UNION is an implied SELECT DISTINCT over every column in the table.

If you had an index that covered both dateStarted and dateCompleted, your query might be able to use it.
 
... and duplicates have to be eliminated either through a join or through a DISTINCT. The optimizer guessed that a simple table scan would be better (and in this case was wrong).

You could also change your business rules so that void rows get their dateStarted put in the dateCompleted column, making your query like this:

where
t.dateCompleted = '1/14/2009' -- I encourage '20090104' as this will parse correctly, platform and region independently
and (t.void = 1 or t.finalized = 1)

which will probably use the index.
 
I've discovered that depending upon the date used, the optimizer will actually use the indexes in rare cases and then perform its own "sort distinct" to get rid of duplicates(according to the execution plan). This appears to happen only when there is little or no data matching the date in the WHERE clause.

So apparently the optimizer is making some guess about the number of rows returned by each index seek and then deciding whether the table scan or the sort/duplicate is more efficient?

The UNION always appears to correctly use the index and is far superior than the OR.

Per your other suggestion, dateCompleted actually has a special meaning and should not be populated if it is voided. Voided loads cannot be completed so the business process change would not apply here although I could see this working in other scenarios.

It appears to me, in this case, that the correct approach is a UNION to ensure efficient query plans? Unless there is some reason not to use it?

Why would a combined index on dateStarted, dateCompleted be more likely to be picked here? I am going to create that in our test environment and see what the results are.

Thanks!
 
The optimizer's choices are always highly dependent on the accuracy of the statistics, which give it information not just about number of rows total in the table, but also about the selectivity of various indexes and thus the likely number of rows for any particular conditions and joins as well. Pay attention to the difference between estimated & actual rows; when they are wildly different the optimizer is NOT going to be picking the best execution plans.

Without having the tables in front of me to play with it's hard to say, but I'd experiment with leaving off the t.void and t.finalized parts just to see if that changes things. Perhaps putting a nested query with those parts on the outside could make a difference. It's hard to suggest... I'd try lots of things to try to understand why the indexes are being picked or not.

Also, if your indexes don't include the void and finalized columns, a bookmark lookup has to be done anyway (or some kind of join between the nonclustered index and the clustered index, I've seen both in execution pllans). What those joins look like could make more clear what is going on.

I didn't expect the business rule change to be possible for you, but it was an option... to help in trying to think outside the box.

There's no reason not to use UNION if it's giving superior query performance. I just would want to know WHY.

Have you tried forcing the index usage?

Last, your question about the index is that it is quite a different task to scan a single index for two values to find candidate rows than it is to scan two indexes and combine the candidate rows. I don't really KNOW if this will work, but it could. As I think about it, I doubt it would be better than your UNION query.

The UNION is straightforward as it allows you to do two index seeks, and if the number of rows returned is small, the bookmark lookups or join to the clustered index will be very inexpensive.

 
I've tried to break this down to its most basic components hoping this will lead to some revelation as to why it's not picking the indexes. The table being queried has a little over 1.2 million rows in it.

The following will use the indexes:

SELECT fld1
FROM tbl
WHERE
(dteStarted = '12/17/2008' or dteCompleted = '12/17/2008')

Index1 Estimated Row Count = 2528
Index2 Estimated Row Count = 2549

However, apparently if I add any other condition outside of the OR, the indexes are no longer used as in:

SELECT fld1
FROM tbl
WHERE
bitDeleted = 0
and
(dteStarted = '12/17/2008' or dteCompleted = '12/17/2008')

The result is a clustered index scan with:

Estimated Row Count = 5069

However, if I use the UNION, the index is always used cutting the query time from 30+ seconds to about 2 seconds on average.

SELECT fld1
FROM tbl
WHERE
bitDeleted = 0
and dteStarted = '12/17/2008'
UNION
SELECT fld1
FROM tbl
WHERE
bitDeleted = 0
and dteCompleted = '12/17/2008'

All of the estimated row counts look reasonable and are consistent with the actual row counts in all cases.

Any ideas as to why it would choose to scan the table just because of the additional criteria in the WHERE clause with the OR? Even if the additional field has an index it makes no difference as I've tried it with several different fields and types.

It would seem logical to me that it would take advantage of the index, get the 2500 rows, do the bookmark lookup and then filter them instead of scanning the table.

While the UNION clearly solves the problem, it would not seem this should be required and causes me a little concern about other queries that may be suffering from similar choices by the optimizer. I would like to understand this so I can ensure it is addressed correctly now and going forward.

Thanks!
 
Try an index hint to see if you can force it to use them.

Try a nested query:
SQL:
SELECT
FROM
   tbl t
   inner join (
      SELECT fld1
      FROM tbl
      WHERE
      (dteStarted = '12/17/2008' or dteCompleted = '12/17/2008')
   ) x on t.fld1 = x.fld1
WHERE
   (t.void = 1 and t.dateStarted = '1/14/2009')
   OR (t.finalized = 1 and t.dateCompleted = '1/14/2009')
Without being able to play with it myself I really don't know what else to suggest... sorry.
 
Ok, thanks! Just thought the extra detail might reveal something.

It seems the bottom line is that it really should be using the index and a UNION or subquery should not be necessary. Why it's not is still a mystery. Perhaps at some point if I have time to continue playing with it I'll come across the answer. For now, the UNION is the winner.

Thanks again!

J
 
Did you examine whether the estimated rows matched the actual rows (validating that your statistics are up to date)?
 
Yeah, in my third post in this thread, I mentioned the actual versus estimated row counts. They are definitely in alignment so statistics are up to date and accurate. I could not determine any reason for the behavior and ultimately used the UNION as the solution in production. It is running very efficiently with no issues. Perhaps at some point I will revisit this and find the cause. If so, I'll definitely post my findings.

Thanks again for your help!

 
I see that you did say that! Sorry about that.

I am curious, too, and I wish I could play with your data.
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top