Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations gkittelson on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

No index used for resultset of 819 rows from 4mil+ records?

Status
Not open for further replies.

Thanasus

Programmer
Nov 25, 2003
13
US
Hello everyone,

I am having a strange problem I cannot figure out...

I have a table callled tracking_clickstreams:
id - unsigned bigint primary key unique
fkey_session - unsigned mediumint
fkey_url - unsigned mediumint
stamper - timestamp

The table has an index on the id field, and a compound index on stamper + fkey_session

I am trying to get all ids that fall within a particular date range and have a particular fkey_session value. My querey is:

SELECT count(c.id) FROM tracking_clickstream AS c WHERE c.stamper>=20031001000000 && c.stamper<=20031112235959 && c.fkey_session=37;

This query results in 5009 rows selected from a possible 4.2 million. The problem is that no index is used! When I perform an explain I see the index under &quot;possible keys&quot;, however no key is used and the entire table is instead scanned. This results in this query taking a while.

I have tried to force the index via &quot;USE INDEX&quot; to no avail. The query returns only a small portion of the dataset so I don't see why it is not using the stamper+fkey_session index.

When I modify the query to be a smaller date range, however, the proper index is used.

The problem though, is the larger date range only kicks back 5009 out of 4.2 million rows, which is .10% of the data? Why would it not use an index?
 
I guess the explanation is that the optimiser decides not to use an index this could be becuase of the way you have written your where

try between it is more efficient than 2 where's

SELECT
count(*)
FROM tracking_clickstream
WHERE stamper between 20031001000000 and 20031112235959 and fkey_session=37;

If fkey_session has less unique values than stamper and can restrict the query faster then this should be re-written as

SELECT
count(*)
FROM tracking_clickstream
WHERE fkey_session=37 and stamper between 20031001000000 and 20031112235959;

The optimiser probably read your query where statement
WHERE c.stamper>=20031001000000
as damn it ! that is most of the records so I rekon i'll just scan the table sequentially as the physical size of the table is small in relation to the physical size of the index.
 
hvass,

Thank you! The fkey_session does have less unique values so by moving it to the beginning of the where clause and redoing my compound index to be fkey_session+stamper instead of stamper+fkey_session it is now using the index and is fast.

Now though I am concerned that my knowledge of how MySQL handles queries with indexes may be flawed. Since the fkey_session has less unique values (about 50) compared to the timestamp field the index is now properly working. I would think it would be the exact opposite. I would think that the mysql would prefer to see fields which have more distinct values first since they let it quickly eliminate non-relevant rows.
 
That I think is the difference between a
=
and a
between

Going to find =37 is a quick look up on the index with 50 unique values of the key then it can get on with the rest of the where statement

If it does a between first it needs to find where the index value for
20031001000000
20031001000001
20031001000002
20031001000003
then go and check the =37

If your query was
fkey_session=37 and stamper=20031001000000

it would be better off the other way around

Most of the speed improvement will have come from using between rather than >= and <=
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top