I have a table with 62,037,506 million rows in it, call it BigTable. I am investigating a column, call it SequenceNumber. I just want to see some typical sequences. So I run a query:
This takes forever and I finally kill it. Weird, I think. Look at the indexes on this table:
[tt]BigTableAI01 - nonclustered - CustomerID
BigTableAI02 - nonclustered - ParentID, Active
BigTableAI03 - nonclustered - ValidValueID, Active, ParentID
BigTableCI - clustered - DataType, ParentID, ValidValueID, SequenceNumber
BigTablePK - nonclustered, unique, primary key - BigTableID[/tt]
What's the problem? Read the 02 index to get the top rows ordered by ParentID (how the sequencenumbers correlate), do a bookmark lookup to get the rows, and you're done.
But the estimated execution plan for the query looks like this:
[tt] |--Top(100)
|--Parallelism(Gather Streams, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Sort(TOP 100, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Clustered Index Scan(OBJECT[REPO].[dbo].[BigTable].[BigTableCI]))[/tt]
Scanning all 62 million rows of the clustered index. Yuck.
Okay, so I try this:
[tt] |--Sort(ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Top(100)
|--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[REPO].[dbo].[BigTable]) WITH PREFETCH)
|--Index Scan(OBJECT[REPO].[dbo].[BigTable].[BigTableAI02]), ORDERED FORWARD)[/tt]
Ah... 100 rows on that index scan. That's what I was expecting in the first place. I know the last ParentID may not have all its associated seqnums in order, but who cares. That seems a very minor detail that the engine could correct. Something like, get 100 rows plus any more that share the same ParentID of the last row, then sort with SequenceNumber, and throw away the extras.
So I thought I'd try to force the engine to use that index I want:
[tt] |--Top(100)
|--Parallelism(Gather Streams, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Sort(TOP 100, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[REPO].[dbo].[BigTable]) WITH PREFETCH)
|--Index Scan(OBJECT[REPO].[dbo].[BigTable].[BigTableAI02]))
[/tt]
But that index scan is reading all 62 million rows, not just the 100 of the previous query!
Can anyone explain this? And can you give me a query that performs like the derived table one, but doesn't require a derived table?
Code:
select top 100 * from BigTable order by ParentID, SequenceNumber
[tt]BigTableAI01 - nonclustered - CustomerID
BigTableAI02 - nonclustered - ParentID, Active
BigTableAI03 - nonclustered - ValidValueID, Active, ParentID
BigTableCI - clustered - DataType, ParentID, ValidValueID, SequenceNumber
BigTablePK - nonclustered, unique, primary key - BigTableID[/tt]
What's the problem? Read the 02 index to get the top rows ordered by ParentID (how the sequencenumbers correlate), do a bookmark lookup to get the rows, and you're done.
But the estimated execution plan for the query looks like this:
[tt] |--Top(100)
|--Parallelism(Gather Streams, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Sort(TOP 100, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Clustered Index Scan(OBJECT[REPO].[dbo].[BigTable].[BigTableCI]))[/tt]
Scanning all 62 million rows of the clustered index. Yuck.
Okay, so I try this:
Code:
select SequenceNumber, * from (
select top 100 * from BigTable order by parentID
) x order by parentID, SequenceNumber
|--Top(100)
|--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[REPO].[dbo].[BigTable]) WITH PREFETCH)
|--Index Scan(OBJECT[REPO].[dbo].[BigTable].[BigTableAI02]), ORDERED FORWARD)[/tt]
Ah... 100 rows on that index scan. That's what I was expecting in the first place. I know the last ParentID may not have all its associated seqnums in order, but who cares. That seems a very minor detail that the engine could correct. Something like, get 100 rows plus any more that share the same ParentID of the last row, then sort with SequenceNumber, and throw away the extras.
So I thought I'd try to force the engine to use that index I want:
Code:
select top 100 * from BigTable WITH (INDEX (BigTableAI02))
order by parentID, SequenceNumber
|--Parallelism(Gather Streams, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Sort(TOP 100, ORDER BY[BigTable].[ParentID] ASC, [BigTable].[SequenceNumber] ASC))
|--Bookmark Lookup(BOOKMARK[Bmk1000]), OBJECT[REPO].[dbo].[BigTable]) WITH PREFETCH)
|--Index Scan(OBJECT[REPO].[dbo].[BigTable].[BigTableAI02]))
[/tt]
But that index scan is reading all 62 million rows, not just the 100 of the previous query!
Can anyone explain this? And can you give me a query that performs like the derived table one, but doesn't require a derived table?