Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!

  • Congratulations Mike Lewis on being selected by the Tek-Tips community for having the most helpful posts in the forums last week. Way to Go!

Select of 1 million rows, how long?

Status
Not open for further replies.

kristofdielis

Programmer
Sep 1, 2011
25
How long do you feel should a single table select of 1 million rows take? A simple SELECT * FROM 1M_ROWS_TABLE, on a SQL Server 2008 R2, straight within the Management Studio.

1, 2, 5, maybe 10 seconds? What is acceptable in your opinion?
 
How long would it take drink a glass of water? Depends on the size of the glass.

Sorry to be facetious.

For instance - whats the size of the rows?

As you can imagine if a table has one column that is is a char 1 - it wont take as long to bring back a million rows as if its got 100 fields of different types\sizes. Also i am guessing that if stuff is stored in a filestream then it will also be slower.

Also are there any computed cols within the table - that will slow it down.

Basically - its too vague.

Sorry

Dan

----------------------------------------

Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind - Dr. Seuss

Computer Science is no more about computers than astronomy is about telescopes - EW Dijkstra
----------------------------------------
 
> Basically - its too vague.

Exactly, that's the point. A query may run 30 seconds, one person may find that acceptable, I may find that way to slow.

Any measurement of elapsed time is subjective by default, unless you manage to reach optimal performance every single time, and that is rarely the case.
 
Sorry - i dont think i was clear.

If someone (with more knowldge then i - i hasten to add) had the information you have(table definition) then they could provide a good guess (not taking into account hardware and set up etc) an estimate. Without the information there can be no estimate - not just a subjective estimate, but no estimate.

So with my analogy of the drink. If we have a definition of the size of the glass - an estimate can be given. Yes its subjective because we dont know who is drinking it (in the case of the query - the hardware its running on) but a good estimate can be given to what i would expect.

So overall - if i had simple select that took 30 seconds, Would i think that was "way too slow". Well no - because it depends on the data being retrieved. What i am saying is you have much more information to go on (even gut instinct) then you have given here.

Dan

----------------------------------------

Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind - Dr. Seuss

Computer Science is no more about computers than astronomy is about telescopes - EW Dijkstra
----------------------------------------
 
Fair enough. Since this is hypothetical, let's assume a simple employee table:

Code:
employee_id	bigint
firstname	nvarchar(50)
lastname	nvarchar(50)
function	nvarchar(200)
salary		float
hire_date	datetime

Employee_id is the primary key, the table is totally stand-alone, there are no foreign keys or indexes, other than the PK.

And, this table happens to contain exactly 1 million rows.

So, how long should a SELECT * FROM EMPLOYEE take?
 
Just doing some testing.

I wrote a simple select on one of our tables with a similar amount of rows, and i trimmed the columns down slightly.

However 800,000 records took about a minute.

I personally would say 30 seconds was quite good!

----------------------------------------

Be who you are and say what you feel, because those who mind don't matter and those who matter don't mind - Dr. Seuss

Computer Science is no more about computers than astronomy is about telescopes - EW Dijkstra
----------------------------------------
 
And then it still depends on the speed of the disksystem, the CPU's and memory available, and the edition of SQL Server (Express or Datacenter Edition). And the direction of the wind, if you'd like that added too. :p
 
Yes exactly. But regardless, you should have a gut feeling of about how long you want your queries to take.

Personally, I think that a 1M select should at the most take 10 seconds. And I typically would expect faster.
 
There's a couple of things at play here.

1. Does SQL Server have your data cached in to memory? Disk access is slow. If your data is already loaded in to memory, then it will be many times faster to "get" the data. By this, I mean that SQL will not have to go to the disk to get it. Without disk activity, this process will be faster.

2. Network latency and bandwidth: Looks like your row is approximately 350 bytes. This represents the maximum/defined size of a row, not the actual size which depends on the amount of data stored in this varchars. Using a guestimate of 200 bytes per row, with 1 M rows, that would be approximately 200 megs going across the network. There's overhead involved so the actual amount may be more than that. Assuming you have a 100 megabit / second network (which is common), 200 megabytes would be approximately 1600 megabits. At 100 megabits per second, you're looking at 16 seconds (optimally) to send the data through the network. This assumes that the network isn't being used for anything else at the time. This time could easily double.

3. Doing something with the data: Once the client application gets the data, it needs to do something with it. Since you mentioned using SQL Server Management Studio to display the results, you MUST consider the amount of time it takes to draw the grid of data. This is no small task and would represent some real time involved. Of course, this also depends on how the application draws the data. If you only draw what is visible on the screen, then this would be very fast because you probably won't see more than a dozen or so rows. This assumes that you have a relatively smart front end application that can save the data in a recordset or arrays and only draw the visible rows and the user scrolls through them. I suspect that SQL Server Management Studio does not cache the data in the application and probably draws all the data to the grid as it receives the data.

Given all this, I would say 30 seconds is reasonable. Anything less than that and you got lucky. More time would not surprise me either. We're talking about a relatively large amount of data here. There are things you can do to speed things up if it's important to do so. You could make sure that your data is always cached in to memory on the SQL Server. To do this, make sure there is plenty of RAM available to the SQL machine so that your data is (probably) always cached in memory. You could also use a gigabit network. Gigabit cards are pretty cheap these days. Assuming your cabling can already accommodate these speeds, this is probably your "biggest bang for the buck". By making the network 10 x faster, you can reduce your 16 seconds to 1.6 seconds (approximately). You could also make sure you have an excellent graphics card in your client computer. Often times draw speed depends on the graphics card. Throwing some hardware at the situation may allow you to get this to be 5x faster (or more).

All things considered, this is an interesting theoretical discussion, but in my opinion, it has limited practical applications because it is rarely necessary to send that much data from the DB to the application. You certainly wouldn't expect the end user to scroll through and look at each row. It's too much data. It could be that the application ultimately save the data to a local disk as an export, but there are better and faster methods for this too. Alternatively, the front end application could be processing the data some how. Depending the processing involved, it may be faster/better to do some of that processing in the SQL so that you only send the results across the wire instead.

I've done a lot of rambling here. Hopefully it makes sense.

-George
Microsoft SQL Server MVP
My Blogs
SQLCop
twitter
"The great things about standards is that there are so many to choose from." - Fortune Cookie Wisdom
 
Hi George,

Thanks for the feedback. I am working on a new app, that involves building things from scratch, although it heavily depends on a similar older application. Usage is Business Objects reports, which depend on views which I found to be (very) slow, on the old system.

These forms need a lot of data, so they are expected to not display immediatelly, I just want to avoid any additional, unnecesssary overhead.

I just want to plan ahead and pinpoint a target expected speed, hence the question.
 
Here is something else to consider....Locks. When someone else is accessing the data, it might cause a lock on the rows. Then the select statement has to wait for the rows to be released. One solution is to use WITH (NOLOCK) on select statements.

Question to ask: can the select query return a 'dirty read'? That means, is it okay to return data that another user might be deleting or updating at the time? If so, use the NOLOCK hint. If you can't have 'dirty reads', then you have to accept locks on the table will cause the select to run slow.

-SQLBill

The following is part of my signature block and is only intended to be informational.
Posting advice: FAQ481-4875
 
ouch.. NOLOCK -- asking for data to be missed or doubled.

kristofdielis - how fast your data is going to return to you depends on the programming tool you are using and other things.
For example on .net if using a datareader this can be 30 times faster than using a dataset.

Regards

Frederico Fonseca
SysSoft Integrated Ltd

FAQ219-2884
FAQ181-2886
 
Status
Not open for further replies.

Part and Inventory Search

Sponsor

Back
Top