
Performance of a SQL Server 2005 1 terabyte size DB with 12 billion recs


calvinsinger (Programmer)
I am planning to set up a SQL Server 2005 database, but I don't know whether it will have good read/write performance (or other problems), or whether the hardware to support such a large DB will be very expensive.

I expect the database to be 1 terabyte in size, with about 12 billion records in each table. Is that too much for SQL Server, or should I use Oracle? What kind of performance can I expect?
Any articles you can point me to will be appreciated.

Thank you
Calvin
 
That database will fit just fine on SQL Server.

No matter the platform, your hardware configuration will be very expensive. You will need very high-end disk arrays; most likely SANs will need to be used.

Your indexing and statistics will be extremely important, as will isolating your indexes from the data and partitioning the data in the large tables across several disk arrays.

Your read/write performance will depend on the amount of IO that your disk arrays can handle, and the amount of transactions that they need to handle.

Is this system going to be a data warehouse, or an OLTP application? What kind of queries will be run against the large tables? If queries will often need to summarize data from the large tables, then summary tables should be used.
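A summary table along the lines described might look like this minimal sketch. All table and column names here are hypothetical, and the date arithmetic avoids the `date` type, which does not exist in SQL Server 2005:

```sql
-- Hypothetical daily summary table so reports don't have to scan
-- the 12-billion-row base table.
CREATE TABLE dbo.SalesDailySummary (
    SaleDay     datetime NOT NULL,  -- midnight of the day
    ProductID   int      NOT NULL,
    TotalQty    bigint   NOT NULL,
    TotalAmount money    NOT NULL,
    CONSTRAINT PK_SalesDailySummary PRIMARY KEY (SaleDay, ProductID)
);

-- Refreshed nightly for the previous day
DECLARE @DayStart datetime, @DayEnd datetime;
SET @DayStart = DATEADD(DAY, DATEDIFF(DAY, 0, GETDATE()) - 1, 0);
SET @DayEnd   = DATEADD(DAY, 1, @DayStart);

INSERT INTO dbo.SalesDailySummary (SaleDay, ProductID, TotalQty, TotalAmount)
SELECT DATEADD(DAY, DATEDIFF(DAY, 0, SaleTime), 0), ProductID,
       SUM(Qty), SUM(Amount)
FROM dbo.Sales
WHERE SaleTime >= @DayStart AND SaleTime < @DayEnd
GROUP BY DATEADD(DAY, DATEDIFF(DAY, 0, SaleTime), 0), ProductID;
```

Reports then hit the small summary table instead of aggregating billions of rows on every query.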

The database design and hardware design will be extremely important for this project. When dealing with large databases like this, small errors become major performance issues very quickly, and those are very hard to overcome after the fact.

I'd recommend that you check out this article on disk configuration before starting.

Here is another on temp tables which may come in handy.
Also check out some of the FAQs over in forum962. Some of them may come in handy as well.

If the hardware is configured correctly and your database is configured correctly, then your database should perform nicely.

There are databases out there which are larger than 1TB and have tables of that size and larger.

Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)

--Anything is possible. All it takes is a little research. (Me)
 
Thank you very much mrdenny for the quick & educational response.

I have some further questions

You say hardware is going to be expensive. Expensive is a relative term, so maybe we can describe what it will take.

Will the system below be sufficient, do you think? Or are we talking something much larger?

System Description:

System: Quad-core, dual-CPU system with 6 GB memory and eight 146 GB drives internal to the system. For example, a DL380 G5 from HP.

DISK Configuration:
1. OS + applications on RAID 1 (two 146 GB SAS drives mirroring each other, internal to the system)
2. Data on RAID 10 (fourteen 146 GB SAS drives in an MSA30 external drive cage; 2 mirrors with 7 drives each)
3. Transaction logs on RAID 10 (six 146 GB SAS drives internal to the system; 2 mirrors with 3 drives each)

You said a SAN may be needed. How do I determine whether a SAN is needed? Is it needed when the system's internal disk space is not enough to hold the data + transaction logs + OS + applications?

So if you think this is a large database, what is considered a small DB and what is a mid-size DB (approximate range)?

What difference will data warehouse vs. OLTP make?

Thanks again.

Calvin
 
An 8-core system will probably be enough CPU. Don't skimp on the RAM. I would max out the box; a DL380 holds 32 GB of RAM. Maxing out the machine is kind of pricey because of the 2 GB DIMMs, so you may want to get 16 GB so that you can use the 1 GB DIMMs.

Odds are that a single RAID array isn't going to be enough. You'll want at least two filegroups: one for the data and one for the indexes. You will want to separate these onto separate disk arrays. You'll also want to separate the tempdb database onto its own RAID array.
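Splitting data and indexes like that is done with filegroups. A minimal sketch, assuming a database named BigDB and hypothetical file paths that map to the separate RAID arrays:

```sql
-- Two filegroups, each backed by a file on its own array
ALTER DATABASE BigDB ADD FILEGROUP DataFG;
ALTER DATABASE BigDB ADD FILEGROUP IndexFG;

ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Data1, FILENAME = 'E:\Data\BigDB_Data1.ndf', SIZE = 100GB)
TO FILEGROUP DataFG;

ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Index1, FILENAME = 'F:\Index\BigDB_Index1.ndf', SIZE = 50GB)
TO FILEGROUP IndexFG;

-- Table lives on the data array, nonclustered index on the index array
CREATE TABLE dbo.BigTable (ID bigint NOT NULL, Val int NOT NULL) ON DataFG;
CREATE NONCLUSTERED INDEX IX_BigTable_Val ON dbo.BigTable (Val) ON IndexFG;
```

With this layout, an index seek followed by a bookmark lookup reads from two different spindle sets instead of fighting over one.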

OLTP and data warehouses have very different IO profiles. OLTP databases have very random IO profiles, while OLAP databases (data warehouses) typically have very sequential IO profiles. Depending on the number of transactions per second, this can mean that the OLTP database actually requires more spindles to meet its requirements than the OLAP database. It all depends on the number of transactions that the database will need to handle per second, and how much data SQL can cache in memory.

You may need a second MSA30 for these additional arrays. When building the arrays, make sure to create the volumes with the diskpart command as I describe in the article I referred you to above.
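The diskpart step is about partition alignment. A sketch of what that looks like on Windows Server 2003 (the disk number and drive letter are hypothetical, and the 64 KB alignment assumes a 64 KB RAID stripe size):

```
C:\> diskpart
DISKPART> select disk 2
DISKPART> create partition primary align=64
DISKPART> assign letter=E
DISKPART> exit
```

A misaligned partition can make many IOs straddle two stripe units, so this one-time step matters on a system doing this much IO.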

There are a few reasons to consider moving to a SAN.
1. If you need additional IO capacity, you can easily add more disks to the array. With a SAN you can easily add hundreds of disks to a single volume, whereas with DAS you will max out at 14 disks per volume unless you start using host-based striping, which has its own overhead.
2. If you need additional space on the volumes, it can easily be added. Again, with a SAN you can add as many disks as needed to the array, while DAS is limited.
3. SAN back ends offer higher IO throughput than DAS (Direct Attached Storage) can because of the 4 or 10 Gig Fibre back end. DAS typically has a SCSI back end, which will typically max out at 320 MB/sec.
4. When setting up a SAN you can build multiple paths from the server to the storage, where with DAS you can typically only set up one path to the storage.

There is one downside to using a SAN: the initial buy-in cost is very expensive. Depending on the vendor and the size of the unit, the storage array and fibre switch(es) can end up being a 5-7 figure purchase.

When planning a large system like this is when a DBA with some high end architecture experience can come in very handy.

With a SAN, if you want to go ultra hardcore you can do what's called short-stroking the disks, where you only put data on the outer edge of the platters so that the heads have to travel less distance to get to the data. This is pretty much not needed with today's technology, however. I've only heard of a couple of people who are still doing this, and that's because they decided to do it 40 or 50 years ago and never changed the policies.

Denny
 
No problem.

Denny
 
calvinsinger, I would be careful to seek advice in many places before taking any big steps. I'm not knocking mrdenny at all; he clearly has good input in areas I know little about, and what I do know tells me he is steering you correctly (such as using RAID 10 and a SAN, for example). It's just that if you're going to be directing your company into purchases potentially in the tens or hundreds of thousands of dollars, you need to be Very Certain that what you're recommending is right.

One thing I'd like to add to the discussion is that SQL Server can have some very serious contention/blocking issues that Oracle usually does not. It is not so much an issue of size or billions of records but how 2,000 clients all trying to use the database at the same time get along together. And the unfortunate fact is that SQL Server databases can suffer, very badly, from locking problems. It comes down to the design. When designed by a very experienced person who knows exactly what to do, it can be no problem at all. Are you sure your database will be designed so perfectly? SQL 2005 has added some new locking types, but I am not sure that those have entirely solved the issue.

Oracle has its own problems, some of them nontrivial, which I won't go into here; but when it comes to thousands of people accessing billions of records in a single database, these are things you have to worry about.

There is a database at work on SQL 2000 that is very large. I don't know its exact size at this point; I can check for you on Monday to give you an idea. But I do know it is over 100 GB. I'll be happy to check with the DBAs in our shop to get their thoughts on this (there are over 70 SQL Servers at my company, and fewer Oracle machines but still very many). We have some very large SAN systems running clustered servers "stretched" over a WAN. Maybe there could be some input that's valuable to you.

Oh, and the reason I mentioned this database is that it is slow and fairly suckworthy. It is a hodgepodge of not so great design, revealing a nasty legacy of repeated merges of various databases & applications. This system suffers badly from blocking problems.

On the other hand, we actually had a bad experience with a storage array (I think that's what it was) and we followed bad advice and for a long time had hot spots on our storage solution. A six or seven figure cost storage solution that was so big we couldn't fix it until we had another six or seven figure storage solution to move all the data off of. (Can't have an extended downtime because of the nature of the business.)

I know I'm rambling a bit but I hope that my input is useful to you.

For SQL and technical ideas, visit my blog, Squared Thoughts.
 
Thank you, ESquared. I am not aware of special things which need to be done to prevent locking in SQL Server. Any ideas would be educational.

Thanks
Calvin
 
Unfortunately it's not an administrative task: it's a developer task. As the database is designed and queries and SPs are written, the utmost attention to detail and performance is in order. Looking at execution plans, tweaking indexes, using query hints, breaking queries up into smaller batches of fewer rows (which can at times make an operation 10x faster, I've seen it)--all those things matter. Careful and selective use of NOLOCK hints (aka the READ UNCOMMITTED isolation level). It's an art. Experience is king.

I'll check with some guys at work and see if there's anything I can offer you.

 
There is so much stuff you can write about tuning that it is scary.

One of the most important things is to keep tables as narrow as possible: do NOT dump everything in 1 table, and do NOT use decimal(38,12) if you never have values greater than 1000.00.

Update statistics on a regular basis. Defragment/rebuild indexes on a regular basis. If you have transactions in a stored proc, then make sure the transaction is as short as possible; move code that doesn't need to be rolled back out of the tran block.
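The maintenance side of that can be sketched like this (table, column and rate names are hypothetical; `ALTER INDEX ... REBUILD` is the SQL 2005 replacement for DBCC DBREINDEX):

```sql
-- Routine maintenance, run on a schedule
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;  -- or SAMPLE n PERCENT on huge tables
ALTER INDEX ALL ON dbo.BigTable REBUILD;

-- Keep transactions short: do lookups BEFORE starting the tran
DECLARE @Rate money;
SELECT @Rate = Rate FROM dbo.Rates WHERE RateID = 1;  -- outside the tran block

BEGIN TRAN;
    UPDATE dbo.Accounts SET Balance = Balance * @Rate WHERE AccountID = 42;
COMMIT TRAN;
```

The lookup outside the transaction means locks are held only for the single-row update, not for the whole read-then-write sequence.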


Consider horizontal partitioning: partitioned views (2000) or partition functions (2005). This will help with returning results, and it makes defragmenting and rebuilding indexes much faster.
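A minimal SQL 2005 partitioning sketch, assuming hypothetical names and yearly boundary values; in a real deployment each range would usually map to its own filegroup rather than all to PRIMARY:

```sql
-- Partition by year so maintenance and scans can target one slice
CREATE PARTITION FUNCTION pfByYear (datetime)
AS RANGE RIGHT FOR VALUES ('2006-01-01', '2007-01-01', '2008-01-01');

CREATE PARTITION SCHEME psByYear
AS PARTITION pfByYear ALL TO ([PRIMARY]);

CREATE TABLE dbo.BigSales (
    SaleTime datetime NOT NULL,
    Amount   money    NOT NULL
) ON psByYear (SaleTime);
```

Queries that filter on SaleTime then touch only the relevant partitions instead of the whole 12-billion-row table.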


Use NOLOCK for reads when you don't have to have the latest value. Do NOT use NOLOCK when updating a table: you could get a phantom/dirty read and mess up your data (and this will be very difficult to find later on).
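A sketch of what that looks like, on a hypothetical table; both forms mean the same thing, one per query and one per session:

```sql
-- Per-query hint: fine for an approximate dashboard count
SELECT COUNT(*) FROM dbo.BigTable WITH (NOLOCK);

-- Same effect at the session level
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT COUNT(*) FROM dbo.BigTable;
```

Either way the read takes no shared locks, so it neither blocks writers nor is blocked by them, at the cost of possibly seeing uncommitted data.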

If you have historical data that doesn't need to be updated, then create a read-only filegroup and move the data there. That way SQL Server does not need to take locks on it at all.
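A sketch of the read-only filegroup approach (database, filegroup and path names are hypothetical; the step of rebuilding the historical table onto the new filegroup is elided):

```sql
ALTER DATABASE BigDB ADD FILEGROUP HistoryFG;
ALTER DATABASE BigDB ADD FILE
    (NAME = BigDB_Hist1, FILENAME = 'G:\Hist\BigDB_Hist1.ndf', SIZE = 200GB)
TO FILEGROUP HistoryFG;

-- ... move/rebuild the closed-out historical tables and indexes onto HistoryFG ...

ALTER DATABASE BigDB MODIFY FILEGROUP HistoryFG READ_ONLY;
```

Once the filegroup is marked READ_ONLY, queries against it skip locking entirely.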

I can go on forever, but you get the point: your DB has to be normalized and the developers need to know what they are doing. In other words, no split/concatenate functions on a table that size ;-)





Denis The SQL Menace
--------------------
SQL Server Code,Tips and Tricks, Performance Tuning
SQLBlog.com, Google Interview Questions
 
So Denis do you agree with me about the locking issues on huge SQL Server databases with thousands of concurrent users?

Even if you expect that your queries will be only doing row locking, when you get thousands of users all locking many rows, eventually the server decides those row locks are using too many resources and locks are escalated to page, extent, or even table locks. Then you get blocking or deadlock problems.

And the problem is even worse than you might imagine. If you block only ONE other process, it doesn't sound too bad, right? But now every lock that process has is potentially blocking another process. And so on and so forth, quickly escalating to nearly every user in the application stuck.

Oracle has its problems. But unless SQL 2005 has addressed ALL the locking issues I am not sure that it is really the best thing for the absolutely largest and most active databases.

[COLOR=black #e0e0e0]For SQL and technical ideas, visit my blog, Squared Thoughts.[/color]
 
SQL Server 2005 has the snapshot isolation level, but you can get the same problem Oracle has: running out of space. When using snapshot isolation, SQL Server uses tempdb to store the row version info, and if tempdb gets too large you could run out of space.
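Turning it on is two database-level settings (database name hypothetical); the point about tempdb is that the row version store lives there, so its sizing and IO need to be planned before flipping these:

```sql
-- Allow SNAPSHOT isolation to be requested per session
ALTER DATABASE BigDB SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Optionally make plain READ COMMITTED use row versioning too,
-- so readers stop blocking writers without code changes
ALTER DATABASE BigDB SET READ_COMMITTED_SNAPSHOT ON;
```

READ_COMMITTED_SNAPSHOT needs exclusive access to the database for the ALTER to complete, so it is usually set during a maintenance window.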

I am not sure that SQL Server 2005 has addressed all the problems. I do have billion-row tables, but I only have 100-300 concurrent users at a time. They used to have deadlocks here daily until I redesigned the system. Trace flag 1204 is still on, but I haven't seen a deadlock being captured in over 18 months (knocking on wood). BTW this is 2000, not 2005 (evaluating 2008 right now; I'll skip 2005).
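For reference, turning that deadlock tracing on instance-wide looks like this; 1204 writes deadlock chains to the errorlog on SQL 2000 and 2005, and 2005 adds 1222 for a more detailed report:

```sql
-- Log deadlock participants and chains to the SQL Server errorlog
DBCC TRACEON (1204, -1);

-- SQL 2005 only: richer deadlock graph output
DBCC TRACEON (1222, -1);
```

The -1 applies the flag to all sessions; without it the flag affects only the current connection.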

I would say try both, stress test them, and then make the decision. A system that large should not be decided upon without adequate testing.



Denis The SQL Menace
 
I just left working at what is probably the busiest SQL Server database shop there is. With the stored procedures written correctly and the indexes and statistics set up and updated correctly, blocking was not an issue. We typically had between 300,000 and 2,500,000 users connected to each SQL Server at a time.

(I know that many people think they have a really big environment, but when you are managing systems with a total load of 500,000 transactions per second you've officially hit the big time.)

SQL Server doesn't start escalating locks from row-level locks to page-level locks just because there are lots of people using row-level locks. Lock escalation is done on a per-transaction basis. If a single transaction has taken enough row-level locks that taking page-level locks is more efficient, then SQL will use page-level locks for that transaction, and for that specific object only. After that transaction has completed, row-level locking will take over for that object again.
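A quick way to see what lock granularity is actually being taken, using the SQL 2005 DMV (database name hypothetical):

```sql
-- Count current locks by type and mode for one database
SELECT resource_type, request_mode, COUNT(*) AS lock_count
FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID('BigDB')
GROUP BY resource_type, request_mode
ORDER BY lock_count DESC;

-- If escalation itself proves to be the problem, trace flag 1224
-- disables it until memory pressure forces it (1211 disables it
-- entirely). Use these with care.
-- DBCC TRACEON (1224, -1);
```

A sudden shift from ROW/PAGE to OBJECT entries in this output is the signature of escalation kicking in for some transaction.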

calvinsinger,
Some of the big things to do to help avoid locking (SQLDenis will have covered some of this already):

1. Proper hardware design and layout.
2. Proper filegroup design (the correct objects grouped into, and broken out across, the correct filegroups).
3. Correct Table design
4. Proper index design
5. Proper index fillfactor settings.
6. Correct usage of isolation levels (including hints such as NOLOCK, PAGLOCK, TABLOCKX, etc).
7. Proper usage of index hints to force the correct index to be used (SQL doesn't always make the correct choice).
8. Making sure that your indexes are defragmented
9. Making sure that your statistics are up to date and have the proper sampling settings for your situation.
10. Proper data types for the data being stored.
11. As little blob data stored within the database as possible (in most cases the file system is a better place for it).
12. Read only file groups when possible
13. Understanding what kind of locking SQL does when it does inserts and deletes, and how these locks affect other processes which are going on within the database.
14. Understanding the locking implications of SELECT INTO queries

There's plenty more, but these are a good first place to look.

Denny
 
Denny,

Thanks for the comments. Everything you say sounds great and is obviously the product of some good experience. Especially thanks for the notes on the lock escalation being confined to per transaction. I'd still like to read more about this. Do you have a good resource for me? I'll do some searches later.

Any chance you'll tell us who the shop was that you left?

calvinsinger,

I talked to a DBA at work who has experience with SQL Server and Oracle and he says: Don't even think about trying a system like this on SQL 2000 because reindexing has to be done while denying everyone access. That can become a serious problem.

He also said that traditionally when one wants to have a huge system, Oracle has been the way to go. And it may continue to be the way to go for several reasons:

• Oracle can run on UNIX which has historically been more stable than Windows.
• Oracle has been pouring a lot of investment into distributed computing: getting wide instead of tall. Creating computing clusters, distributed storage, going wider and wider to spread the load out is all getting easier and easier. With SQL Server, your best wide option is an Active-Active cluster (that is 2 nodes) which can be quite difficult to set up and must share the same storage (no easy wide component to the storage).
• Oracle can be more complicated to manage but the fact is that compared to SQL Server there are many times over the number of settings available to the DBA to tweak performance. You can get in trouble if you don't know what you're doing, but if you do know what you're doing, you have more tools to effect optimization.

 
>> Don't even think about trying a system like this on SQL 2000 because reindexing has to be done while denying everyone access.

This is no longer true with SQL 2005.

-George

"the screen with the little boxes in the window." - Moron
 
mrdenny said:
we typically had between 300,000 and 2,500,000 users connected to each SQL Server at a time.

not bad for a tonka toy [LOL]

----signature below----
I don't do any programming whatsoever

Ignorance of certain subjects is a great part of wisdom
 
Denis said:
you can have a 8 node cluster with sql server
Thank you Denis. They're still all sharing the same storage in a way different from Oracle's distributed clusters stuff.

>> Don't even think about trying a system like this on SQL 2000 because reindexing has to be done while denying everyone access.

This is no longer true with SQL 2005.
Er, yeah. That's what I meant.

 
I know people who have tried to get Oracle RAC working correctly. It's very hard to get a stable RAC installation running (RAC, Real Application Clusters, is their distributed computing solution). Everyone in Oracle sales will tell you how great RAC is. Most of the support people that work at Oracle have a heck of a time setting it up.

As gmmastros said, SQL 2005 now supports online index rebuilding (within the Enterprise edition). Many of us have requested that this support be included in SQL 2008 Standard edition. We'll see what happens in the next release.

Active/Active clustering doesn't actually scale the database out. It simply provides you with two independent instances which are both highly available. If you instead federate the database across multiple instances, you can scale the database out nicely. This gives you many servers all hosting a piece of the database, which distributes the load so that each SQL Server only has to handle its piece of the system and passes the rest of the load to the server which hosts that piece. Federating a database is fairly complex, but when done correctly it can provide a very high-end solution spread across multiple servers. Because the systems are federated and not clustered together, they do not share storage with each other.

In clustering, each instance has its own storage. No two instances in a cluster can share a hard drive, as each SQL instance is set up in its own resource group, and resources can only access resources in the same resource group.

>> Don't even think about trying a system like this on SQL 2000 because reindexing has to be done while denying everyone access.

This is no longer true with SQL 2005.
This is still true in SQL 2005 unless you specify the ONLINE=ON flag when rebuilding the index and you have SQL 2005 Enterprise edition. Also, tables which have blob data in them cannot have their indexes rebuilt online; a blob data type in the table forces an offline rebuild.
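The syntax in question, on hypothetical index and table names (Enterprise edition only):

```sql
-- Online rebuild: readers and writers keep working during the rebuild
ALTER INDEX IX_BigTable_Val ON dbo.BigTable
REBUILD WITH (ONLINE = ON);

-- Without the option (or on Standard edition) the rebuild is offline
-- and takes a table lock for its duration:
ALTER INDEX IX_BigTable_Val ON dbo.BigTable REBUILD;
```

If the table contains text, ntext, image, or (max) columns in the index, the ONLINE = ON form fails and the rebuild has to run offline.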

As for where I was working, I was a DBA at MySpace.com.

Denny
 