Something to remember about SANs is that when you write to the disk, you aren't actually writing your data straight to disk. It goes into the large buffer cache in memory on the storage processor (SP) and is then flushed to disk. If the write cache on one SP is running full all the time, you may need to look at moving some LUNs to the other SP.
Think of a massive site like MySpace, running hundreds of databases with half a billion transactions per minute, all of it on SAN storage using huge amounts of cache and huge numbers of disks shared across all the servers on the SAN (not EMC, so the config is slightly different). (I was a DBA there, and these figures are accurate.)
Where SANs will also help with the stop-and-go traffic (and I like the analogy) is that in addition to SQL Server doing read-aheads to cache data before you actually need it, the EMC will also prefetch data from disk into its memory, so that when SQL Server wants the data it's already in memory and the SAN doesn't have to go to disk to get it.
If your RAID Group is set up with 3 high-load OLTP database servers on it, odds are the RAID Group will not be able to handle the requests. When setting up SANs it's important to think not just in terms of storage capacity, but also in terms of IO capacity.
Let's assume that we have 3 database servers to deal with. We request RAID 0+1 storage for them, so the SAN guy knows we need fast storage. He has a RAID 0+1 RAID Group with 8 disks in it. If each disk can handle 160 IOs for sequential IO and 120 IOs for random IO (don't quote me on those numbers), we should be able to handle about 480 IOs per second on the RAID Group (8 disks x 120 random IOs, halved because every write lands on both sides of the mirror). Assuming 146 Gig drives, we have ~584 Gigs of usable storage in the group (8 x 146, again halved for the mirror). If we request a 100 Gig LUN for each server, our SAN guy goes, "Hey, those will all fit in this group." The problem comes when we go to put the load on the RAID Group: we now have to put 700 IOs on a RAID Group which can only handle ~480 IOs. Our storage is now oversubscribed on IOs, and the storage looks slow because it can't keep up. But that's not the fault of the storage; it's the fault of the guy who decided to put too much load onto the RAID Group.
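To make the math concrete, here's a quick back-of-the-envelope check you can run in any query window. This is just a sketch; the per-disk numbers and the halving for mirror writes are assumptions carried over from the example above, not anything your SAN will report:

    -- Rough oversubscription check for a RAID 0+1 group.
    -- All figures are assumed from the example above; don't quote me either.
    DECLARE @disks int, @random_ios_disk int, @group_capacity int, @requested_ios int;

    SET @disks = 8;              -- disks in the RAID Group
    SET @random_ios_disk = 120;  -- assumed random IOs per disk

    -- RAID 0+1: every write hits both sides of the mirror, so halve the raw IOs
    SET @group_capacity = @disks * @random_ios_disk / 2;  -- = 480
    SET @requested_ios = 700;    -- combined load from the 3 servers

    SELECT @group_capacity                  AS GroupIOCapacity,
           @requested_ios                   AS RequestedIOs,
           @requested_ios - @group_capacity AS OversubscribedBy;  -- = 220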
When setting up a high-volume SAN you don't get to use all the Megs available in the RAID Groups. Sometimes you have to waste space in the RAID Group in order to get more performance for the LUN.
Now, assuming our 3 LUNs have requirements like this:
- Server1: 250 IOs
- Server2: 250 IOs
- Server3: 200 IOs
All three servers should have their LUNs placed in separate 4-disk 0+1 RAID Groups to get the needed IO capacity. While you are wasting some space, you get the IO that you need. Some things you can do with the leftover space are low-IO file shares, quorum drives for clusters, etc.; things that need some space but not a lot of IO.
I'll bet that if you look at the RAID Group which has the poorly performing LUN in it, you will see that the RAID Group has other LUNs in it which also need a lot of IO, and you are maxed out (this is the most common cause of disk slowdowns in a SAN environment).
This can all be seen in the Analyzer that comes with the EMC (you may need to purchase a license to use it). You can also look at how far the drive heads have to travel to get to the next block of data. If they are traveling too far, you may need to redesign your physical database layout and move some objects to different filegroups on different hard drives (see the sketch after the list below). I know that when I did this I saw a 20-30% performance increase on the system.
This gives you some benefits at a few levels.
1. You get more disk queues in Windows.
2. The SAN will have an easier time caching the data from disk into its memory, as the data is broken out to a different disk, so the SAN can tell the access patterns apart.
3. SQL should allocate its space differently because the data is coming from a different filegroup, so data will be cached differently.
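For what it's worth, the move itself is just a new filegroup, a file on the new LUN, and an index rebuild. A hypothetical sketch; the database, filegroup, file path, and table names are all made up for illustration:

    -- Hypothetical names throughout; adjust for your own database.
    ALTER DATABASE MyDB ADD FILEGROUP HotData;

    -- Put the new file on a LUN backed by its own RAID Group
    ALTER DATABASE MyDB ADD FILE (
        NAME     = 'MyDB_Hot1',
        FILENAME = 'H:\MSSQL\Data\MyDB_Hot1.ndf',
        SIZE     = 10GB
    ) TO FILEGROUP HotData;

    -- Rebuilding the clustered index on the new filegroup moves the table's data
    CREATE CLUSTERED INDEX IX_Orders_OrderID
        ON dbo.Orders (OrderID)
        WITH (DROP_EXISTING = ON)
        ON HotData;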
Of course all this is on top of the normal index tuning that should be done every once in a while.
Have you looked at sys.dm_io_pending_io_requests to see when IO is pending? Or the sys.dm_io_virtual_file_stats to see how much waiting for the disk is happening?
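If you haven't, both are quick to look at. A minimal sketch (the join to sys.master_files is only there to get readable file names):

    -- Anything stuck in the IO queue right now?
    SELECT io_type, io_pending, io_pending_ms_ticks
    FROM sys.dm_io_pending_io_requests;

    -- Cumulative read/write stalls per file since the instance started;
    -- the files with the biggest io_stall are where the disk is hurting.
    SELECT DB_NAME(vfs.database_id) AS database_name,
           mf.physical_name,
           vfs.num_of_reads,
           vfs.io_stall_read_ms,
           vfs.num_of_writes,
           vfs.io_stall_write_ms
    FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
    JOIN sys.master_files AS mf
      ON mf.database_id = vfs.database_id
     AND mf.file_id     = vfs.file_id
    ORDER BY vfs.io_stall DESC;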
BTW, I'm not a SAN guy, but a DBA with some SAN experience.
SQL Server isn't the only software platform which uses logs; Exchange does as well (it's built on the Jet engine), and most large companies use SANs for Exchange too.
Can you elaborate more on what you are seeing and how you are diagnosing the issue when you say:
1. I am writing a lot of data to SAN
2. I commit a small transaction in SQL server
evilDBA said:
All SAN guys are always talking about *throughput*, and never - about the reactivity.
Not all SAN guys talk about throughput; that's what the marketing guys and sales guys talk about. Talk to the engineers that design and build the units (I highly recommend a trip to EMC World next year). They talk mostly about how to get the storage to respond to each request as quickly as possible so it can get on to the next one.
What EMC product do you have, and what do your RAID Groups and LUNs look like? (Maybe we can spot some issues right off the bat.)
If you have a really top-notch SAN guy (which I had the benefit of having a while back) and they are willing to work with the DBA, you can get some truly amazing performance out of the unit. It will take some tuning and some trial and error, and it will cost a little more per meg due to the wasted space, but it can be done in just about any environment.
Denny
MCSA (2003) / MCDBA (SQL 2000)
MCTS (SQL 2005 / Microsoft Windows SharePoint Services 3.0: Configuration / Microsoft Office SharePoint Server 2007: Configuration)
MCITP Database Administrator (SQL 2005) / Database Developer (SQL 2005)
--Anything is possible. All it takes is a little research. (Me)