
Raid Theory Question 2


jouell

MIS
Nov 19, 2002
304
US
Hi.

How is it that any RAID will give higher performance than a single disk on read operations?

I have done some research and I understand you have multiple drives reading at a time, etc., but won't it only be faster if each disk is on a separate controller?

The way I understand it, IDE drives "block" the channel and the other device cannot be accessed. For SCSI disks they need to use arbitration to decide which one can have control of the bus and hence send data.

So it seems like at no point are multiple disks actually transferring data across the bus at the same time.

Can someone shed some light on this?



(For reference, I will be upgrading my PC for video editing and will be choosing between adding 2 more drives on separate controllers or doing a RAID 0 setup. I am very interested in the theory above, not so much a recommendation for editing.)

Thanks
-John

 
You have to look at timing latencies & transfer speeds to ascertain the full picture.

Example: CPU & FSB of 400 MHz:

400 MHz × 64-bit chunks per cycle = 25,600 Mbit/s

HD @ 133 MHz × 64 bits = 8,512 Mbit/s
(could be bytes; units aside, it's the ratio that matters)

The CPU is constantly waiting for data, and the bus is open during these intervals.

RAID "0" (two drives striped) = essentially 2X the data transfer speed, as the second drive is transferring while the first drive is gathering, etc.

This is only a basic theory overview, and many dedicated controllers have their own processors & RAM to increase this even more.
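That overlap can be sketched as a toy pipeline model. The chunk size and phase times below are invented for illustration, not real drive timings; the shape of the result is the point:

```python
# Toy model: each drive alternates an "internal" phase (reading the
# platters into its buffer, bus idle) and an "external" phase
# (sending the buffer over the bus). All timings are invented.
CHUNK_MB = 1.0        # assumed stripe chunk size
INTERNAL_S = 0.025    # assumed time to fill the buffer
EXTERNAL_S = 0.025    # assumed time to send it over the bus

def throughput_mb_s(n_drives, total_mb=100):
    chunks = total_mb / CHUNK_MB
    if n_drives == 1:
        # one drive: the bus idles during every internal phase
        elapsed = chunks * (INTERNAL_S + EXTERNAL_S)
    else:
        # two striped drives: one drive's external phase overlaps the
        # other's internal phase, so after the first fill the bus
        # stays busy (the "2X" ideal case)
        elapsed = INTERNAL_S + chunks * EXTERNAL_S
    return total_mb / elapsed

print(throughput_mb_s(1))            # 20.0
print(round(throughput_mb_s(2), 1))  # 39.6 -> close to double
```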

rvnguy
"I know everything..I just can't remember it all"
 
rvnguy,

Thanks for the info.

My question really comes down to the scenario:

For Example:

Let's *assume* these 2 IDE drives are capable of:

IDE HD1->33 MB/s
IDE HD2->33 MB/s

and do, in fact, maintain this as a sustained rate of transfer for a period of time.

They are connected to an IDE RAID controller (supporting 100 MB/s) (RAID 0). While transferring, say, a 1 GB file, how can you realize more than 33 MB/s if both disks are on the same channel, since they share the bandwidth of the channel:

Time (s)

1 IDE HD1 writes 33 MB
2 IDE HD2 writes 33 MB
3 IDE HD1 writes 33 MB
4 IDE HD2 writes 33 MB

So 132 MB transferred in 4 seconds (33 MB/s), not counting the "overhead" of waiting for the other drive to "release" the channel, etc. - just keeping it simple.
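That turn-taking arithmetic can be written out as a quick check (the numbers are exactly the scenario above):

```python
# Scenario: the two striped drives strictly alternate one-second
# turns on the shared channel, moving 33 MB per turn.
turns = [("HD1", 33), ("HD2", 33), ("HD1", 33), ("HD2", 33)]

total_mb = sum(mb for _, mb in turns)  # 132 MB
elapsed_s = len(turns)                 # 4 seconds, one per turn
print(total_mb / elapsed_s)            # 33.0 -> no better than one drive
```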

I could understand if they could both send 33 MB/s over the cable at the same time, but this is not the case per:



If we keep the above scenario the same and make them SCSI: I understand two of the benefits of SCSI are command queuing and command reordering, which make for a much more efficient transfer, improving performance compared to a disk that doesn't do this:



But again, I don't see how this helps when transferring over the cable, since each disk still needs to take turns on the bus, however quickly the arbitration process takes place:



Let me know if my question makes more sense.

Thanks!
-John
 
I'm not going to pretend I know the intricate details as to how the "arbitration process" works over IDE, but I know it is less of an issue over newer interfaces such as SATA.

Also, this article should answer some of the concerns you brought up:

When a file is sequentially stored across both drives in a RAID 0 configuration, you can easily see transfer rates that are 50 - 100% faster than with a single drive. Usually you get better read burst speeds in a SCSI or SATA environment, but even in IDE, you can see a noticeable difference in this regard.

Like the article above mentions, however, it does not apply in many real-world situations and applications.

~cdogg
"Insanity: doing the same thing over and over again and expecting different results." - Albert Einstein
[tab][navy]For general rules and guidelines to get better answers, click here:[/navy] faq219-2884
 
Time (s)

1 IDE HD1 writes 33 MB
2 IDE HD2 writes 33 MB
3 IDE HD1 writes 33 MB
4 IDE HD2 writes 33 MB

You would be correct if the data were transferred in 33MB chunks, but this is not how it occurs.

Depending on the platform, it is essentially in 64- or 128-bit-per-cycle chunks, and IDE drives utilize 16-bit data chunks, I think for compatibility reasons.

The actual electrical read/write signaling occurs at just under light speed but is constrained by clock cycle timing.

If you opt for a software implementation of RAID "0", I would suspect that you will suffer from needing CPU cycles to aid in striping. This is where discrete, separate hardware RAID controllers add benefit. These are not the cheapest, but they have their own CPU & RAM on board along with their own I/O control.

There are many factors involved. You might get some clarification here:



rvnguy
"I know everything..I just can't remember it all"
 
Thanks for the info.

Of course 1 second makes the math much easier, rvnguy.

I am getting the sense that using two IDE controllers is going to be better than RAID 0 for setting things up logically, but where transfer-related tasks are concerned, RAID 0 will generally win. Again, this will be for video editing/rendering, either with Pinnacle or Sony Vegas.

However, the technical reasoning for this (at the bit level) is unclear.
 
jouell,

I'm not sure why there is so much confusion on the subject. You don't have to understand the underlying details to see how it works.

You're assuming that data transfer with the CPU/RAM is constant from the drive. It is not. There are very small millisecond gaps between data being read from the hard drive in chunks and the time it is transferred over the bus. These gaps are where a 2nd drive working almost simultaneously can help out.


Take the following for example:
__________________________________

Say you have two identical ATA/100 drives that average 40MB/s with burst transfer rates reaching 55MB/s. You stripe them together using RAID 0. Although the average might only increase to 50MB/s, now you're seeing some burst transfer rates as high as 85MB/s. That makes a big difference in certain situations.

Data is interleaved between the two disks. When one drive is doing an "external" transfer, the other drive is busy doing an "internal" transfer. On an internal transfer, data is collected from the platters and transferred to the internal cache or read buffer. An external transfer is the process of moving data across the bus or interface. It is important to realize the limitations of both.

When there is only a single drive available, delays between internal and external transfers are higher. Adding another drive to that mix in RAID 0 helps to reduce that delay by splitting the time between both types of transfers.

The biggest example of that is in a large file, such as video, that can often be found stored in a sequential manner. There is very little seek time involved when reading such files, and internal transfers are much faster (especially on drives with a large cache buffer such as 8MB). This is where RAID 0 shines, because as one drive finishes up the external transfer, the other is about to begin. It's not quite perfect harmony, but you get the idea.

Again, you won't see much difference when both drives are reading data scattered across the platters. Unfortunately, this applies to most real-world benchmarks.
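A slot-based sketch of that interleaving (the slot counts are invented; the point is the bus-idle fraction, not the specific numbers):

```python
# Each drive is on the bus ("external" transfer) on alternate slots
# and reading platters ("internal" transfer) otherwise. With two
# drives offset by one slot, some drive is on the bus every slot.
def bus_busy_slots(n_drives, slots=1000):
    busy = 0
    for t in range(slots):
        # drive i is transferring externally when (t + i) is even
        if any((t + i) % 2 == 0 for i in range(n_drives)):
            busy += 1
    return busy

print(bus_busy_slots(1))  # 500  -> the bus idles half the time
print(bus_busy_slots(2))  # 1000 -> bus busy every slot, ~2x data moved
```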



As for CPU utilization using software RAID, I've seen benchmarks that indicate only a small difference when compared to hardware controllers. This used to be more of a factor in the pre-1GHz days, but as CPUs reached 2GHz and beyond, there is very little difference in terms of hogging the CPU.

~cdogg
"Insanity: doing the same thing over and over again and expecting different results." - Albert Einstein
[tab][navy]For general rules and guidelines to get better answers, click here:[/navy] faq219-2884
 
cdogg,

"I'm not sure why there is so much confusion on the subject. You don't have to understand the underlying details to see how it works."

No, but I want to understand it.

Anyway I understand everything you've said, and do understand RAID 0 can provide performance increases over one drive, even on a shared IDE bus, due to what you mention above, however:

"Say you have two identical ATA/100 drives that average 40MB/s with burst transfer rates reaching 55MB/s. You stripe them together using RAID 0.

Although the average might only increase to 50MB/s, now you're seeing some burst transfer rates as high as 85MB/s. That makes a big difference in certain situations."

My question then (in your terms) is then:

How is it possible to achieve 85 MB/s when each disk has a MAX rate of 55 MB/s? At any given millisecond the rate of transfer across the bus is 55 MB/s. Then the next transfer is at 55 MB/s, etc. How will you ever beat the MAX of one drive?

If you said the MAX is 55 MB/s for each and each averages 10 MB/s, I can see how RAIDing the two can make the transfer 20 MB/s, because you're filling in the gaps in time due to positioning. But if you hit the MAX and have perfectly ideal conditions, those numbers don't add up. Remember, the bus is shared and the drives are taking turns.

And yes I am talking theory not practice.

Thanks!
-John
 
Maybe a crude graphic will assist:


{read1 - wait - read1 - wait - read

{wait - trans1 - wait - trans1 - wait -

{wait - receive1 - wait - receive1 -

The basic flow above happens thousands of times a second, based upon the CPU GHz, FSB MHz, etc., and the rated transfer is, say, 40 MB/s. Now in RAID "0" the following is basically what happens:


{read1 - w - read2-w- read1 -w- read2 -w- read1

{wait - trans1 - w - trans2 -w- trans1 -w- trans2 -w- trans1

{ - wait -rec1 - w -rec2 - w - rec1 - w -rec2

The above also runs on the same clock with shorter waits before the subsequent read, and assumes a different IDE channel (i.e. primary & secondary) or a duplexing bus like many on higher-end controllers.

Here is also one good site, and this link is for PATA; there is much info in PDF format available:


rvnguy
"I know everything..I just can't remember it all"
 
jouell said:
At any given millisecond the rate of transfer across the bus is 55 MB/s.

This is where you need to take a second look. The "rate" that you're referring to is over an entire second where there can be hundreds of transfers. Two drives can increase the amount of "transfers per second", which in turn increases the transfer rate (beyond the 55MB/s in my example).

my previous post said:
You're assuming that data transfer with the CPU/RAM is constant from the drive. It is not. There are very small millisecond gaps between data being read from the hard drive in chunks and the time it is transferred over the bus. These gaps are where a 2nd drive working almost simultaneously can help out.

rvnguy's graphic is a good example of what I'm talking about. Once you get a good grasp of this concept, then you can understand why the burst transfer rate of 2 drives can exceed a single drive's so-called "max" rating.

Remember, the bus is not being used constantly when there is only one drive. There are many gaps of time where the drive is reading data and NOT transferring across the bus. So when you add a second drive, there is plenty of "unused" access time available.

There are times when both drives are ready to transfer, and in those situations your logic about "taking turns" applies. However, there are plenty of times when only one drive is ready to transfer. As a result, adding a second drive to the equation helps fill in some of those gaps. High rates like the 85MB/s in my example are achieved when both drives are working in sync (one transferring and one reading). The rate reaches its highest point when the two drives are reading a large contiguous file that is stored sequentially, making good use of the buffer cache.
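One way to see why the single-drive "max" isn't the ceiling: it's a per-drive limit, while the shared ATA/100 channel tops out higher. A minimal bottleneck model, using the numbers from the earlier example:

```python
# Peak rate is capped by whichever is smaller: the shared bus, or
# the combined rate the drives can feed it. One drive's 55 MB/s
# "max" is a per-drive limit, not a bus limit.
BUS_MB_S = 100.0       # ATA/100 shared-channel ceiling
PER_DRIVE_MB_S = 55.0  # one drive's peak external rate

def peak_rate(n_drives):
    return min(BUS_MB_S, n_drives * PER_DRIVE_MB_S)

print(peak_rate(1))  # 55.0
print(peak_rate(2))  # 100.0 -> the bus is now the ceiling; imperfect
                     # hand-offs are why a measured burst (85 MB/s
                     # in the example) falls short of it
```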

Hope this is making more sense now!
[wink]

~cdogg
"Insanity: doing the same thing over and over again and expecting different results." - Albert Einstein
[tab][navy]For general rules and guidelines to get better answers, click here:[/navy] faq219-2884
 
rvnguy,

I thought about this; I wasn't taking into consideration the gaps in time where the bus is waiting, which can be filled by transfers from HD2.


Thanks for the info!

-John
 
cdogg,

Great!

Thanks for breaking this down with me!
Thanks for your time as well.

-John
 
Another factor here is disk latency - to read a certain sector from a disk, you have to wait for the disk to rotate to bring the sector under the read head. On average, the latency (the time it takes to spin the disk round so that the wanted sector is under the head) is 0.5 revolutions of the disk. Of course some reads are nearly instant, because the requested sector was coming up on the head, whilst some need the disk to do nearly a complete revolution, because the requested sector just passed under the head. Hence the average (when you add all reads together) of 0.5 revolutions per read.

Now think of two disks, mirrored. If either disk could supply the requested sector, then odds are that one of the disks will have the sector closer to the head than the other (unless both disks are turning in perfect synchrony, which is very unlikely!). Therefore, all other things being equal, a mirrored disk pair generally has lower latency than a single disk for reads.
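That 0.5-revolution average, and the mirrored-pair improvement, can be checked numerically. The 7200 RPM figure below is an assumption for illustration:

```python
import random

# Rotational latency: the wanted sector sits a uniformly random
# fraction of a revolution past the head. A mirror can serve the
# read from whichever copy is closer. 7200 RPM assumed.
REV_MS = 60_000 / 7200   # one revolution ~ 8.33 ms

def avg_latency_ms(n_copies, trials=200_000, seed=42):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.random() for _ in range(n_copies)) * REV_MS
    return total / trials

print(round(avg_latency_ms(1), 2))  # ~4.17 ms (0.5 revolution)
print(round(avg_latency_ms(2), 2))  # ~2.78 ms (1/3 revolution on average)
```

The min of two independent uniform positions averages 1/3 of a revolution, which is where the mirrored pair's lower read latency comes from.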
 
Jouell,

If you happen to see this, I thought of another way of looking at this that might assist.

If you look at HD access times, they are in milliseconds (1/1,000 sec, or 1×10^-3), and 4 milliseconds is about the fastest seek.

Almost all other things are in nanoseconds (1×10^-9), or a billionth of a second.

This should illustrate that the other items are mostly idle waiting for the HD and a second HD can fill this idle time.
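The scale gap, as arithmetic (the 1 ns cycle figure is a rough assumption):

```python
seek_s = 4e-3    # ~4 ms, about the fastest HD seek per the post
cycle_s = 1e-9   # ~1 ns, rough order of a CPU/bus cycle (assumed)

# cycles the rest of the system could spend waiting on one seek
print(f"{seek_s / cycle_s:,.0f}")  # 4,000,000
```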

rvnguy
"I know everything..I just can't remember it all"
 