Hot swap drive bays - how do they work?

Status
Not open for further replies.

G0AOZ

Technical User
Nov 6, 2002
2,342
GB
Other than fitting a couple of these drive bays into a small server, I've no experience of them.

Purely out of interest, in simple terms how does the technology work? I assume there must be some sort of buffer to eliminate the problems that one associates with connecting and disconnecting a hard disk drive from a running system, both for data lines and power rails. Apart from physically keeping the drive in place, I suspect the lock mechanism actuates a switch or switching circuit. What is it switching? Power and/or data lines?

ROGER - G0AOZ.
 
My only experience has been with hot-swap drives in IBM and Compaq servers, but my guess is that the hot-swap functionality is built into the RAID-1, RAID-5 and RAID-10 controllers they are used with, since those are designed to deal gracefully with a drive failure. If you had a non-RAIDed drive in a hot-swap mounting and pulled it out without dismounting the drive first, you'd get the same results as though you'd pulled the cable on a conventionally mounted drive.
If someone else has contrary info, please jump in.

Fred Wagner

 
Here is the recommended reading on the subject (very technical):

Understanding Hot Swap: Example of Hot-Swap Circuit Design Process

and for the layman:

Power electronics

The DC power supplies to a hot-swap component are usually pre-charged by dedicated long pins that make contact before the main power pins. These pre-charge pins are protected by a circuit that limits the inrush current to an acceptable value that cannot damage the pins nor disturb the supply voltage to adjacent slots. The pre-charge circuit might be a simple series resistor, a negative temperature coefficient (NTC) resistor, or a current-limiter circuit. Further protection can be provided by a "soft-start" circuit that provides a managed ramp-up of the internal DC supply voltages within the component.
source: Hot swapping -
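To put rough numbers on the simple series-resistor case described in the quote, here is a minimal Python sketch. The 12 V rail, 10 ohm resistor and 470 uF of input capacitance are illustrative assumptions, not figures from any drive or backplane datasheet, and the load is treated as an ideal capacitor charged through the resistor.

```python
import math

def precharge(v_rail, r_series, c_load, settle_fraction=0.99):
    """Ideal RC pre-charge model.

    Returns (peak inrush current in amps,
             seconds until the capacitor reaches settle_fraction of v_rail).
    """
    if not 0.0 < settle_fraction < 1.0:
        raise ValueError("settle_fraction must be between 0 and 1")
    # At the instant of contact the empty capacitor looks like a short,
    # so the resistor alone sets the peak current.
    peak_current = v_rail / r_series
    # Capacitor voltage follows v_rail * (1 - exp(-t / tau)).
    tau = r_series * c_load
    settle_time = -tau * math.log(1.0 - settle_fraction)
    return peak_current, settle_time

# Assumed example values: 12 V rail, 10 ohm resistor, 470 uF load.
peak, seconds = precharge(12.0, 10.0, 470e-6)
print(f"peak inrush ~{peak:.2f} A, 99% charged after ~{seconds * 1000:.1f} ms")
```

A larger resistor lowers the peak current but lengthens the charge time, which is the trade-off the long pre-charge pins have to fit into before the main power pins mate.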
Ben
"If it works don't fix it! If it doesn't use a sledgehammer..."
How to ask a question, when posting them to a professional forum.
Only ask questions with yes/no answers if you want "yes" or "no"
 
Thanks both for your input. Having read the stuff Ben linked me to, I feel I have a better understanding of the technology now. It's basically all about manipulating and controlling the voltage rails, rather than the signal/data connections. Very interesting... Many thanks.

ROGER - G0AOZ.
 
My main RAID experience has been 6 years on an IBM Netfinity 6000R with 2-drive RAID Zero in the main CPU and two bays of 14 and 11 drives in RAID 5 - a total of a Terabyte of RAID, installed back in 2001. ServeRAID Manager is the management tool. It gives a graphical tree, both logical and physical, shows the status of all the physical drives (active and hot spare) and the logical drives they support, and gives you control to deactivate a drive prior to removal. If a drive fails and a hot spare is present, the hot spare is automatically made part of the array. But you'd better check the system, either by watching the lights or viewing ServeRAID through remote desktop, because if a second drive fails when the system has already used the online hot spare, that array fails, and you have to restore from backup once you've replaced the failed drive. As the system has aged, the MTBF has decreased - the lights are now checked at every shift change in the Data Center. All our new systems are using EMC SANs, which are several generations newer technology.


Fred Wagner

 
Fred,

I think you may be mixing up things a bit here.

Hot-Swap and Hot-Spare drives are like apples and pears, they are not the same...

Hot-Spare:
A hot spare or hot standby is used as a failover mechanism to provide reliability in system configurations. The hot spare is active and connected as part of a working system. When a key component fails, the hot spare is switched into operation. More generally, a hot standby can be used to refer to any device or system that is held in readiness to overcome an otherwise significant start-up delay.
source:


Ben
 
Ben - The Hot Spare drives are in hot-swap bays, and when one of the active drives in a hot-swap bay fails, the Hot Spare automatically assumes its function and becomes an active drive in the array. When I see that I have a failed drive in a hot-swap bay, I remove it, put a good drive in the hot-swap bay, and let the array designate it as the new Hot Spare. I've been getting a fair amount of practice at this as the system ages!

Fred Wagner

 
I understand what you are saying: basically you have the hot spares set up in hot-swap bays for easy replacement and removal...

Most rack servers today have those hot-plug bays integrated, for either SATA or SAS.

Both the SATA and SAS specifications allow hot-plugging drives. IDE and EIDE did not, as far as I know, so the hot-swap bays mentioned were invented for easy removal of the drives without taking down the server...

But you probably knew this already, and we were just misinterpreting things a bit... ;)


Ben
 
I confess to being a dinosaur here - my Netfinity server has classic SCSI drives with a 4-channel SCSI controller (3 channels used) - and in the last century I had Compaq Proliants with hot-swappable SCSI also. Those were RAID 0 or 1; I don't remember if there was more than one controller. Before that I had Storage Dimensions units with RAID 1 (definitely NOT hot-swappable!) in external enclosures. Those were 1GB drives, full height, 5 1/4 inch beasts! I'm getting out of the drive business soon - all our new servers are virtualized, and storage is on an EMC SAN, which is maintained by our Data Center people, with the SAS and SATA drives you mentioned. At home I run without RAID. We shut our systems down when not in use, and the only equipment failures I've ever had were a power supply and a UPS. Keeping drives cool and protecting them from shock are the keys to longevity. They do have a finite lifetime in hours, as I've proved with our Netfinity setup, but in a home environment, where they don't have to be spinning 24/7, they'll be obsolete before they fail!

Fred Wagner

 
SCSI is still used today, although SAS is overtaking it because of the lower cost and larger drive sizes. The problem with SATA or SAS is that they have a much smaller MTBF, so in mission-critical data centers SCSI still rules.
 
Rclarke250,

I beg to differ as to the MTBF...

SAS has a slightly higher MTBF than parallel SCSI (it is pretty much the same technology)...

Code:
Device         Duty Cycle               MTBF (Typical)

Desktop SATA    8 hrs/day, 5 days/week    600,000 hrs
Parallel SCSI  24 hrs/day, 7 days/week  1,200,000 hrs*
SAS            24 hrs/day, 7 days/week  1,400,000 hrs (est.)

* Current generation, next generation for parallel SCSI projected to be
1,400,000 hours

Table 1: MTBF comparisons for different hard drive technologies
Source: thefreelibrary - SCSI finally gains serial attachment [SAS] ... after decades of steady progress.
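For a feel of what those MTBF figures imply per drive per year, here is a minimal Python sketch that converts them to an annualized failure rate. It assumes a constant failure rate (exponential lifetimes) and 52 weeks of the stated duty cycle per year; both are simplifying assumptions of this sketch, not something the quoted table states.

```python
import math

def annualized_failure_rate(mtbf_hours, hours_per_day, days_per_week):
    """Probability that one drive fails within a year of the given duty cycle,
    assuming a constant failure rate of 1 / mtbf_hours."""
    powered_hours = hours_per_day * days_per_week * 52
    return 1.0 - math.exp(-powered_hours / mtbf_hours)

# (MTBF in hours, hours/day, days/week) taken from the table above.
TABLE = {
    "Desktop SATA": (600_000, 8, 5),
    "Parallel SCSI": (1_200_000, 24, 7),
    "SAS": (1_400_000, 24, 7),
}

for name, (mtbf, hours, days) in TABLE.items():
    afr = annualized_failure_rate(mtbf, hours, days)
    print(f"{name:14s} {afr:.2%} per drive per year")
```

Note that the desktop figure is rated at a much lighter duty cycle, so the MTBF numbers in the table are not directly comparable between rows.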




Ben
 
From my experience with a particular server with 28 drives, which has been in service since February 2001 (around 82,000 hours of service), we've had 5 or 6 drives fail in the 5+ years I've been in charge of it. So the million+ hours MTBF is not credible in my experience. Maybe 100,000 hours, with some failing well below that number, and others that may run well past it, if we had the nerve to keep it running that long.

Fred Wagner
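As a rough cross-check of the figures in the post above, a field MTBF can be estimated as total drive-hours divided by the number of failures. The sketch below is only that, a sketch: it assumes all 28 bays stayed populated the whole time, and because the post does not say whether the 5 or 6 failures cover only the 5+ years or the full service life, it prints both readings.

```python
def observed_mtbf(drive_count, hours_observed, failures):
    """Field MTBF estimate: accumulated drive-hours per observed failure."""
    if failures <= 0:
        raise ValueError("need at least one failure to estimate MTBF")
    return drive_count * hours_observed / failures

HOURS_PER_YEAR = 8760  # 24 x 365, ignoring leap days

# Two readings of the observation window, both assumptions of this sketch.
windows = {
    "5 years in charge": 5 * HOURS_PER_YEAR,
    "full ~82,000 h service life": 82_000,
}

for label, hours in windows.items():
    for failures in (5, 6):
        estimate = observed_mtbf(28, hours, failures)
        print(f"{label}, {failures} failures: ~{estimate:,.0f} h")
```

Under either reading the estimate lands in the low hundreds of thousands of hours: well short of the million-plus rating, though above the 100,000-hour guess.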

 
It's the same with cars: some may be rated at 50 miles per gallon, but in reality they only get 40 mpg...

And it always depends on usage, wear and tear, etc., but your figures are probably closer to reality than what a manufacturer may state...


Ben
 
I think I read that the way they extrapolate the MTBF figures is to run a number of the drives - say 20 - until one of them fails. The MTBF then becomes the number of hours until that first failure multiplied by the number of drives running in the test (20 in this example). That's the only practical way - if you waited for a real number, like the one I've experienced, it would be meaningless because the drives would be obsolete by then. The manufacturer has to come up with some kind of relative MTBF number to price and sell the product.

Fred Wagner
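The extrapolation recalled in the post above can be sketched in a few lines of Python. This follows the post's description, not any documented manufacturer procedure, and the 50,000-hour first failure is a made-up illustrative value. The small simulation assumes drives fail at a constant rate (exponential lifetimes); under that assumption the first-failure estimate averages out to the true MTBF, but the assumption ignores wear-out, which may be one reason field experience comes out lower.

```python
import random

def mtbf_from_first_failure(drives_on_test, hours_to_first_failure):
    """Extrapolated MTBF: hours until the first failure times drives on test."""
    return drives_on_test * hours_to_first_failure

def simulate_estimates(true_mtbf, drives_on_test, trials, seed=0):
    """Average the first-failure estimate over many simulated test runs,
    drawing each drive's lifetime from an exponential distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        first_failure = min(
            rng.expovariate(1.0 / true_mtbf) for _ in range(drives_on_test)
        )
        total += mtbf_from_first_failure(drives_on_test, first_failure)
    return total / trials

# Hypothetical test: 20 drives, first failure after 50,000 hours.
print(f"{mtbf_from_first_failure(20, 50_000):,} h")

# Simulated check against an assumed true MTBF of 1,000,000 hours.
average = simulate_estimates(1_000_000, 20, 20_000)
print(f"average estimate over 20,000 runs: ~{average:,.0f} h")
```

Any single run of such a test is very noisy, since it rests on one failure; only the average over many runs settles near the true figure.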

 