How to handle predictive failure

Status
Not open for further replies.
Apr 18, 2003
I have a Proliant 8000 with a drive that is reporting a "predictive failure".

The array is RAID5, and I know that if I replace the drive 'hot' I will lose data.

My question is: can I power off the server and replace the drive without losing any data?
 
Yes, you can power the server off and either replace the drive or just pull it. If you swap the drive and then power the server back up, the server will realise that a disk has been replaced and will ask you, as part of POST at boot, whether you want to acknowledge the fault and start restriping (the answer is yes, of course). If you just pull the disk and then power the server on, you will be asked to confirm whether you want the server to run in interim mode (without the failed disk). When you later put a replacement disk into the server, it should start to restripe as normal.
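For anyone wondering why the restripe can bring the data back, here is a minimal sketch of the general RAID 5 idea: the parity block in each stripe is the XOR of the data blocks, so any one missing block can be recomputed from the rest. The 4-drive layout, the block contents and the `xor_blocks` helper are made up for illustration; this is not how the Smart Array firmware is actually written.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe on a hypothetical 4-drive RAID 5 set: three data blocks plus parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)
stripe = data + [parity]

# Simulate replacing drive 1 with a blank disk: its block is gone.
failed = 1
survivors = [blk for i, blk in enumerate(stripe) if i != failed]

# Restripe: the XOR of all surviving blocks gives back the missing one.
rebuilt = xor_blocks(survivors)
assert rebuilt == stripe[failed]
```

A real rebuild repeats this for every stripe on the array, which is why it takes a while on large drives.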

I would never suggest pulling a drive out "hot": I have seen it corrupt the server's logical volume, which in turn corrupted data, so you are quite right not to want to do that.

--------------------------------------------------------------------------
"Who is General Failure and what is he doing on my computer?"
--------------------------------------------------------------------------
 
Ok - that is what I will do - power off the server and replace the drive.

Thanks!
 
It is safe to hot-plug a new drive, but 'hot-unplugging' a drive can be a problem, though with a Smart Array controller and a BBWC (battery-backed write cache) enabler even this usually shouldn't be.

If you have a spare drive slot, just plug in the new drive as a hot spare and leave the old drive in place until it eventually fails. The new drive will take over its function automatically. The only effect the user may notice is a slowdown in response whilst the array rebuilds on the fly.

Regards: tf1
 
tf1 - it shouldn't be a problem, I agree. But I have seen that pulling a hot drive can cause problems, especially on older Proliant hardware. The best option in this situation, in my opinion, is always to power down the server and swap or pull the drive.

There is nothing stopping you from setting up a hot spare, but the failing disk will still need to be replaced when it eventually fails. Otherwise the RAID5 configuration will be left running in interim mode and will still report a disk missing: the faulty disk that failed and was removed.
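To show why running in interim mode is only a stopgap, here is a rough sketch of a RAID 5 read with a drive missing. It assumes the usual XOR parity scheme; the `read_block` function and the use of `None` to mark a missing drive are invented for the example and are not taken from any controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

def read_block(stripe, index):
    """Read one block from a stripe; None marks a missing drive."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) > 1:
        # RAID 5 only tolerates one missing drive; treat the array as failed.
        raise IOError("two drives missing: stripe cannot be reconstructed")
    if stripe[index] is not None:
        return stripe[index]
    # Interim mode: recompute the missing block from the survivors.
    return xor_blocks([blk for blk in stripe if blk is not None])

data = [b"AAAA", b"BBBB", b"CCCC"]
healthy = data + [xor_blocks(data)]

# Drive 1 has failed and been pulled; reads still work, just more slowly.
interim = [healthy[0], None, healthy[2], healthy[3]]
assert read_block(interim, 1) == b"BBBB"
```

With one drive missing every read of its blocks has to touch all the other drives, and a second failure before the replacement is rebuilt leaves nothing to reconstruct from.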

--------------------------------------------------------------------------
"Who is General Failure and what is he doing on my computer?"
--------------------------------------------------------------------------
 
Unfortunately I don't have a free slot in the server to set up a hot spare.
 
I did a hot-swap replacement on an 8000 the other day and everything worked fine. That's why they make hot-swap components. You just need to make sure the drive you have is hot-swappable.
 
I'd still recommend that if the disk is hot, you pull it with the server down. If the disk is cold (failed/inactive), then pulling it is not a problem. I have seen too many problems from pulling hot disks to advise otherwise.

--------------------------------------------------------------------------
"Who is General Failure and what is he doing on my computer?"
--------------------------------------------------------------------------
 
I agree. Unless there is a compelling reason not to take the server offline, the safest procedure is to power down first.




Regards: tf1
 
Oh, OK. I'm sorry, I misunderstood. I agree with that. I don't think it's ever a good idea to pull a drive out while it's still running.
 
For what it's worth, as part of my standard burn-in process for the last 7 years, all on Proliant hardware, I always pull out a hot drive and replace it with another before putting the system into production. I have yet to see a single failure on a properly configured Proliant server. We are easily talking 150+ Proliant servers with various SA controllers.
 
You might have done it yourself. I have too, on properly configured Proliant servers, and nine times out of ten there has been no problem. But would you be willing to recommend it to someone, only for them to do it and have it all go wrong? I wouldn't...

--------------------------------------------------------------------------
"Who is General Failure and what is he doing on my computer?"
--------------------------------------------------------------------------
 