x345 - All discs failed at the same time

Status
Not open for further replies.

ILW

MIS
Jul 1, 2003
72
GB
We've had two 3-month-old IBM x345s in which all 6 discs failed simultaneously. It's definitely the discs rather than the controller that failed: moving the discs to another server doesn't fix it. The OS is W2K3, and the discs are configured as RAID 5 with an on-line spare. They're 144GB each.

Anyone else seen similar? Is this a known fault?
 
You probably had one disk go defunct and the others went down in sympathy, or there was a hiccup somewhere in the system which marked all the drives defunct.
Moving hard drives around to other servers is only going to compound the issue and possibly mess things up to the point where you could lose your data. Your best bet is to call IBM support, because they deal with this stuff on a daily basis.
 
It sounds like the HDDs lost communication with the RAID controller. Definitely call IBM for help, or look at their site. They have documented the procedure to recover from this fault.
 
It's not a good idea to bring drives back online unless you know the order they went down in. If you bring them back online in the wrong order you can lose your data.
If all the drives go DDD at the same time, it is most likely that the drives lost communication with the RAID card or vice versa. In that case, boot to Ctrl-I, go into the advanced functions, restore to factory defaults (which resets the RAID card), and then copy the configuration from the drives to the controller. If they still all come back DDD, call IBM support. They will get the RAID log files and will be able to determine which drives to set online and in what order.
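
To illustrate why the order matters, here is a minimal Python sketch of RAID 5 style XOR parity. It is a toy model, not how the ServeRAID firmware works: the single three-block stripe, the byte values and the function names are all made up for the example. The point it tries to show is that a drive which dropped out first misses any later writes, so its contents are stale, and rebuilding from it gives wrong data.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_missing(surviving_blocks):
    """Reconstruct the one missing block of a stripe from all the others."""
    return xor_blocks(surviving_blocks)

# Hypothetical stripe: data blocks on drive 0 and drive 1, parity on a third drive.
d0_old, d1 = b"AAAA", b"BBBB"
parity_old = xor_blocks([d0_old, d1])

# Drive 0 goes defunct first and the array keeps running degraded.
# A later write changes logical block 0; only the parity on disk is updated,
# so the physical drive 0 still holds the old (stale) contents.
d0_new = b"CCCC"
parity_new = xor_blocks([d0_new, d1])

# Right order: keep the drive that stayed current (drive 1) and rebuild drive 0.
good = rebuild_missing([d1, parity_new])

# Wrong order: force the stale drive 0 online and rebuild drive 1 from it.
bad = rebuild_missing([d0_old, parity_new])

print("drive 0 rebuilt correctly:", good == d0_new)
print("drive 1 rebuilt correctly:", bad == d1)
```

In this toy case the first check comes out true and the second false, which is the data loss the wrong order can cause.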
 
Agreed that it's not a good idea to bring the HDDs back online one at a time, but initializing the controller and then importing the configuration from the HDDs is just as bad an idea. The controller is not confused, and it is aware of the state of the HDDs. The controller lost communication with the HDDs at some point (probably a cable, or an incorrect shutdown/power-on sequence if there is an enclosure). Initializing the controller and importing the configuration from the HDDs is only appropriate when the controller has been replaced or has somehow had its configuration corrupted. The right answer is to call IBM and have them take a look at the controller's log file to determine which HDD died first. If you keep messing with the server before doing this, the controller log may be overwritten by the latest entries from powering the server up and down, and the root cause will be impossible to determine (READ: you will have a long and tedious path to work out which drive may have failed first, or you're going to restore from backup, or perhaps both).
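
As a rough illustration of what "determine which HDD died first" amounts to, here is a small Python sketch. The event records are hypothetical and simplified; they are NOT real ServeRAID log entries, and the actual log format is not shown in this thread. The function name and record layout are assumptions made only for the example.

```python
from datetime import datetime

# Hypothetical, simplified records: (timestamp, drive slot, new state).
# Invented for illustration; real controller logs look different.
events = [
    ("2000-01-01 02:14:07", 3, "DDD"),
    ("2000-01-01 02:14:05", 1, "DDD"),
    ("2000-01-01 02:13:58", 4, "DDD"),
    ("2000-01-01 02:14:05", 2, "DDD"),
]

def failure_order(records):
    """Return drive slots sorted by when each was first marked DDD (defunct)."""
    first_seen = {}
    for stamp, slot, state in records:
        if state != "DDD":
            continue
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        if slot not in first_seen or when < first_seen[slot]:
            first_seen[slot] = when
    # Ties on the timestamp are broken by slot number so the result is stable.
    return sorted(first_seen, key=lambda s: (first_seen[s], s))

order = failure_order(events)
print("first to fail:", order[0])
print("failure order:", order)
```

With records like these, the earliest-failed slot is the one most likely to hold stale data, which is why losing the log to repeated power cycles makes recovery so much harder.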
 
Considering I do this stuff every day, the first thing we, IBM support, are going to do is have you restore to factory defaults and copy the config. This tells us what the configuration on the drives is versus what the controller thinks it is, and whether the communication between the controller and the hard drives is good.
If they all come back DDD or we get an invalid configuration, then we dump the RAID logs and work from there.
None of that is going to cause any problems that were not already there.
 
Just to let you know, IBM replaced the "backplane" of both servers. I wasn't on-site at the time to find out more details, but I do know they were unable to recover anything from the discs. Thanks to all for the advice.
 
I do this stuff every day too, and have been for more than 5 years!!! I'm your escalation point, and we've been telling you to quit initializing the Controller!!! It hasn't lost the configuration. You should be capturing RAID Logs and escalating the case if you cannot determine which HDD failed first.
 
OK ibmtech65, if you are my first-line support, and I have been doing this for 18 years, 8 with IBM, then let me ask you: why does support have me bring drives back online one at a time if the restore config fails? This has happened to me on more than one occasion. Rich. Rclark2 is my short name if either of you want to continue this in Notes.

I shall use google before asking stupid questions!
 