Tek-Tips is the largest IT community on the Internet today!

Members share and learn making Tek-Tips Forums the best source of peer-reviewed technical information on the Internet!


PERC 5/i and failed disks

Status
Not open for further replies.

terry712

Technical User
Oct 1, 2002
GB
hello

I have a Dell 1900 with 2 disks in a RAID 1. I had a look at the box and it was sitting at "F1 to retry boot, F2 for setup". I went into Ctrl+R for the array config and disk 04 was failed (the other was 05). I offlined disk 04, rebooted, and it came up.

A colleague spoke to Dell and they wanted the latest firmware applied to the disks. I applied the firmware over the DRAC, then rebooted the box and it was still OK. I then tried to online the disk from the server manager web page; nothing happened. So I rebooted and onlined it from the Ctrl+R array config. I could see this did a background init.

The server is now hosed, and it doesn't matter whether I offline 04 or not.

This box is a DC for a small site, so I don't care about it: it has no data and it's DRAC-enabled, so it can be rebuilt easily enough. I'm afraid I'm more used to hot-swap disks etc., so what should have been done here?

Should I have selected Rebuild on disk 04, or something else?

thanks
 
The adapter accepted disk 4 as the fully functioning disk and rebuilt disk 5 from it. I've never had this happen in a RAID 1: it should have detected #4 as foreign or online, and you should have had the option to change which disk to rebuild from before the adapter chose for you; at that point you could have opted to rebuild with #5 as the operational disk (simplified). The adapter's boot-time interface can be a bit confusing when choosing which drive to rebuild from, and in your case the adapter's default choice may have been the wrong one.
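To illustrate why the rebuild direction matters so much, here is a toy Python model. It is only a sketch under stated assumptions: disk 4 dropped out while disk 5 kept receiving writes, so disk 5 holds the current data. The names `write`, `rebuild` and `demo` are hypothetical, and nothing here describes how PERC firmware actually works internally.

```python
# Toy model of a two-disk RAID 1 mirror. This is NOT how PERC firmware
# works internally; it only shows why the rebuild direction matters.

Disks = dict[str, dict[int, str]]  # disk name -> {block number: data}


def write(disks: Disks, online: set[str], block: int, data: str) -> None:
    """Write one block to every disk that is currently online."""
    for name in online:
        disks[name][block] = data


def rebuild(disks: Disks, source: str, target: str) -> None:
    """Overwrite the target disk with a full copy of the source disk."""
    disks[target] = dict(disks[source])


def demo(source: str, target: str) -> Disks:
    disks: Disks = {"disk4": {}, "disk5": {}}
    write(disks, {"disk4", "disk5"}, 0, "boot v1")  # both disks in sync
    # disk4 fails and is offlined; only disk5 receives later writes.
    write(disks, {"disk5"}, 0, "boot v2")
    rebuild(disks, source=source, target=target)
    return disks


if __name__ == "__main__":
    print(demo(source="disk5", target="disk4"))  # both disks end up with "boot v2"
    print(demo(source="disk4", target="disk5"))  # stale "boot v1" overwrites disk5
```

Rebuilding from disk 5 leaves both disks current; rebuilding from disk 4 copies the stale contents over the good disk, which is one way a mirror can end up unbootable on both halves.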

Another possibility is that the two disks did not have the same firmware: the offline disk did not get the firmware update, so the adapter took the older-firmware disk as the disk to rebuild from. Even so, there should have been a prompt about a configuration mismatch, and then you should have had to choose which drive to rebuild from.
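The safeguard described above (stop and ask on a mismatch rather than pick a source automatically) can be sketched as a pre-rebuild check. This is a hypothetical illustration only: `Disk`, `choose_rebuild_source` and the revision labels "A01"/"A02" are made up, and it does not describe what the PERC 5/i actually checks.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disk:
    slot: int
    firmware: str  # hypothetical revision label, e.g. "A01"
    online: bool   # True if the disk stayed in service while its partner was offline


def choose_rebuild_source(disks: list[Disk], chosen_slot: Optional[int] = None) -> int:
    """Pick the disk to rebuild FROM, refusing to guess when things look inconsistent."""
    slots = {d.slot for d in disks}
    if chosen_slot is not None:
        if chosen_slot not in slots:
            raise ValueError(f"no disk in slot {chosen_slot}")
        return chosen_slot  # an explicit operator choice always wins
    revisions = sorted({d.firmware for d in disks})
    if len(revisions) > 1:
        raise ValueError(f"firmware mismatch {revisions}: choose the source disk explicitly")
    online = [d.slot for d in disks if d.online]
    if len(online) != 1:
        raise ValueError("cannot tell which disk is current: choose the source disk explicitly")
    return online[0]
```

With disk 4 on "A01", disk 5 on "A02" and only disk 5 online, the call raises until the operator passes `chosen_slot=5`; once the firmware matches, it falls back to the disk that stayed online.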

The only complete RAID failure I have had in years was due to a firmware update from Seagate (not Dell). At that time (11 years ago) you had to do firmware updates on individual drives via a SCSI adapter. I started with the hot spare, figuring that if anything went wrong, the hot spare would simply refuse to be promoted back to a hot spare position once added back to the RAID adapter. Wrong: I added the drive back in as a hot spare and the RAID 5 was instantly toast. It took a 36-hour day, 6 hours of sleep, then another 36-hour day to replace the server, and that was with a backup, because of the SQL databases; that will never happen again.



"i then tried to online the disk from the server manager web page - nothing happened".
At this point I would have proceeded very carefully. If a disk does not go online from the web interface, something is seriously wrong. Once you go into the adapter's startup interface at boot, it is much easier for things to go wrong; it is easy to choose the incorrect disk to rebuild from.

Because of my RAID loss during the firmware update, almost all my DC role-holder servers have RAID 1 with a hot spare for the boot disk. So if I have an issue like yours, the hot spare kicks in, and once the array has rebuilt, I can try to reinstate the failed disk in safety.
I rarely have luck reinstating a failed disk; once a disk fails, 70-80% of the time it will fail again and again in the future. I do succeed when there is a firmware issue on the adapter or disk, which, once corrected, renders the disk usable.

For DC servers that hold all the AD roles, I have a hot spare and a spare drive. I set the DC's tombstone lifetime to 180 days, and every 6 months I pull one of the drives and swap in the spare, giving me an exact clone of the DC's boot drive for an emergency restore. Mind you, I swap out the drive before the six-month period is up if I suspect an update will cause issues or if I make major changes to the AD system. Also, if the stored drive is used in an emergency restore, AD replication needs to be reversed so that it flows to the restored server (BurFlags), and all software and updates from that period need to be reapplied.
An alternative to keeping a spare enterprise-quality disk is to hang a cheap SATA disk off the SATA interface and use third-party cloning software such as Acronis to clone to it before replacing a disk, before a major Microsoft update, or before a major change to the server.
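The 180-day tombstone versus 6-month swap schedule is worth checking with real dates: six calendar months run 181 to 184 days, slightly longer than 180. A minimal Python sketch, assuming the clone's age is counted from the day the drive was pulled and that a clone at or beyond the tombstone lifetime should not be restored; `clone_is_restorable`, `latest_swap_date` and the 7-day safety margin are hypothetical choices, not part of the setup described above.

```python
from datetime import date, timedelta

TOMBSTONE_DAYS = 180  # tombstone lifetime setting described in the post


def clone_is_restorable(pulled_on: date, restore_on: date,
                        tombstone_days: int = TOMBSTONE_DAYS) -> bool:
    """True if a DC clone pulled on `pulled_on` is still younger than the tombstone lifetime."""
    return (restore_on - pulled_on) < timedelta(days=tombstone_days)


def latest_swap_date(pulled_on: date, tombstone_days: int = TOMBSTONE_DAYS,
                     margin_days: int = 7) -> date:
    """Date by which to swap in a fresh clone, leaving a (hypothetical) safety margin."""
    return pulled_on + timedelta(days=tombstone_days - margin_days)


if __name__ == "__main__":
    pulled = date(2024, 1, 1)
    print(clone_is_restorable(pulled, date(2024, 7, 1)))  # False: 182 days old
    print(latest_swap_date(pulled))                        # 2024-06-22
```

For a drive pulled on 1 January 2024, a restore exactly six months later (1 July, 182 days in a leap year) already falls outside a 180-day window, so swapping a little before the six-month mark keeps the stored clone usable.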


