I have three IBM Netfinity servers on a peer-to-peer network, all running Windows 2000 Pro. Recently (June 26) my Netfinity 5000 (8659-22Y) crashed. It has three 18.2 GB drives (all IBM DNES-318) in a RAID-5 array. Two additional 18.2 GB drives (also IBM DNES-318) are installed but are not part of the array; they have just been sitting at Ready, since I bought them later and haven't gotten around to working them in yet.
On the surface, the indications point to the three drives in the array: all are now marked defunct (so the logical drive is Offline), and the three front lights are amber. But I think the server set them to DDD as a precaution because something else failed (besides, the odds of all three dying at the same time seem a stretch). I don't know exactly what the source problem is, although I suspect the backplane, or possibly the ServeRAID controller (3L Ultra II, BIOS 6.11.07), even though elements of both seem functional.
The scenario is confusing because the system error log and the ServeRAID dump log don't make the source issue clear to me. I don't know how to read the error codes in the logs, I can't find anywhere on the web that explains them, and IBM support is too expensive to consider. I haven't backed up the server for a while, so I'm eager to rebuild the array if possible, but I'm trying to be methodical about this since I don't want to lose any data.
Up until the crash everything seemed to be working properly -- nothing out of the ordinary to report. The issues that stand out from the logs are: 1) at some point one of the two drives at Ready (the fifth one, at ID 4) was flagged PFA, although I never noticed any error message; 2) according to the system diagnostics log, three days before the crash there were two entries indicating a ServeRAID controller failure, or "internal error":
Entry Number: 22
Date/Time: 2005/06/23 22:04:40
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&2
Error Data: (Adapter in slot 4; internal error)
Error Data:
Entry Number: 23
Date/Time: 2005/06/23 22:03:41
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&1
Error Data: (Adapter in slot 4; internal error)
Error Data:
3) on the day of the crash I unplugged the system and rebooted it, after which the system log recorded that the backplane couldn't be found. I don't know whether that means it's actually dead. I think the entry shows that the LED test failed for it, but I can't tell whether the problem is with the LED or with the backplane itself (maybe someone can tell the difference from the entry):
Entry Number: 1
Date/Time: 2005/06/26 16:07:15
DMI Type: 08
Source: DIAGS
Error Code: 180-357-000-20050626-94-Real-time Status Displays:
Error Code: Failed LED&1
Error Data: (Hard Drive backplane not found)
Error Data:
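In case it helps anyone line the entries up, here is a rough Python sketch that splits log text like the above into records and sorts them by time. The field layout ("Key: value" lines, with each record starting at "Entry Number") is only my assumption from the entries quoted in this post, not anything from IBM's documentation, and parse_entries is just a name I made up. It doesn't decode the error codes; it only organizes them.

```python
from datetime import datetime

# Two of the entries quoted in this post, pasted verbatim as sample input.
LOG_TEXT = """\
Entry Number: 22
Date/Time: 2005/06/23 22:04:40
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&2
Error Data: (Adapter in slot 4; internal error)
Error Data:
Entry Number: 1
Date/Time: 2005/06/26 16:07:15
DMI Type: 08
Source: DIAGS
Error Code: 180-357-000-20050626-94-Real-time Status Displays:
Error Code: Failed LED&1
Error Data: (Hard Drive backplane not found)
Error Data:
"""


def parse_entries(text):
    """Split log text into a list of dicts, one per "Entry Number" block.

    Assumption: every line is "Key: value" and a new record begins at
    each "Entry Number" line. Repeated keys (Error Code, Error Data)
    are collected into lists; empty values are skipped.
    """
    entries = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        key, value = key.strip(), value.strip()
        if key == "Entry Number":
            current = {"Entry Number": int(value),
                       "Error Code": [], "Error Data": []}
            entries.append(current)
        elif current is None:
            continue  # ignore anything before the first entry
        elif key in ("Error Code", "Error Data"):
            if value:
                current[key].append(value)
        elif key == "Date/Time":
            current["when"] = datetime.strptime(value, "%Y/%m/%d %H:%M:%S")
        else:
            current[key] = value
    return entries


entries = parse_entries(LOG_TEXT)

# Print oldest first so the sequence of failures is easier to follow.
for e in sorted(entries, key=lambda e: e["when"]):
    print(e["when"], "| entry", e["Entry Number"],
          "|", " / ".join(e["Error Code"]),
          "|", " / ".join(e["Error Data"]))
```

Sorted this way, the controller "internal error" entries from June 23 come before the "backplane not found" entry from June 26, which is the order of events I'm trying to make sense of.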
From what I gleaned from IBM's documentation, if I try to rebuild the array when the source of the problem is actually something like the backplane, I could lose all my data. (But of course the documentation suggests sending the logs to IBM to decode. Ack!) If anyone can read the error codes, or give me some wisdom on what steps to take, it would be greatly appreciated. Thanks very much.