I started this thread under IBM Servers, but haven't had any responses, so it's probably too general a heading for posting there. I can't stress enough how enormously appreciative I will be for any help on this...
I have three IBM Netfinity servers peer-to-peer networked, all running Windows 2000 Pro. Recently my Netfinity 5000 (8659-22Y) crashed. I have three 18.2 GB drives in a RAID-5 array. There are two additional 18.2 GB drives slotted in, but they're not part of the array; they've only been sitting there at Ready, as I bought them later and haven't got around to working them in yet.
On the surface, indications point to the three drives in the array, as all are now defunct (so the logical drive is Offline, which makes troubleshooting more of a nightmare), with the three front lights now at amber. But I think the server set them to DDD as a precaution because something else has failed (besides, the odds of all three dying at the same time seem a slight stretch). What exactly the source problem is I'm not certain, although I suspect it's the backplane, or possibly the ServeRAID controller (3L Ultra II, BIOS 6.11.07). Elements of both seem functional, though, which makes the situation especially confusing.
Just as confusing is that the system error log and the ServeRAID dump log aren't clear, at least to me, about the source issue(s). I don't know how to interpret the log error codes, can't find any place on the web that can help, and IBM support is too expensive to consider. I haven't backed up the server for a while, so I'm eager to rebuild the array, if possible, but am trying to be methodical so as not to lose any data.
Up until the crash everything seemed to be working properly -- nothing out of the ordinary to report. The issues that stand out from the logs are 1) that at some point one of the two drives at Ready (the fifth one, at ID 4) was flagged PFA, although I never noticed any error message and don't know how that would contribute to the problem, since the drive was not in the array; I suspect it may be symptomatic, though; 2) according to the system diagnostics log, three days before the crash there were two entries, one after the other, that indicate a ServeRAID controller failure, or "internal error":
Entry Number: 22
Date/Time: 2005/06/23 22:04:40
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&2
Error Data: (Adapter in slot 4; internal error)
Error Data:
Entry Number: 23
Date/Time: 2005/06/23 22:03:41
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&1
Error Data: (Adapter in slot 4; internal error)
Error Data:
3) after the crash I unplugged and then rebooted the server, at which point the system log recorded that the backplane couldn't be found, although I don't know if that means it's actually toast. I think the log is showing that the LED didn't respond, but I can't tell whether the problem is with the LED response or with the backplane itself (maybe someone can tell the difference from the entry):
Entry Number: 1
Date/Time: 2005/06/26 16:07:15
DMI Type: 08
Source: DIAGS
Error Code: 180-357-000-20050626-94-Real-time Status Displays:
Error Code: Failed LED&1
Error Data: (Hard Drive backplane not found)
Error Data:
From what I combed out of IBM's documentation, if I try to rebuild the array when the source of the problem is actually something like the backplane, I could lose all my data. (But of course the documentation suggests sending the logs in to IBM to decode. Ack! -- I simply don't have the cashflow for that venture.) If anyone can offer any kind of assistance I would be extremely grateful.
Thanks very much.