Netfinity 5000 Crash - backplane or ServeRAID problem?

Status
Not open for further replies.

Ritosh

Technical User
Jun 6, 2004
8
CA

I started this thread under IBM Servers but haven't had any responses, so that heading is probably too general for this post. I can't stress enough how much I would appreciate any help with this...

I have three IBM Netfinity servers on a peer-to-peer network, all running Windows 2000 Pro. Recently my Netfinity 5000 (8659-22Y) crashed. It has three 18.2 GB drives in a RAID-5 array. Two additional 18.2 GB drives are also installed, but they are not part of the array; they have only been sitting at Ready, as I bought them later and haven't got around to working them in yet.

On the surface, indications point to the three drives in the array, as all are now defunct (so the logical drive is Offline, which makes troubleshooting more of a nightmare) and the three front lights are now amber. But I think the server set them to DDD as a precaution because something else has failed (besides, the odds of all three dying at the same time seem a slight stretch). I'm not certain what the source problem is. I suspect the backplane, or possibly the ServeRAID controller (3L Ultra II, BIOS 6.11.07), but elements of both seem functional, which makes the situation especially confusing.
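
For reference, the arithmetic behind the array described above can be sketched in a few lines of Python. This is a minimal sketch, assuming the standard RAID-5 layout (one drive's worth of capacity goes to parity, and the array tolerates a single failed member). The function names are hypothetical and are not part of any ServeRAID tool, and the "Okay" and "Critical" labels are assumed names for the healthy and degraded states; only "Offline" comes from the post.

```python
def raid5_usable_gb(drive_count, drive_size_gb):
    """Usable capacity of a RAID-5 array: one drive's worth is spent on parity."""
    if drive_count < 3:
        raise ValueError("RAID-5 needs at least three drives")
    return (drive_count - 1) * drive_size_gb


def raid5_state(defunct_count):
    """Logical drive state for a given number of defunct member drives.

    State labels other than "Offline" are assumptions for illustration.
    """
    if defunct_count == 0:
        return "Okay"
    if defunct_count == 1:
        return "Critical"  # degraded, but data is still reconstructable from parity
    return "Offline"       # two or more members missing: parity cannot cover it


# The array from the post: three 18.2 GB drives, all three marked defunct.
print(raid5_usable_gb(3, 18.2))
print(raid5_state(3))
```

With all three members marked defunct the logical drive has to be Offline, whatever the real cause, which is why the drive lights alone don't separate a drive fault from a controller or backplane fault.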

Just as confusing, the system error log and the ServeRAID dump log aren't clear, at least to me, about the source issue(s). I don't know how to interpret the log error codes, can't find any place on the web that can help, and IBM support is too expensive to consider. I haven't backed up the server for a while, so I am eager to rebuild the array if possible, but I am trying to be methodical so as not to lose any data.

Up until the crash everything seemed to be working properly, with nothing out of the ordinary to report. The issues that stand out from the logs are: 1) at some point one of the two drives at Ready (the fifth one, at ID 4) was flagged PFA, although I never noticed any error message and don't know how that would contribute to the problem, since it was not in the array; I suspect it may be symptomatic, though; 2) according to the system diagnostics log, three days before the crash there were two consecutive entries that indicate a ServeRAID controller failure, or "internal error":

Entry Number: 22
Date/Time: 2005/06/23 22:04:40
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&2
Error Data: (Adapter in slot 4; internal error)
Error Data:


Entry Number: 23
Date/Time: 2005/06/23 22:03:41
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&1
Error Data: (Adapter in slot 4; internal error)
Error Data:

3) after the crash I unplugged the server and then rebooted it, after which the system log recorded that the backplane couldn't be found, although I don't know if that means it's actually toast. I think the log is showing that the LED didn't respond, but I can't tell whether the problem is with the LED response or with the backplane itself (maybe someone can tell the difference from the entry):

Entry Number: 1
Date/Time: 2005/06/26 16:07:15
DMI Type: 08
Source: DIAGS
Error Code: 180-357-000-20050626-94-Real-time Status Displays:
Error Code: Failed LED&1
Error Data: (Hard Drive backplane not found)
Error Data:
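
Since the log entries above share a regular layout, they can at least be split into fields for side-by-side comparison. The following is a rough Python sketch, not an IBM decoder: the helper names are hypothetical, and the meaning of the numeric fields in the error code is not known here. The only thing it checks is an observation from the entries themselves, namely that the 8-digit field in the error code matches the entry's date.

```python
from datetime import datetime


def parse_entry(text):
    """Split one log entry into {label: [values]}; repeated labels are kept in order."""
    fields = {}
    for line in text.strip().splitlines():
        # Split on the first colon only, so times and messages keep their colons.
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


def embedded_date(error_code):
    """Return the first 8-digit hyphen-separated field as a date, or None."""
    for part in error_code.split("-"):
        if len(part) == 8 and part.isdigit():
            return datetime.strptime(part, "%Y%m%d").date()
    return None


# Entry 22, copied from the post.
ENTRY = """\
Entry Number: 22
Date/Time: 2005/06/23 22:04:40
DMI Type: 08
Source: DIAGS
Error Code: 035-260-499-20050623-57-RAID Interface: Failed
Error Code: CPPRHTS1&2
Error Data: (Adapter in slot 4; internal error)
Error Data:
"""

entry = parse_entry(ENTRY)
logged = datetime.strptime(entry["Date/Time"][0], "%Y/%m/%d %H:%M:%S")
code_date = embedded_date(entry["Error Code"][0])

# True when the date embedded in the error code agrees with the Date/Time field.
print(code_date == logged.date())
```

This does not say what 035-260-499 or the trailing 57 mean; it only makes it easier to line up entries and spot which parts of the code change between them.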

From what I combed out of IBM's documentation, if I try to rebuild the array when the source of the problem is actually something like the backplane, I could lose all my data. (But of course the documentation suggests sending the logs to IBM to decode. Ack! I simply don't have the cash flow for that venture.) If anyone can offer any kind of assistance I would be extremely grateful.

Thanks very much.
 
Well, if you can find another RAID adapter of the same kind, you can install it, hook up the array, and import the array from the drives to the adapter. If it works, you had a bad adapter. Backplane failures are rare, but on some of the servers there is a sensor that tells the system what orientation it is in (rack or tower) and auto-addresses the SCSI backplane. It gets dirty, but that is usually on the power switch board. Also, since the other two drives are free, use the ServeRAID CD to write a small array to them; this will also test the RAID card. Do you get the option to press F2 and run diags? As far as the logs go, in the field we have to send them to IBM support as well; they don't give us the tools.
 

Thanks very much for your input; it's greatly appreciated. I just scrounged up a ServeRAID controller, so I will see if that's the issue. I'm still a little suspicious of the backplane: I have two dogs, and recently when I went to clean out the hard drive cage, lo and behold, against the backplane was a bunch of dog hair, despite my having cleaned out the server four months before. Needless to say, the dogs are no longer allowed in the room. I'll update you with the results. Thanks again.
 
It's possible you had a disk with an error that was not picked up or acted upon by the array adapter, causing other disks to go offline or fail. See my last post on this thread. I would get the latest firmware for the adapter.

 
I just wanted to thank rclarke250 and technome for your input, and to let you know the server is back up: the problem was the ServeRAID card. Once I put in a new one, rebuilding began; there were no problem drives, and all data was intact. Thanks again.
 