7133 subsystems failed!

Status
Not open for further replies.

xiayd

IS-IT--Management
Mar 27, 2006
16
CN
We have one S70 server with a 7133 attached, running AIX 4.3.3 ML11.

errpt now shows:


BD797922 0326150006 P H enclosure0 SUBSYSTEM FAILURE
BD797922 0326140006 P H enclosure0 SUBSYSTEM FAILURE
BD797922 0326130006 P H enclosure0 SUBSYSTEM FAILURE
BD797922 0326120006 P H enclosure0 SUBSYSTEM FAILURE
BD797922 0326110006 P H enclosure0 SUBSYSTEM FAILURE
BD797922 0326100006 P H enclosure0 SUBSYSTEM FAILURE
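
For anyone reading the summary lines above: the second column is the errpt timestamp in MMDDhhmmYY form, so 0326150006 is 26 March 2006 at 15:00, which agrees with the "Date/Time" line in the detailed entry. A minimal Python sketch that decodes it (the function name parse_errpt_timestamp is made up for illustration, and the two-digit year is expanded using Python's default pivot):

```python
from datetime import datetime


def parse_errpt_timestamp(stamp: str) -> datetime:
    """Decode an errpt summary timestamp given as MMDDhhmmYY."""
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"expected 10 digits, got {stamp!r}")
    # %y expands two-digit years with Python's default pivot (00-68 -> 20xx).
    return datetime.strptime(stamp, "%m%d%H%M%y")


if __name__ == "__main__":
    # Timestamps copied from the errpt summary in this thread.
    for stamp in ["0326150006", "0326140006", "0326100006"]:
        when = parse_errpt_timestamp(stamp)
        print(stamp, "->", when.strftime("%Y-%m-%d %H:%M"))
```

Decoding the column this way makes it easy to see that the error is being logged once an hour.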

# errpt -aj BD797922|pg
---------------------------------------------------------------------------
LABEL: SSA_ENCL_ERR1
IDENTIFIER: BD797922

Date/Time: Sun Mar 26 15:00:13
Sequence Number: 12786
Machine Id: 000935014C00
Node Id: hn1
Class: H
Type: PERM
Resource Name: enclosure0
Resource Class: container
Resource Type: ses
Location: 00-00-224A
VPD:
Part Number.................9L1850
Serial Number...............AC14224A
EC Level....................000000R000
Manufacturer................IBM053
ROS Level and ID............0009
Device Specific.(Z0)........DISPLAY=224A
Device Specific.(Z1)........BYPASS1_16= 09L5510
Device Specific.(Z2)........BYPASS4_5= 09L5510
Device Specific.(Z3)........BYPASS8_9= 09L5510
Device Specific.(Z4)........BYPASS12_13= 09L5510
Device Specific.(Z5)........FAN1=09L2794
Device Specific.(Z6)........FAN2=09L2794
Device Specific.(Z7)........FAN3=09L2794
Device Specific.(Z8)........PSU1=09L4299
Device Specific.(Z9)........PSU2=09L4299
Device Specific.(ZA)........CTRL= 09L2083
Device Specific.(ZB)........OPERATOR= 08L7924

Description
SUBSYSTEM FAILURE

Probable Causes
SUBSYSTEM

Failure Causes
SUBSYSTEM

Recommended Actions
PERFORM PROBLEM DETERMINATION PROCEDURES

Detail Data
SENSE DATA
0802 6000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
---------------------------------------------------------------------------
LABEL: SSA_ENCL_ERR1
IDENTIFIER: BD797922


The service guide says a slot is empty,

but:


# lsdev -Cc disk
hdisk0 Available 30-68-00-0,0 16 Bit LVD SCSI Disk Drive
hdisk1 Available 30-68-00-1,0 16 Bit LVD SCSI Disk Drive
hdisk2 Available 20-68-L SSA Logical Disk Drive
hdisk3 Available 20-68-L SSA Logical Disk Drive
hdisk4 Available 20-68-L SSA Logical Disk Drive
hdisk5 Available 20-68-L SSA Logical Disk Drive
hdisk6 Available 20-68-L SSA Logical Disk Drive
hdisk7 Available 20-68-L SSA Logical Disk Drive
hdisk8 Available 20-68-L SSA Logical Disk Drive
hdisk9 Available 20-68-L SSA Logical Disk Drive
hdisk10 Available 20-68-L SSA Logical Disk Drive
hdisk11 Available 20-68-L SSA Logical Disk Drive
hdisk12 Available 20-68-L SSA Logical Disk Drive
hdisk13 Available 20-68-L SSA Logical Disk Drive
hdisk14 Available 20-68-L SSA Logical Disk Drive
hdisk15 Available 20-68-L SSA Logical Disk Drive
hdisk16 Available 20-68-L SSA Logical Disk Drive
hdisk17 Available 20-68-L SSA Logical Disk Drive

All 16 slots have a disk installed!

So what now?


 
Well, first of all, the date is not current: it shows Mar (not the current date). Maybe the enclosure was powered off in March, so the error is an old one.
 
Try running:
Code:
lsdev -Cc pdisk

Code:
lsdev -Cc adapter


Code:
ssa_ela

HTH
 
Looks like you have a bad backplane, though it could be a bad disk.
The SSA SRN (service request number) is 80260, taken from the sense data, and the 7133 service guide
(top of page 81) says you have an empty slot; use the service functions (see page 61) to find the slot.
As you have 16 disks in the drawer, you know it is not an empty slot, and all the disks are available, which leaves the backplane.
If there are no disk errors and / or the errors have stopped, put it down to experience.
If the errors continue, then you may see some different SRNs, which will give a better idea of the failing component.
Sadly you have nothing of use to go on at the moment.
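
To show where 80260 comes from, here is a rough Python sketch that pulls the SRN out of the sense data. The rule it uses (drop the leading hex digit, then take the next five) is only an assumption inferred from the single sample in this thread, where the sense data starts "0802 6000"; check the service guide before relying on it. The function name srn_from_sense is made up for illustration.

```python
def srn_from_sense(sense: str) -> str:
    """Return the SRN from errpt sense data.

    ASSUMPTION: the SRN is hex digits 2-6 of the sense data (the leading
    digit is skipped). This matches the one sample in this thread and has
    not been checked against the 7133 service guide.
    """
    digits = "".join(sense.split()).upper()
    if len(digits) < 6 or any(c not in "0123456789ABCDEF" for c in digits[:6]):
        raise ValueError("need at least six hex digits of sense data")
    return digits[1:6]


if __name__ == "__main__":
    # Start of the SENSE DATA line from the errpt entry in this thread.
    print(srn_from_sense("0802 6000 0000 0000"))
```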
 
I powered off the S70 and the 7133. After the reboot completed, the errors continued, and there were no different SRNs.

This error has been occurring for a year.

This is the mail for root:

From root Sat Oct 8 01:01:29 2005
Date: Sat, 8 Oct 2005 01:01:29 +0800
From: root
To: root
Subject: diagela message from hn1

A PROBLEM WAS DETECTED ON Sat Oct 8 01:00:28 TAIST 2005 801014

The Service Request Number(s)/Probable Cause(s)
(causes are listed in descending order of probability):

80260: Use the Service Guide for your SSA Adapter or SSA Subsystem.
enclosure0 00-00-224A

803-80C: A software error was encountered while running the Diagnostic
Application on enclosure0.

Message 350:
From root Sun Oct 23 05:02:08 2005
Date: Sun, 23 Oct 2005 05:02:08 +0800
From: root
To: ssa_adm
Subject: enclosure0

Sun Oct 23 05:01:08 TAIST 2005
Error Log Analysis has detected error(s) that may require your attention.
enclosure0 SRN 80260 SSA Enclosure

 
Ok, so follow the service guide.

You KNOW there are 16 slots and 16 disks are reporting to AIX but that does not mean the SSA diags or the enclosure think there are 16 active, working disks installed.

As the service guide says:

Action: Use the service functions (see 'Service Functions' on page 61) to find the slot that has been reported empty.

So you fire up the 'Service Functions' and probably need to look at the 'Enclosure Configuration Information' and 'The contents (device, dummy device, or empty) of each position (slot) in the 7133',
and you should find the bad slot / disk.

It could be a bad disk (the first thing to try) that is reporting to AIX but not reporting correctly to the SSA diags. If swapping the disk does not solve the problem, then the backplane (the thing the disk is plugged into) could be at fault.

Of course it could be one of those anomalies caused by a mismatch of device drivers and firmware.

Have you installed all of the latest SSA related device drivers / filesets and the latest firmware on the SSA adapter, the 7133 enclosure and all of the SSA disks?

Is this the only disk drawer on the system? I have seen some odd errors when you try and mix the newer 7133-D/T40 drawers and the older 7133 - 1/2/5/600 drawers on the same loop or system.


 
Thanks dukessd,

How can I start the service aids

or SSA remote system management?
 
I think you may have to update the SSA adapter and disk microcode. On a D40 or T40 (the newest type) SSA drawer, the disks should have a location code that includes the slot number in the drawer. Or perhaps this is only for the pdisks; I don't remember exactly.


HTH,

p5wizard
 
On SSA, hdisks only get the location code of the adapter and no VPD.
Old drawers don't give a slot-specific pdisk location, but the T/D40 drawers do: they identify the drawer and the slot.
SSA service aids can be found in diags and in smitty devices. From memory, the diags just send you to the smitty devices SSA menu, but check; there may be some handy SSA diag tools that solve your problem. Check the diag task selection menu for SSA stuff.
From smitty devices, scroll down to near the end and you will see SSA service aids, or similar.
If you are going to have to support SSA, you should download the adapter and enclosure service guides and the redbook, and get your head around the SSA concepts.
Bear in mind that SSA is dead. It is no longer being developed.
It is hardly for sale: last I heard you could only buy full drawers or disks, you can no longer buy an empty or partially populated drawer, and the software is only on new release or defect support. AIX 5.3 does fully support them, but don't expect any new / bigger disks, and 5.4 support is not guaranteed....
 
Hmm, from the enclosure VPD in the error you posted, it is a D40 (D = Drawer, T = Tower; it is bound to be a drawer (rack mount) if it is attached to a 7017), because it has 2 x PSU and 3 x fan. The older 7133s had 3 x combined PSU / fan units and didn't display the VPD for both.
But I may have been wrong about the enclosure and disk locations on 4.3.3. It was so long ago that it may be that 4.3.3 does not report this even with an x40 drawer.
Try an
lscfg -vl pdiskx
and see if the location code has extra info.
If it gives something like:
pdiskx 20-68-yyyy-zzp
then yyyy is the drawer identifier (what the LED / orange display on the front of the drawer shows) and zz is the slot number: 1-8 front (left to right), 9-16 back (left to right). The display is on the front and the leads go in the back.
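
As a rough illustration of reading that location code, here is a small Python sketch. It assumes the 20-68-yyyy-zzp layout described above, treats slots 1 to 8 as the front row and 9 to 16 as the back row, and ignores the trailing letter. The sample value 20-68-224A-07P is hypothetical (only the 224A drawer ID comes from the VPD posted earlier), and the names PdiskLocation and parse_pdisk_location are made up for the example.

```python
import re
from typing import NamedTuple, Optional


class PdiskLocation(NamedTuple):
    adapter: str    # adapter location, e.g. "20-68"
    drawer_id: str  # four-character ID shown on the drawer display
    slot: int       # slot number, 1-16
    side: str       # "front" for slots 1-8, "back" for 9-16 (assumed)


# ASSUMED layout: <adapter>-<4-char drawer id>-<slot><optional letter>
_PATTERN = re.compile(
    r"^(?P<adapter>[0-9A-Za-z]+-[0-9A-Za-z]+)"
    r"-(?P<drawer>[0-9A-Za-z]{4})"
    r"-(?P<slot>\d{1,2})[A-Za-z]?$"
)


def parse_pdisk_location(code: str) -> Optional[PdiskLocation]:
    """Split a pdisk location code; return None if it has no slot info."""
    match = _PATTERN.match(code.strip())
    if match is None:
        return None
    slot = int(match.group("slot"))
    if not 1 <= slot <= 16:
        return None
    side = "front" if slot <= 8 else "back"
    return PdiskLocation(match.group("adapter"), match.group("drawer"), slot, side)


if __name__ == "__main__":
    print(parse_pdisk_location("20-68-224A-07P"))  # hypothetical x40-style code
    print(parse_pdisk_location("20-68-L"))         # hdisk-style code, no slot
```

A code without the drawer and slot parts, such as the 20-68-L shown for the hdisks, comes back as None.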
Looks like the enclosure firmware is 0009 ("ROS Level and ID............0009"). Not good; this should be 0020....
If you cannot find the duff slot / disk, then download and install all the latest drivers and firmware (firmware for the adapter, the enclosure and each type of disk). Then either the problem will go away, or the service aids will be able to identify the problem.
Being end of life, most of the SSA subsystem is pretty well sorted, and if it is up to date (firmware, device drivers, etc.) then it is a fully developed and robust disk subsystem, i.e. they have fixed all the problems and you have all the tools installed to sort out any problems.
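
If you want to check the firmware level from saved output rather than by eye, something like the Python sketch below would do. It parses the dotted "Name....value" VPD lines as they appear in the errpt entry above and compares the ROS level with 0020. Treating the level as a hexadecimal number is an assumption, and the names parse_vpd and firmware_is_older are made up for the example.

```python
import re

# Matches "Field name....value": a name, a run of two or more dots, a value.
_VPD_LINE = re.compile(r"^\s*(.+?)\.{2,}(.*)$")


def parse_vpd(text: str) -> dict:
    """Turn dotted VPD lines into a {field: value} dictionary."""
    fields = {}
    for line in text.splitlines():
        match = _VPD_LINE.match(line)
        if match:
            fields[match.group(1).strip()] = match.group(2).strip()
    return fields


def firmware_is_older(level: str, wanted: str = "0020") -> bool:
    """Compare ROS levels. ASSUMPTION: levels are hexadecimal numbers."""
    return int(level, 16) < int(wanted, 16)


if __name__ == "__main__":
    # VPD lines copied from the errpt entry in this thread.
    sample = """\
Part Number.................9L1850
Serial Number...............AC14224A
ROS Level and ID............0009
Device Specific.(Z0)........DISPLAY=224A
"""
    vpd = parse_vpd(sample)
    level = vpd["ROS Level and ID"]
    print("enclosure ROS level:", level)
    print("older than 0020:", firmware_is_older(level))
```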



 
So I was wrong: only the pdisks get the drawer/tower location number, not the hdisks...

"lsdev -Cc pdisk" would show all the SSA physical disks and their full location ID, including disk positions in the drawer. (Pdisks and hdisks generally don't have the same number, because you may also have SCSI disks in the box, so don't be alarmed by that.)

The enclosure firmware should be 0020, as dukessd already pointed out. So perhaps your adapter and disk firmware are outdated as well?



HTH,

p5wizard
 