FSC broke down 2

Status
Not open for further replies.

Iago77

Hello,

A couple of days ago I noticed an error on our AIX 5.1 box (sorry, but the system software is localized in Spanish):

# errpt -a -j F5ED80EF

Núm. secuencia: 171835
ID máquina: 005FDB1A4C00
ID nodo: saturno
Clase: H
Tipo: PERM
Nombre recurso: fcs1
Clase de Recurso: adapter
Tipo de Recurso: df1000f9
Ubicación: 34-08
VPD:
Número de Pieza.............00P4494
Nivel EC....................A
Número de Serie.............1D3200C3E1
Fabricante..................001D
Número de FRU............... 00P4495
Dirección de la Red.........10000000C9338E28
Nivel e ID de ROS...........02C03951
Especif. Dispositivo (Z0)...2002606D
Especif. Dispositivo (Z1)...00000000
Especif. Dispositivo (Z2)...00000000
Especif. Dispositivo (Z3)...03000909
Especif. Dispositivo (Z4)...FF401210
Especif. Dispositivo (Z5)...02C03951
Especif. Dispositivo (Z6)...06433951
Especif. Dispositivo (Z7)...07433951
Especif. Dispositivo (Z8)...20000000C9338E28
Especif. Dispositivo (Z9)...CS3.91A1
Especif. Dispositivo (ZA)...C1D3.91A1
Especif. Dispositivo (ZB)...C2D3.91A1

Descripción
ADAPTER ERROR

Acciones recomendadas
EFECTUAR LOS PROCEDIMIENTOS PARA LA DETERMINACIÓN DE PROBLEMAS

Datos de detalle
DATOS DE DETECCIÓN
0000 0000 0000 0030 0302 0006 0000 0000 0001 1D00 0000 9A06 0000 33EE 0000 012C 0000 0001 0000 0002 0000 0000 0000 0000 0000 000D 0000 0000 0000 0001 0000 0000 0000 0000 0608 0000 0000 0010 0000 0000 0000 0000 0000 2710 0000 07D0 0000 076C 0000 0064 0000 000F 4000 0000 02A0 0147 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Análisis de diagnóstico
Número de secuencia del registro cronológico de diagnóstico: 26961
Recurso probado: fcs1
Descripción del recurso:Adaptador FC
Ubicación: U0.1-P1-I11/Q1
SRN: 447-706
Descripción: A fatal hardware error has occurred. This adapter
was successfully taken off-line. It will remain
off-line until reconfigured or the system is
rebooted. This adapter must be replaced and not
brought back on-line. Failure to adhere to this
action could result in data being read or written
incorrectly or in the loss of data.
FRU posibles:
fcs1 FRU: 00P4495 U0.1-P1-I11/Q1
Adaptador FC

I've tried to remove that device from smitty, but it says the device is in use.

Is there any way to disable/enable this device?

How can I troubleshoot this kind of problem?

Would it be enough to IPL the machine?

Is it possible to determine whether the fcs adapter is really broken?

Many thanks in advance,

Iago
 
What is the output of

lsdev -C | grep fcs

lsdev -C -F parent -l fcs1

regards,
khalid
 
Have a look at the state of:

fcsX
fcnetX
fscsiX

They all have to be in the Defined state.
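
A minimal sketch of that check, assuming the `lsdev -C` column layout seen elsewhere in this thread (name, state, location, description). The function name `fc_states` is made up for illustration; it only filters text on stdin.

```shell
# fc_states: read `lsdev -C` output on stdin and print "name state" for
# every fcsN, fcnetN and fscsiN device. The function name is hypothetical.
fc_states() {
    awk '$1 ~ /^(fcs|fcnet|fscsi)[0-9]+$/ { print $1, $2 }'
}

# Typical use on the AIX box (not run here):
#   lsdev -C | fc_states
```

Each output line is a device name followed by its state, so anything that is not yet in the Defined state stands out at a glance.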
 
There might be something wrong with the FC adapter itself, as is usually the case (it is sensitive and easily damaged).

But to prove that, you first have to check the status of the fcs (lsdev -C | grep fcs).

Then you have to trace the parent of this fcs using (lsdev -C -F parent -l fcs1)
ax
then lsdev -C -F parent -l ax

and so on

regards,
Khalid
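
The parent tracing described above could be wrapped in a small loop. This is only a sketch: `walk_parents` is a hypothetical helper name, the 16-hop limit is an arbitrary safety cap, and the only command it runs is the `lsdev -C -F parent -l` call already quoted in this post.

```shell
# walk_parents: print a device and then each of its ancestors, one per line,
# by repeatedly asking lsdev for the parent. Stops when lsdev prints nothing
# (top of the tree or unknown device) or after 16 hops as a safety cap.
# The function name is hypothetical.
walk_parents() {
    dev=$1
    hops=0
    while [ -n "$dev" ] && [ "$hops" -lt 16 ]; do
        echo "$dev"
        dev=$(lsdev -C -F parent -l "$dev" 2>/dev/null)
        hops=$((hops + 1))
    done
}

# Typical use on the AIX box (not run here):
#   walk_parents fcs1
```

Each name it prints can then be checked with lsdev -C | grep name to see whether that level of the tree is available.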
 
Here is the info:

#lsdev -C | grep fcs
fcs0 Disponible 1A-08 Adaptador FC
fcs1 Disponible 34-08 Adaptador FC
fcs2 Disponible 37-08 Adaptador FC

#lsdev -C -F parent -l fcs1
pci14

#lsdev -C -F parent -l ax
(no output)

I finally succeeded in removing the device. Maybe there is a bug in the HDLM driver or software:

I executed:

# dlmhbadel fscsi1

and then, with smitty, all the devices came back again.

But currently dlmfdrv1 does not show up:

hdisk0 005fdb1a24d50083 rootvg
hdisk1 005fdb1ab0c62fe7 rootvg
hdisk2 005fdb1a64256e45 explvg
hdisk3 005fdb1ab0c67711 explvg
hdisk4 none None
hdisk8 none None
dlmfdrv0 005fdb1a99721a03 hitachivg
hdisk5 none None
hdisk6 none None
hdisk7 none None

And:

#dlnkmgr view -drv
dlnkmgr view -drv
PathID HDevName Device LDEV
000000 dlmfdrv0 hdisk4 USP.0022657.1007
000001 dlmfdrv0 hdisk5 USP.0022657.1007
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2007/01/30 18:00:15

#dlnkmgr view -path
dlnkmgr view -path
Paths:000002 OnlinePaths:000002
PathStatus IO-Count IO-Errors
Online 3940583 0

PathID PathName DskName iLU ChaPort Status Type IO-Count IO-Errors DNum HDevName
000000 08.34.0000000000010500.0000 HITACHI .OPEN-V .0022657 1007 5A Online Own 0 0 0 dlmfdrv0
000001 08.1A.0000000000020500.0000 HITACHI .OPEN-V .0022657 1007 6A Online Own 3940583 0 0 dlmfdrv0
KAPL01001-I The HDLM command completed normally. Operation name = view, completion time = 2007/01/30 18:00:46

Before the error, I had another device, dlmfdrv1, with multipath.

If I do:

# dlmcfgmgr

it only detects dlmfdrv0.

Is there any way to rediscover dlmfdrv1?
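
For comparing before and after, a rough sketch that counts online paths per HDLM device from the `dlnkmgr view -path` listing. It assumes the row layout shown above (detail rows start with a six-digit PathID, end with HDevName, and have Status five fields before the end); `online_paths` is a hypothetical helper name.

```shell
# online_paths: read `dlnkmgr view -path` output on stdin and print
# "HDevName online_path_count" for each HDLM device found.
# Hypothetical helper; relies on the column layout of the listing above.
online_paths() {
    awk '
        /^[0-9][0-9][0-9][0-9][0-9][0-9] / {
            dev = $NF                          # last column: HDevName
            if (!(dev in n)) n[dev] = 0
            if ($(NF - 5) == "Online") n[dev]++   # Status column
        }
        END { for (dev in n) print dev, n[dev] }
    ' | sort
}

# Typical use on the AIX box (not run here):
#   dlnkmgr view -path | online_paths
```

A device that is missing from the result, or that shows fewer paths than expected, is the one to chase.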
 
So you have Hitachi disks.

By the way, when I said try this command (lsdev -C -F parent -l ax), I meant ax as an example!

After doing lsdev -C -F parent -l fcs1, the next command should be lsdev -C -F parent -l pci14.

Then check the availability of pci14 using lsdev -C | grep pci14

I don't know what this (Disponible) is! It must be Spanish, and I guess it means Available. If so, you should be able to use fcs1 without trouble, unless there is a hardware problem with it!

Are you sure the fiber is connected?

And by the way, what is this dlmfdrv1? Do you mean a volume group with that name is not available? It must be a pathing problem, I guess.

regards,
Khalid
 
"A fatal hardware error has occurred. This adapter was successfully taken off-line. It will remain off-line until reconfigured or the system is rebooted. This adapter must be replaced and not brought back on-line. Failure to adhere to this action could result in data being read or written incorrectly or in the loss of data."

That seems to say it all: the adapter is broken, and if you continue to use it your data is at risk.
 
khalidaaa, you are correct.

The disks really are available at the moment. The only problem is that dlmfdrv1, a Hitachi disk (HDLM), does not show up.

The fiber cable is connected, so maybe the problem is in the HDLM driver.

I'm going to find out the reason...

 
No, DukeSSD is right:

"A fatal hardware error has occurred. This adapter was successfully taken off-line. It will remain off-line until reconfigured or the system is rebooted. This adapter must be replaced and not brought back on-line. Failure to adhere to this action could result in data being read or written incorrectly or in the loss of data."

If you do not understand this text, get it translated for you. I think it is too dangerous to keep on using this adapter. You risk data corruption. Have the adapter and the server checked out by IBM, at least open a hardware call with IBM support.


HTH,

p5wizard
 
I would have to agree with DukeSSD and p5wizard as well! It is always better to have IBM on the line for such hardware errors!

As I mentioned in one of my replies above, I guess it could be a fiber hardware problem!

Thanks for the star Iago77

regards,
Khalid
 
OK, I'll put your advice into practice.

I also think it is the best way to avoid problems in the future.

Thanks to all of you.
 