error BFE4C025 on IBM 9117-570

Status
Not open for further replies.

HelgeZZ

IS-IT--Management
Mar 6, 2007
1
UA
Hi all,
I am getting error BFE4C025 on an IBM 9117-570 (10 LPARs + 2 VIOS, DB2 on each LPAR).
Please help.
siegfrid:[/]# prtconf
System Model: IBM,9117-570
Machine Serial Number: 107669C
Processor Type: PowerPC_POWER5
Number Of Processors: 4
Processor Clock Speed: 1902 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
LPAR Info: 11 SIEGFRID
Memory Size: 4096 MB
Good Memory Size: 4096 MB
Platform Firmware level: Not Available
Firmware Version: IBM,SF240_202
Console Login: enable
Auto Restart: true
Full Core: false


siegfrid:[/]# lsmcode -r
system:SF240_202 (t) SF240_202 (p) SF240_202 (t)
siegfrid:[/]# oslevel -r
5300-04
siegfrid:[/]# instfix -i | grep ML
All filesets for 5.3.0.0_AIX_ML were found.
All filesets for 5300-01_AIX_ML were found.
All filesets for 5300-02_AIX_ML were found.
All filesets for 5300-03_AIX_ML were found.
All filesets for 5300-04_AIX_ML were found.

siegfrid:[/]# errpt -aD -s 0302090007 -e 0302110007 | more
---------------------------------------------------------------------------
LABEL: SCAN_ERROR_CHRP
IDENTIFIER: BFE4C025

Date/Time: Fri Mar 2 10:55:30 WET 2007
Sequence Number: 205
Machine Id: 00C7669C4C00
Node Id: siegfrid
Class: H
Type: PERM
Resource Name: sysplanar0
Resource Class: planar
Resource Type: sysplanar_rspc
Location:

Description
UNDETERMINED ERROR

Failure Causes
UNDETERMINED

Recommended Actions
RUN SYSTEM DIAGNOSTICS.
I ran diag on this device. Please give me the steps and commands (if needed) to check and resolve this issue.
Thanks in advance.
I have a P670 lpar running 5.3 ML04. It seems that the server rebooted itself. Here is part of the error message reported in errpt and on the console at reboot.


Detail Data
PROBLEM DATA
0644 00E0 0000 00DC C600 8E00 0000 0000 0000 0000 4942 4D00 5048 0030 0100 494F
2007 0302 0855 5235 2007 0302 0855 5289 4800 0104 0000 0000 0000 0000 0000 0000
8200 02EA 5013 057F 5548 0018 0100 0000 9101 4000 0000 0000 0000 A000 0000 0000
5053 0068 0100 494F 0201 0003 0000 0060 0340 0000 0000 0007 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 4237 3030 4631 3034 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020 C000 0006 1428 4D04 0000 0000 4944 0CC8
5356 4344 4F43 5300 4D54 001C 0100 494F 3931 3137 2D35 3730 3130 3736 3639 4320
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000

Diagnostic Analysis
Diagnostic Log sequence number: 48
Resource tested: sysplanar0
Resource Description: System Planar
Location:
SRC: B700F104
Description: Software Unrecovered Error, general. Refer to the
system service documentation for more information.
Additional Words: 2-03400000 3-00000007 4-00000000 5-00000000
6-00000000 7-00000000 8-00000000 9-00000000
Possible FRUs:
Priority: M FRU: SVCDOCS
Location: n/a


errpt -j BFE4C025

(TIMESTAMP format: MMDDhhmmYY)
LPAR       IDENTIFIER  TIMESTAMP   T  C  RESOURCE_NAME  DESCRIPTION
SIEGFRID   BFE4C025    0302105407  P  H  sysplanar0     UNDETERMINED ERROR
FUFFNIR    BFE4C025    0302105407  P  H  sysplanar0     UNDETERMINED ERROR
ODIN       BFE4C025    0302105507  P  H  sysplanar0     UNDETERMINED ERROR
GUDRUN     BFE4C025    0302105407  P  H  sysplanar0     UNDETERMINED ERROR
MIDGARD    BFE4C025    0302105307  P  H  sysplanar0     UNDETERMINED ERROR
GIMLY      BFE4C025    0302105307  P  H  sysplanar0     UNDETERMINED ERROR
BRUNGHILD  BFE4C025    0302105407  P  H  sysplanar0     UNDETERMINED ERROR
ARAGORN    BFE4C025    0302105307  P  H  sysplanar0     UNDETERMINED ERROR
ARVEN      BFE4C025    0302105707  P  H  sysplanar0     UNDETERMINED ERROR
GENDALF    BFE4C025    0302105507  P  H  sysplanar0     UNDETERMINED ERROR
VIO1-1     BFE4C025    0302025507  P  H  sysplanar0     UNDETERMINED ERROR
VIO1-2     BFE4C025    0302025407  P  H  sysplanar0     UNDETERMINED ERROR

 
I know for sure that DukeSSD is good at translating the Problem Data section, and he might be of more help than me! But I think that this is a hardware problem and you had better call IBM about it!

By googling around I found this link.


I hope you find it useful.

Regards,
Khalid
 
A SCAN_ERROR_CHRP can mean almost anything, from a CPU, memory or adapter failure to a simple loss of communications with the HMC because you had a network problem or simply rebooted the HMC.

You did the right thing and ran diags. Diags have told you the problem details and the SRC (system reference code), B700F104, as well as the extended words.
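The same SRC can also be read straight out of the raw Problem Data, because parts of that dump are plain ASCII. Below is a minimal sketch, assuming Python is available on some workstation (nothing in this thread says it is installed on the LPARs), that pulls printable text out of three consecutive lines copied from the dump above. The function name `ascii_runs` and the minimum run length of 4 are illustrative choices, not part of any IBM tool.

```python
import re


def ascii_runs(hex_dump: str, min_len: int = 4) -> list[str]:
    """Return printable-ASCII runs found in a whitespace-separated hex dump."""
    raw = bytes.fromhex("".join(hex_dump.split()))
    # A run is min_len or more consecutive bytes in the range space..tilde.
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii").strip() for run in re.findall(pattern, raw)]


# Three consecutive lines copied from the Problem Data in the errpt entry.
PROBLEM_DATA_EXCERPT = """
0000 0000 0000 0000 0000 0000 0000 0000 4237 3030 4631 3034 2020 2020 2020 2020
2020 2020 2020 2020 2020 2020 2020 2020 C000 0006 1428 4D04 0000 0000 4944 0CC8
5356 4344 4F43 5300 4D54 001C 0100 494F 3931 3137 2D35 3730 3130 3736 3639 4320
"""

runs = ascii_runs(PROBLEM_DATA_EXCERPT)
for run in runs:
    print(run)
```

The runs it finds include the SRC B700F104, the FRU name SVCDOCS, and the machine type and serial number (9117-570, 107669C) that prtconf reported. This only surfaces embedded text; it does not decode the binary fields around it.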

A SCAN_ERROR_CHRP is usually a problem noticed by the platform firmware and/or the service processor. Typically, as in this case, it is written out to all running instances of AIX (LPARs), and it should also be written out to an AIX instance (LPAR) when it boots if that instance has not already logged the error.

You will typically find an associated error in the service processor error log; this is what is written out to the AIX errpt, which is why all LPARs log the error.
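To see how closely the entries cluster, the errpt timestamps can be converted to real times. The listing header gives the layout as MMDDhhmmYY. This is a small sketch, again assuming Python on a workstation; the four stamps are copied from the listing above, and `parse_errpt_timestamp` is a made-up helper name.

```python
from datetime import datetime


def parse_errpt_timestamp(stamp: str) -> datetime:
    """Convert an errpt MMDDhhmmYY stamp such as 0302105407 to a datetime."""
    return datetime.strptime(stamp, "%m%d%H%M%y")


# LPAR name -> timestamp, copied from the errpt -j BFE4C025 listing.
entries = {
    "SIEGFRID": "0302105407",
    "MIDGARD": "0302105307",
    "ODIN": "0302105507",
    "ARVEN": "0302105707",
}

times = {lpar: parse_errpt_timestamp(ts) for lpar, ts in entries.items()}
spread = max(times.values()) - min(times.values())

for lpar, when in sorted(times.items(), key=lambda item: item[1]):
    print(f"{lpar:10s} {when:%Y-%m-%d %H:%M}")
print("spread between first and last entry:", spread)
```

The same MMDDhhmmYY layout is what the -s and -e options take in the errpt -aD command shown in the original post.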

So now you need to know where to look up your SRC to get some more information.

For power4 and earlier machines look in the service guide:

You said "have a P670 lpar" - power4, but from the other info I think you meant p570 - power5.

For power5 look in the hardware infocentre:

And for B7xx F104 we get:

Reference Code:
F104

Description/Action Perform all actions before exchanging Failing Items:

Operating System error
Platform Licensed Internal Code terminated a partition.

If SRC word 3 is 0007, then a user may have initiated a function 22 prior to the operating system completing the IPL. If a function 22 was not performed, or if SRC word 3 is not 0007, look for other serviceable errors which occurred at same time frame.


Failing Item:

SVCDOCS



Well, word 3 is 0007, and function 22 is a partition dump (search for "function 22" in the left-hand search box), so it looks like one of your partitions either had a forced shutdown, perhaps from the HMC or the sysdumpstart command, or crashed and dumped.
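If you have many of these reports to go through, the word 3 check can be scripted. This is a rough sketch, assuming the "Additional Words" text is pasted exactly as diag printed it above (tokens of the form index-hexvalue); `parse_additional_words` is a hypothetical helper, and the 0007 test is only the rule quoted from the reference code description.

```python
def parse_additional_words(text: str) -> dict[int, int]:
    """Map SRC word index to integer value from tokens like 3-00000007."""
    words = {}
    for token in text.split():
        index, sep, value = token.partition("-")
        if sep and index.isdigit():
            words[int(index)] = int(value, 16)  # values are printed in hex
    return words


# Copied from the Diagnostic Analysis section of the errpt entry.
ADDITIONAL_WORDS = """
2-03400000 3-00000007 4-00000000 5-00000000
6-00000000 7-00000000 8-00000000 9-00000000
"""

words = parse_additional_words(ADDITIONAL_WORDS)
if words.get(3) == 0x0007:
    print("SRC word 3 is 0007: check whether a function 22 was initiated")
else:
    print("SRC word 3 is not 0007: look for other serviceable errors")
```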

So now you need to clean up and find the partition with the problem.

On each LPAR run sysdumpdev -L to see if it has dumped recently. If it has, and you want to know why, you can either learn all about the kdb command and analyse the dump yourself:

Or call IBM AIX support and have them do it for you.

You should also run advanced diags in system verification mode against sysplanar0. Assuming they only report the existing error but otherwise pass, log a repair action on each LPAR against sysplanar0, or clear this error from the errpt. Otherwise the diag cron job that runs at 4am every day will hassle you about this permanent hardware error for the next 90 days.
 
Upgrading the microcode to the latest level fixes many bogus errors.
 
Be aware that if you decide to upgrade the server's firmware to the latest level, there is a prerequisite HMC level (5.2.1 or 6.1). Go through the readmes and installation docs thoroughly!


HTH,

p5wizard
 