BCM450 in continual reboot cycle.
The LEDs cycled through the usual boot indications, but would always settle on:
Top LED = solid orange
Bottom LED = solid green
These indicate kernel mode (8 seconds), then Safe OS.
The unit would then reboot again.
Briefly, I could ping 10.10.11.1 through the OAM port.
Every attempt to boot to Main OS over the serial connection failed (both "Boot to Main OS" and "Transition to Main OS"); the BCM would boot and always return to safe mode.
All MBMs lit up as if they were receiving power. There were no symptoms of a power problem whatsoever.
PuTTY showed the following (this is just an extract from an attempt to force Main OS; similar messages appeared on a plain power cycle):
<BEGIN>
You have selected Transition to Main OS.
Do you wish to proceed? [Y/N]? : yUnlocking flash...Done
Erasing flash...Done
Writing environment to /dev/mtd3...Done
Locking ...DoneMain OS
Swapping to mainos
Entering read_request...
mainos
kjournald starting. Commit interval 5 seconds
EXT3-fs warning: maximal mount count reached, running e2fsck is recommended
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,26), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
Unlocking flash...Done
Erasing flash...Done
Writing environment to /dev/mtd2...Done
Locking ...DoneUnlocking flash...Done
Warning: DQ5 raised while erase operation was in progress, but erase completed OK
Erasing flash...Done
Writing environment to /dev/mtd3...Done
Locking ...Donereset_request=mainos
Entering mainos_chroot...
umount: /dev/sdb3: not mounted
Stopping portmap...
Stopping portmapper: [ OK ]
Stopping sshd...
[ OK ]
Stopping network...
Shutting down interface eth1: [ OK ]
Shutting down interface eth2: [ OK ]
Shutting down loopback interface: [ OK ]
stopping syslog...
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Unloading modules...
Removing kernel loadable modules, if possible and if unused.output from nn_gpiol driver exiting
output from nn_gpiol driver exiting
kjournald starting. Commit interval 5 seconds
EXT3 FS 2.4-0.9.19, 19 August 2002 on sd(8,19), internal journal
EXT3-fs: mounted filesystem with ordered data mode.
modprobe: modprobe: Can't open dependencies file /lib/modules/2.4.22-ncgl-11.45.1.0/modules.dep (No such file or directory)
INIT: version 2.84 booting
Welcome to NCGL NCGL 2005 (cjdavola)
Press 'I' to enter interactive startup.
Mounting proc filesystem: [ OK ]
Unmounting initrd: umount: /initrd: device is busy
[FAILED]
Setting hostname GOGPBCM450: [ OK ]
Your system appears to have shut down uncleanly
Press Y within 5 seconds to force file system integrity check...
Press Y within 4 seconds to force file system integrity check...
Press Y within 3 seconds to force file system integrity check...
Press Y within 2 seconds to force file system integrity check...
Press Y within 1 seconds to force file system integrity check...
Activating swap partitions: [ OK ]
Finding module dependencies: depmod: Can't open /lib/modules/2.4.22-ncgl-11.45.1.0/modules.dep for writing
[FAILED]
Mounting local filesystems: [ OK ]
Enabling swap space: [ OK ]
modprobe: Can't open dependencies file /lib/modules/2.4.22-ncgl-11.45.1.0/modules.dep (No such file or directory)
/sbin/ldconfig: /nn/lib/libcppunit-1.12.so.1 is not a symbolic link
INIT: Entering runlevel: 3
Entering non-interactive startup
Starting init_cleanup: [ OK ]
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]
Starting nn_modules: [ OK ]
Setting network parameters: [ OK ]
Bringing up loopback interface: [ OK ]
Bringing up interface eth0: [ OK ]
Bringing up interface eth1: [ OK ]
Bringing up interface eth2: [ OK ]
Bringing up interface eth3: [ OK ]
Bringing up interface eth4: [ OK ]
Bringing up interface eth5: [ OK ]
Bringing up interface eth6: [ OK ]
modprobe: Can't open dependencies file /lib/modules/2.4.22-ncgl-11.45.1.0/modules.dep (No such file or directory)
Starting network_test: [ OK ]
Starting dhcpd: [ OK ]
Starting efs: [ OK ]
Initializing random number generator: [ OK ]
Starting xinetd: [ OK ]
[ OK ]
Starting UPS monitoring:[ OK ]
Starting DiaLogger: [ OK ]
Starting Pdrd: [ OK ]
Starting enigma: [ OK ]
Starting Pdrd_db_init: [ OK ]
Starting hc_sanity.450.1: [ OK ]
Starting invmgr: Checking dependencies for invmgr.
[FAILED]
<END>
The most notable item is the failure to start invmgr (I assume this is the inventory manager). My guess is that the BCM was trying to power up the MBMs and running into a problem, which stalled the invmgr service and forced the BCM into safe mode.
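For anyone who would rather pull the failed steps out of a saved PuTTY capture than read it by eye, here is a minimal Python sketch. It assumes the session was logged to plain text and that the status tags look like the `[ OK ]` / `[FAILED]` markers in the extract above. The function name `failed_steps` is purely illustrative and is not part of any BCM tooling.

```python
import re

# Matches an init-script status tag such as "[ OK ]" or "[FAILED]".
STATUS = re.compile(r"\[\s*(OK|FAILED)\s*\]")


def failed_steps(log_text):
    """Return the boot steps whose status tag is [FAILED].

    The tag is sometimes printed on the line after the step description,
    so the most recent untagged line is remembered and used as the label.
    """
    failures = []
    pending = None  # last line seen that has not yet received a status tag
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = STATUS.search(line)
        if match is None:
            pending = line
            continue
        # Prefer text on the same line as the tag, else the previous line.
        label = line[:match.start()].strip() or pending or "(unknown step)"
        if match.group(1) == "FAILED":
            failures.append(label)
        pending = None
    return failures


# A few lines copied from the extract above, used as sample input.
sample = """\
Starting hc_sanity.450.1: [ OK ]
Unmounting initrd: umount: /initrd: device is busy
[FAILED]
Starting invmgr: Checking dependencies for invmgr.
[FAILED]
"""

for step in failed_steps(sample):
    print(step)
```

Run against the full capture, this would also list the depmod failure, so the output still needs reading with some judgement: in this case only the invmgr failure lined up with the drop back to safe mode.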
Changing the customer's HDD (thinking it was corrupted) and changing the BFT made no difference; the problem still occurred. I had a replacement unit shipped to site, powered it on (using a different power lead from the comms cabinet) with the customer's BFT and HDD, and all was okay. This suggested a problem with the PSU / CIF / MBM backplane.
However, on rack-mounting the BCM450 and using the customer's existing power cord, I had more problems. I traced the power lead to a UPS, replaced the power cord with one going straight to raw mains, and all problems were resolved.
In other words, a faulty UPS or faulty UPS cable was the root cause of all the problems.
Where I would ordinarily consider a UPS to be a BCM's best friend, this has been a painful lesson that a faulty one can equally be its worst enemy, causing system-wide BCM problems.
Whenever I have a reboot problem in future, even if the BCM looks like it is receiving power to all MBMs etc., if I trace the power lead to a UPS, the first thing I will do is go straight to raw mains and see if that resolves it.
Hope this helps any other engineers in a similar situation.
regards
paul