
Continuously rebooting BCM 50 3.0

Status
Not open for further replies.

masterjim
Technical User, US
Jul 18, 2006
Hi All,

We have had several customers whose BCM 50 systems have "failed" by going into a reboot "loop" or continuously rebooting.
We have treated these systems as completely dead and we've replaced them.
Even power-cycling the system a few times, leaving it unplugged for 10 minutes each time, doesn't help.

But I'd like to know if there is any way to recover a system from this state.
Does anyone know what is actually happening in the system to cause continuous rebooting?

Thanks for any insight or help.
-Jim
 
Yes, most of the time it's a bad hard drive.

We usually keep some in stock.
 
It's rare for the motherboard to fail. There isn't any other way I know to get into the system other than via the LAN / OAM port.

I'm assuming the systems that failed were up to date with patches, etc.?

All the best

Firebird Scrambler
Meridian 1 / Succession and BCM / Norstar Programmer in the UK

If it's working, then leave it alone!
 
Thanks a lot for the replies.

Phoneguy610, do you mean a Nortel-programmed HD?
Can you please tell me what's involved in swapping a HD? (I got the hardware part; it's the licensing and re-configuration I would need some guidance on. Is there a manual I can use for help?)

Thanks again, All.
-Jim
 
Pretty easy, I think: 5 screws get the 50 open.

Then you just swap drives.

The system ID should remain the same, so if you still have the original keycode file you can regenerate the keycodes.

Or call your Avaya vendor with the system ID and they can get it done.

Unless you have a backup of the old programming, you will have to reprogram it like a brand-new system.

Good luck :)
 
I have had this happen with several 50s. In fact, in my opinion it is almost chronic. I always thought it meant a bad hard drive or a dead unit. It happened to me again recently, and I was so mad that I tried a couple of different things to bring it back: multiple reboots, bringing it back up without the Amphenol cable attached, removing the expansion modules and rebooting, taking the network away and bringing it back up, and switching power cords and outlets. Nothing worked.

The only thing that worked was doing a Level 2 reset and starting it up from scratch. Once it came back from that, I loaded the backup I had made before it crashed. The backup went on fine and the system has worked fine since.

It seems to me that if this happens, you'd better hope you have a backup, and do a Level 2 reset. At least you can try this before replacing the hard drive. I don't know why this happens to them, but in my opinion it is not always a bad hard drive. I still think a corrupt file on the hard drive stops it from booting. Not only that, most of the times I've seen it happen were after power outages; that seems to trigger the condition. Just my opinion.
 
It's funny you word it that way, EX. I was far from home when this happened. Like you said, I had nothing to lose with a fresh backup in hand. As I stated above, it worked; maybe I got lucky, but for anyone else in this situation, if it looks like a drive replacement, it is worth a shot. Just a side note: the Level 2 reset is tricky, so follow the steps closely.
 
I personally think the golden rule here is to make the customer spend a few pennies on a USB stick to leave in the BCM and have it do a monthly backup. That way, at least, you can dig out the keycodes as well as do a full restore.

It really does annoy me when they complain about having no backups or keycodes, as that is their responsibility in the first place.

All the best

Firebird Scrambler
Meridian 1 / Succession and BCM / Norstar Programmer in the UK

If it's working, then leave it alone!
 
Just to add on: I recently had two 50s do this near the end of the business day. I was not able to get to site, so the systems stayed in the constant reboot cycle overnight. In both cases the phone systems righted themselves overnight, with no ill effects. I don't know exactly how long it took them to recover, but this is why I think something gets hung up in the boot-up sequence. If you watch it long enough, it is very rhythmic in the way it tries to right itself. I guess at worst, if you let it run long enough, it may right itself, but I can't say it always will. I compare it to a skipping record, but a little more complex than that. The Level 2 reset seems to be the fix if you don't have a couple of hours to wait. If that does not work, then maybe a hard drive is needed.
 
I have had several do this, and when I tried a Level 2 reset on them, none ever came back. I have done Level 2 resets on other machines with success. Is there a trick to it when the system won't boot up?

SHK Certified (School of Hard Knocks)
NCSS, ATSP/IP
 
LKey, for the customers with BCM50 systems that went into the reboot cycle but eventually came out of it, you should (IMHO) do three things:
1. Get a system backup ASAP
2. Get ready to replace hard disks in these units
3. Do not reboot these systems if at all possible :)

The primary reason for the reboot cycle (my guess over 99% of all cases) is some kind of hard disk failure. If the hard disk is dead, nothing but a replacement is going to help. If the hard disk is deteriorating and the system encounters problems reading sectors with important components (for example the kernel), there is a chance that the sector will be read successfully once in a number of attempts. The result is that the system will come up and work fine - but beware - on the next reboot when the sector(s) are read again, you may not be so lucky.
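UCXGUY's point about a deteriorating disk can be put into rough numbers. The sketch below is a toy model, not a description of how the BCM50 disk or firmware actually behaves: it assumes (hypothetically) that each boot attempt reads a marginal sector correctly with the same independent probability p. Under that assumption, the chance the system comes up within n attempts is 1 - (1 - p)^n, and the average number of attempts before a good boot is 1/p. The function names are made up for illustration.

```python
def prob_boot_within(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` boot attempts succeeds.

    Toy assumption: every attempt reads the marginal sector correctly
    with the same independent probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p) ** attempts


def expected_attempts(p: float) -> float:
    """Mean number of attempts until the first good boot (geometric distribution)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


if __name__ == "__main__":
    # Illustrative value only: a sector that reads correctly 2% of the time.
    p = 0.02
    for n in (1, 10, 50, 100, 200):
        print(f"{n:4d} attempts: {prob_boot_within(p, n):.1%} chance the system comes up")
    print(f"expected attempts: {expected_attempts(p):.0f}")
```

With p = 0.02, the model gives roughly a 64% chance of coming up within 50 attempts and about 87% within 100, which is consistent with units that "right themselves" overnight. The same model also says each later reboot is a fresh gamble at probability p, which is why avoiding reboots until the disk is replaced makes sense.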

If you have disk cloning software that can make exact copies of existing disks, copying the current disks (which appear to be approaching the end of their life expectancy) to new hard disks could save you some $$$ compared to buying the multi-image FRU disks.
 
Regarding the case where the L2 reset recovered a failed system, the reason was most likely an unreadable file. The L2 reset boots another OS (which is on another partition on the hard disk) and replaces configuration files used by the primary OS. If one of these files was unreadable, then the L2 procedure would replace it and the system would come up again.

In fact, the L1 reset goes even further - it replaces absolutely all files on the primary partition. Thus, if the reboot cycle is caused by unreadable files on the primary partition (any files - configuration, data or executable files), the L1 reset should recover the system.
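The difference between the two resets, as described above, comes down to which files each one rewrites. The sketch below models that with hypothetical file names and a simple "config/data/exec" label per file; it is not the BCM50's real file layout, and it assumes that rewriting a file makes it readable again, which won't hold if the disk surface itself is failing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PrimaryPartition:
    """Toy model of the primary partition (hypothetical, not the real layout).

    files maps a file name to its kind: "config", "data" or "exec".
    unreadable holds the names of files whose sectors can no longer be read.
    """
    files: dict[str, str]
    unreadable: set[str] = field(default_factory=set)


def files_rewritten(partition: PrimaryPartition, level: int) -> set[str]:
    """Files each reset level rewrites, per the description in the thread."""
    if level == 2:
        # L2: the alternate OS replaces the configuration files only.
        return {name for name, kind in partition.files.items() if kind == "config"}
    if level == 1:
        # L1: every file on the primary partition is replaced.
        return set(partition.files)
    raise ValueError("only L1 and L2 resets are modelled")


def reset_recovers(partition: PrimaryPartition, level: int) -> bool:
    """True if every unreadable file is one that the reset rewrites."""
    return partition.unreadable <= files_rewritten(partition, level)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    layout = {"network.cfg": "config", "kernel": "exec", "voicemail.db": "data"}
    bad_config = PrimaryPartition(layout, {"network.cfg"})
    bad_kernel = PrimaryPartition(layout, {"kernel"})
    for label, part in (("bad config", bad_config), ("bad kernel", bad_kernel)):
        print(label, "L2:", reset_recovers(part, 2), "L1:", reset_recovers(part, 1))
```

In this model an unreadable configuration file is fixed by either reset, while an unreadable executable needs L1. Because L1 also rewrites the data files, anything not in a backup would be lost, which is one more reason to have that USB backup in place first.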

Personally, I would not recommend either of them (L1 or L2) without having a replacement drive ready. If the reset fails to recover the original disk, you'd need another disk to get the BCM50 out of the L1/L2 reset state. Note: if you have the multi-image drive, you should convert the drive to the target release before attempting the L1/L2 reset. If I am not mistaken, the multi-image disk (before conversion) will not boot in a system that is in the L1/L2 reset state.
 
To add to UCXGUY's useful comments:

There are a number of useful disk imaging tools available. Until recently, I used Acronis 2009. I didn't get the same results with the later versions.

The main drawback with a *.tib image is that your replacement disk has to be the same size (in sectors) or larger. I've had a number of issues with disks that appear to be the same size, e.g. 10 GB or 40 GB, but when I read the disk in a computer, I find that it's smaller than the stated size, which is why Acronis refuses to do the restore. I also couldn't get Acronis to work on the 160 GB multi-image drive.
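Checking sizes before a restore avoids that surprise. The sketch below compares sector counts rather than the size printed on the label; the two example counts are hypothetical values for two disks that are both sold as "40 GB". On Linux, /sys/block/<device>/size reports a block device's size in 512-byte units, which the helper reads; the device name you pass (e.g. "sdb") is an assumption about your setup.

```python
from pathlib import Path

SECTOR_BYTES = 512  # sysfs reports block device sizes in 512-byte units


def linux_sector_count(device: str) -> int:
    """Size of a block device in 512-byte units, read from Linux sysfs.

    `device` is a kernel device name such as "sdb" (an assumption about your setup).
    """
    return int(Path(f"/sys/block/{device}/size").read_text().strip())


def restore_shortfall(source_sectors: int, target_sectors: int) -> int:
    """How many sectors the target is short by; 0 means it is big enough."""
    return max(0, source_sectors - target_sectors)


def describe(source_sectors: int, target_sectors: int) -> str:
    """Human-readable verdict on whether a whole-disk image will fit."""
    short = restore_shortfall(source_sectors, target_sectors)
    if short == 0:
        return "target is large enough for a whole-disk restore"
    megabytes = short * SECTOR_BYTES / 1e6
    return f"target is {short} sectors ({megabytes:.1f} MB) too small"


if __name__ == "__main__":
    # Hypothetical sector counts for two disks that are both labelled "40 GB".
    original, replacement = 78_165_360, 78_140_160
    print(describe(original, replacement))
    # prints: target is 25200 sectors (12.9 MB) too small
```

A shortfall of a few megabytes is enough for an image-based restore to refuse, even though both disks carry the same nominal size.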

This led me to dig out a very old program called Winhex. This small program can copy disks and create *.img images, but the drawback is that the images are the same size as the disk, so you need plenty of disk space on your main computer.

Winhex copies sector by sector, and the good thing I've found is that most BCMs have free space at both ends of the disk. That is handy when the new disk is slightly smaller than the original one: Winhex stops when it runs out of sectors, but the sectors it misses are in this unused area.
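A sector-by-sector copy that stops when the target runs out can be sketched in Python. This is not how Winhex works internally; it's a minimal illustration that works on any binary file-like objects (image files, or raw devices opened in binary mode with suitable permissions), assuming 512-byte sectors. The last_used_sector argument is a hypothetical value you would take from the partition table: the copy refuses to start if the smaller target would cut into sectors a partition actually uses, since "free space at the end" is only safe to drop if nothing lives there.

```python
import io
from typing import BinaryIO

SECTOR = 512  # assumption: 512-byte sectors, as on older IDE/SATA disks


def copy_sectors(src: BinaryIO, dst: BinaryIO, src_sectors: int,
                 dst_sectors: int, last_used_sector: int,
                 chunk_sectors: int = 2048) -> int:
    """Copy src to dst sector by sector, stopping when either disk runs out.

    last_used_sector (0-based) is the highest sector any partition uses,
    a hypothetical value the caller reads from the partition table.
    Returns the number of sectors copied.
    """
    if last_used_sector >= dst_sectors:
        raise ValueError("target too small: copy would cut into used sectors")
    to_copy = min(src_sectors, dst_sectors)
    copied = 0
    while copied < to_copy:
        n = min(chunk_sectors, to_copy - copied)
        data = src.read(n * SECTOR)
        if len(data) != n * SECTOR:
            raise OSError(f"short read at sector {copied}")
        dst.write(data)
        copied += n
    return copied


if __name__ == "__main__":
    # In-memory stand-ins: a 10-sector "original" and an 8-sector "replacement".
    original = io.BytesIO(bytes(range(256)) * 2 * 10)
    replacement = io.BytesIO()
    n = copy_sectors(original, replacement, src_sectors=10, dst_sectors=8,
                     last_used_sector=5)
    print(f"copied {n} sectors; {10 - n} unused sectors at the end were skipped")
```

Copying in chunks of sectors rather than one sector at a time keeps the number of reads and writes down without changing the result.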

It's worth creating spare disks as it doesn't take that long to do.

Finally, on the BCM 200/400 and 450s, use the serial port when powering up to watch what is happening. It sometimes does some self-checking as it powers up, and it's good to see how long some of the services take to load.

It's even more useful during a planned shutdown before removing power, as you can watch the services stop until the "killall" command and the "system is halted" message appear.

Use 9600 baud for the 200/400 and 115200 (I think?) for the 450s. As far as I know, the BCM 50 also has a serial port, but it's inside the unit and uses 3 pins; I think you might also need a special dongle to access it.

I hope this bit of info helps.


All the best

Firebird Scrambler
Meridian 1 / Succession and BCM / Norstar Programmer in the UK

If it's working, then leave it alone!
 