Tek-Tips is the largest IT community on the Internet today!

BCM 450 KEEPS LOCKING UP & NEEDING A REBOOT?

Status
Not open for further replies.

uniquename4me

Technical User
Oct 31, 2013
184
CA
I've been fighting with a BCM450 that has an issue with its IP sets, all of which are off-prem at one location. One set gets kicked out, they can't get it re-registered, and eventually they just register it with a different extension number. They tell me that at other times, registering one number kicks out another set, and again they have to pick a different extension number. Now they're informing me that the main location locks up at times and will not process calls, and they have to reboot to resolve the problem.

I've attached a copy of the alarms from when this has occurred, and I'm wondering if anyone has an idea or a solution.

Here it is:
54005 Mrs:: Shutting down due to MPS communication failure
50064 The Media Path Management sub-system unexpectedly became offiline-terminating
54005 Mrs:: Shutting down due to MPS communication failure
10014 Service Manager- Media Path Server (mps) has stopped unexpectedly. This will affect all IP Telephony. Service Manager is attempting to restart the service.
10229 Service Manager- Doorphone service has been stopped
10212 Servce Manger- Line Monitor Service has been stopped
10211 Service Manager- Computer Telephony Service (Cte) has been stopped
10206 Service Manager- Quality of service Monitor has been stopped
10314 Media Path Server (mps) has been succesfully restarted
10003 Service Manager- IP Terminal Service (UTPS) has stopped unexpectedly. This will affect all IP terminals on the system. Service Manager is attempting to restart the service.
10011 Service Manager- Computer Telephony (Cte) has stopped unexpectedly.
47009 FindMe/FollowMe. Registered Handoff Feature code F960
10212 Service Manager- Line Monitor Service has been stopped either
10211 Service Mangaer-
10303 Service Manager IP Terminal Service (UTPS) has been successfully restarted
10311

Thanks in advance! Let me know if someone has some ideas or direction.

 
I've not seen those kinds of event alarms before. Can you do a backup OK? It might be wise to load it onto another BCM 450 to see whether the issue is database related or caused by corruption on the customer's hard disk.

Having read through the query again, it might be wise to untick the "keep alive" DN option, as this will prevent users from re-registering with the same DN again.

Firebird Scrambler

Nortel & Avaya Meridian 1 / Succession & BCM / Norstar Programmer

 
Are there other devices on the network that might be causing conflicts?

Marv ccna

 
I've now done a backup of the original programming and changed out the system. I'm waiting to see if the same errors show up, although I haven't seen them as yet.

I'll wait and see.

Also, I'm familiar with the setting in previous BCMs, but where is the keep alive setting found in the programming?
 
Sets/Capabilities&Pref/IPterminalDetails

________________________________________

Add me to LinkedIN

**New Allworx Forum**


=----(((((((((()----=
Toronto, CAN
 
The problems are back on the replacement system: same errors and shutdowns, twice since Thursday. It ran OK until Sunday night, and then it crashed again today.
 
Use Angry IP Scanner to check for duplicate IP addresses, or run arp -a from a command prompt on any PC.
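A single arp -a listing only shows one MAC address per IP, so a duplicate tends to show up as an address whose MAC changes between listings. Below is a minimal Python sketch of that comparison, assuming the Windows arp -a column layout (IP, MAC with dashes, type). The function names are made up for illustration, and the captured text has to be supplied by you.

```python
import ipaddress
import re
from collections import defaultdict

# One row of Windows `arp -a` output: IPv4 address, dash-separated MAC, entry type.
ARP_ROW = re.compile(
    r"^\s*(\d{1,3}(?:\.\d{1,3}){3})\s+"
    r"((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})\s+\w+\s*$"
)


def parse_arp(text):
    """Return {ip: mac} for the unicast rows of one `arp -a` listing."""
    table = {}
    for line in text.splitlines():
        match = ARP_ROW.match(line)
        if not match:
            continue  # header lines, blank lines, "Interface:" lines
        ip, mac = match.group(1), match.group(2).lower()
        try:
            address = ipaddress.ip_address(ip)
        except ValueError:
            continue  # not a valid IPv4 address
        # Broadcast and multicast rows are shared by design, so skip them.
        if mac == "ff-ff-ff-ff-ff-ff" or address.is_multicast:
            continue
        table[ip] = mac
    return table


def find_conflicts(snapshots):
    """Return {ip: [macs]} for every IP seen with more than one MAC."""
    seen = defaultdict(set)
    for text in snapshots:
        for ip, mac in parse_arp(text).items():
            seen[ip].add(mac)
    return {ip: sorted(macs) for ip, macs in seen.items() if len(macs) > 1}
```

Capture arp -a a few times (for example before and after a set drops off) and pass the saved text to find_conflicts. An IP that comes back with two MACs is worth a closer look, although a replaced handset or network card would produce the same result.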

It is always a good idea to start IP sets in their own range (Address Ranges under Data Services/DHCP).
E.g. for a small business I start them at XXX.XXX.XXX.200.

Get a list of the remote sets' IP addresses under IP Terminals.
Make sure all sets are Static or DHCP and not both.

There is an alarm for IP conflicts, but maybe it's for LAN ports and not sets.


If there are not too many sets at the remote site, try changing the IP address, reboot, and see if it took... maybe one is stuck.

Check that there is only one DHCP device on the network and that DHCP on the BCM is set to IP phone only.

Tough one, and I am....Outta Ammo!





 
I agree with curlycord. I find it best to assign the BCM a group of IP addresses that won't conflict with the router on the LAN. Usually if you keep the range high (x.x.x.200 - x.x.x.254 on a typical /24 network, since .255 is the broadcast address), as curlycord suggested, you shouldn't have any conflicts. Also set each IP phone to either partial or full DHCP if it isn't already. It is always best to consult with the onsite IT admin to avoid conflicts with other devices on the LAN.
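As a rough way to sanity-check a planned phone range before talking to the IT admin, here is a small Python sketch using the standard ipaddress module. The network, the ranges and the pool names in the usage note are made-up examples, not values from this site.

```python
import ipaddress


def addresses(first, last):
    """Expand an inclusive first..last IPv4 range into a set of addresses."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    return {ipaddress.IPv4Address(n) for n in range(start, end + 1)}


def check_phone_range(network, phone_range, other_pools):
    """Return a list of problems found with the planned IP phone range.

    network     -- LAN in CIDR form, e.g. "192.168.1.0/24"
    phone_range -- (first, last) addresses reserved for IP sets
    other_pools -- {name: (first, last)} for other DHCP pools on the LAN
    """
    net = ipaddress.IPv4Network(network)
    phones = addresses(*phone_range)
    unusable = {net.network_address, net.broadcast_address}
    problems = []
    for addr in sorted(phones):
        if addr not in net:
            problems.append(f"{addr} is outside {net}")
        elif addr in unusable:
            problems.append(f"{addr} is the network or broadcast address")
    for name, pool in other_pools.items():
        clash = phones & addresses(*pool)
        if clash:
            problems.append(f"{len(clash)} address(es) overlap with {name}")
    return problems
```

For example, check_phone_range("192.168.1.0/24", ("192.168.1.200", "192.168.1.254"), {"router DHCP": ("192.168.1.100", "192.168.1.199")}) returns an empty list, while extending the phone range to .255 reports the broadcast address.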

Brian Cox
Georgia Telephone
 
I've accessed this particular site and one thing that has puzzled me is that the DHCP option is disabled, whereas most BCM systems that deploy IP sets tend to have it enabled for IP sets.

Firebird Scrambler

 
Based on the alarm log that you provided in the initial post, the problem seems to be internal to the BCM. Therefore, checking for IP address conflicts and similar attempts are unlikely to help.

The first alarm in your post is an error generated by the Media Relay Server (mrs), reporting that it cannot communicate with the Media Path Server (mps). The fourth alarm is from a service monitor process, reporting that mps stopped unexpectedly. The Media Path Server is a component used by other IP telephony components on the BCM (mgs, mrs, feps, utps) for media path setup. These components depend on mps, so they must be stopped if mps goes down and restarted when mps recovers. Based on the alarms you posted, I believe the problem is an mps crash (or abort).
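To make that reading of the log easier to repeat on a longer alarm export, here is a minimal Python sketch that picks out the "stopped unexpectedly" alarms in the order they appear. It assumes each line starts with the numeric alarm ID followed by the text, as in the first post, and that the lines are listed in chronological order, which the post does not confirm. The function name is hypothetical.

```python
import re

# Alarm lines as pasted in the first post: numeric ID, then free text.
ALARM = re.compile(r"^(\d+)\s+(.*)$")
# Short service name in parentheses, e.g. "(mps)" or "(UTPS)".
SERVICE = re.compile(r"\((\w+)\)")


def unexpected_stops(lines):
    """Return (alarm_id, service) for each 'stopped unexpectedly' alarm, in order."""
    hits = []
    for line in lines:
        match = ALARM.match(line.strip())
        if not match or "stopped unexpectedly" not in match.group(2):
            continue
        service = SERVICE.search(match.group(2))
        hits.append((int(match.group(1)), service.group(1).lower() if service else None))
    return hits
```

Run against the alarms quoted in the first post, the first hit is the mps alarm (10014), followed by the utps and Cte alarms, which fits the idea that the other services went down after mps did.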

The BCM appears to have two problems:
1. mps crashing/aborting and taking down other software components (and a subsequent restart of these components that may not go completely well)
2. IP sets changing extension numbers or getting unregistered (possibly as a consequence of the restart after the mps crash)

The first question is what could cause mps to crash. In the diagnostic logs from the BCM, there might be useful information about the cause in mps.log - without the log, I can only speculate. BCM software components have their configuration data stored in a configuration database - if some configuration data used by mps is corrupt (contains completely unexpected values), mps could abort or crash because of that. Another possibility is that mps receives an unexpected request from one of the other components and its error handling has a bug that leads to an abort/crash.
The second question is why IP phone related extensions would keep getting unregistered/changed/misconfigured. Again, the data for IP phones is in the configuration database - if there is some corruption, the behavior could be weird/unexpected.

My guess is that the configuration database is partially corrupt and the symptoms you are seeing are the result of this corruption. The fact that you see the same problems on another system after restoring a backup from the original system (the backup includes the configuration database) is another indication that it might be the case.

In my opinion, you should consider the following steps:
1. The configuration database is stored on the hard drive - the corruption could be due to the drive health deteriorating resulting in read errors. If the drive in the original system is old, you should possibly consider a replacement.
2. Reset the configuration database and recreate the configuration data. To do that, a Level 1 reset would be required, followed by reprogramming. I know this may not be an appealing option, but I am afraid it might be needed to fix the system.
 
Yes, corruption is highly likely given that a different BCM450 is experiencing the same problems with the same database. Eliminating IP address conflicts as a possible cause of the alarms would still seem prudent and should take relatively little time to troubleshoot. If that checks out, then dive headlong into a Level 1 reset and reprogram the system.

An FYI: in the future I'd suggest putting a USB stick on all BCMs and scheduling a monthly backup. That way you're sure to have a backup that's uncorrupted when it comes time to replace a failing hard drive. I did this with my own BCM50, and when a lightning strike took it out I was back in service in just a short while.

Brian Cox
Georgia Telephone
 
I've been told that the problem with backups is that the company runs taxis and the calls are handled by agents. If a backup is done, the agents will be logged out.

Personally I would enable DHCP for IP sets and reprogram the IP phones. I tend to use most of the prompts as listed in the link below.

Firebird Scrambler

 
I have come in late on this, but I must agree this could be database corruption.

I recently replaced a BCM 50 twice due to a clipping issue on a PRI.

Replaced all cards.
Replaced the system
Used the same backup on both of them. Same problem.

Reset to default the system and reprogrammed manually and issue went away.

I should have realised this at the start, as the temp system I left in place, which I had set up manually, worked.

When the new one arrived I restored the original backup and the issue returned.

Painful for me as there was probably nothing wrong with the original system.

Good luck.

Star for your replies as they were quite useful.
 
FirebirdScrambler: "I've been told that the problem with backups is that the company runs taxis and the calls are handled by agents. If a backup is done, the agents will be logged out."

This information does not contradict my suspicion that the hard drive is going bad - just the opposite. To create the backup file, the system reads and compresses the configuration database. When the system encounters read errors while doing that, it blocks a system I/O thread for several seconds while it retries the read operation (it either reads successfully or gives up after a number of attempts). Multiple read errors could make the system unresponsive for a period of time. Communication with the IP phones would be suspended during that time, the watchdog in the IP phones would time out, and the phones would unregister.

In short, it appears that the hard drive in the original system is starting to fail. Some portions of the configuration database are stored on bad sectors and either cannot be read or can be read with problems. The data in the configuration database is corrupt to some extent. If this hypothesis is correct, the solution would be to replace the hard drive and reprogram the system from scratch (the backup cannot be used because it contains the corrupt data).
 
Hello guys

Thanks for all your replies, as they were all helpful. The system uses RAID, so a second hard disk is present in case of failure, and the configuration was duplicated on another BCM 450, which was the one originally fitted. That system was brought back into service and had the same packet losses.

We experimented with that "test" system last night, making loads of changes to bring it in line with how my own system is set up. We discovered that all the registered remote worker IP sets had the same IP address against each one, and that the Discovered Public Address DIDN'T contain the customer's internet IP address; it was listed in the Provisioned Public Address box instead, under the global options for IP sets.

As you can imagine, this wasn't right, and under IP Sub system the discovered public address was also set to 0.0.0.0, which isn't right either. All these things seem to point to a configuration issue.
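For anyone wanting to run the same two checks against a list copied out of the IP Terminals screen, here is a rough Python sketch. The input format (a DN to IP address mapping typed in by hand) and the function name are assumptions for illustration, and the addresses in the usage note are documentation examples, not this customer's.

```python
import ipaddress
from collections import defaultdict


def review_ip_sets(registrations, discovered_public):
    """Flag the two oddities described above.

    registrations     -- {dn: ip_string} for the registered IP sets
    discovered_public -- the discovered public address as a string
    Returns a list of findings to review by hand.
    """
    findings = []
    # 0.0.0.0 is the "unspecified" address, i.e. nothing was discovered.
    if ipaddress.ip_address(discovered_public).is_unspecified:
        findings.append("discovered public address is unset (0.0.0.0)")
    by_ip = defaultdict(list)
    for dn, ip in registrations.items():
        by_ip[ip].append(dn)
    for ip, dns in sorted(by_ip.items()):
        if len(dns) > 1:
            shared = ", ".join(str(dn) for dn in sorted(dns))
            findings.append(f"{ip} is shared by DNs {shared}")
    return findings
```

For example, review_ip_sets({221: "203.0.113.10", 222: "203.0.113.10", 223: "198.51.100.7"}, "0.0.0.0") reports the unset public address and the address shared by DNs 221 and 222. Sets behind the same remote router can legitimately show one public address, so a shared address is a prompt to look closer rather than proof of a fault.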

I'm hoping that things have now improved and I'll keep you updated.

Firebird Scrambler

 
It now appears that it was all down to the way the customer's router was programmed. It wasn't anything to do with the BCM system.

Firebird Scrambler

 