Question about ESS/LSP mode

heathersuzanne · Feb 25, 2014

Hello,

We have a main hub site (CM5.2), backed up by an ESS in another country. Our remote locations are also outfitted with LSP capability, and hang off the hub site. The failover setup is that all phones register to the main procr, then fail over to ESS mode in the event of an issue at the hub, and then if the phones can't get to the ESS either, they fail over to the local LSPs to do their thing.

We have about 4500 phones in about 6 countries.

Let's say hypothetically, the phones fail over to ESS or LSP mode because of some network issue at the main site. After the network becomes stable, we initiate a process to bring the phones back to the main hub. How long (in minutes) would you expect it to take for all the phones to go back to the hub?

Currently, our phones are taking about 10-15 minutes to go back, which is longer than I would have expected. Please let me know your experiences...thanks!
Heather

smokinjoe2938 · Feb 25, 2014

What are your recovery rules set to?

acss sme acis sme acss cm 5.2.1 acss cm and cmm acss aura messaging.

Mitch672 · Feb 25, 2014

Normally, the LSPs and the ESS are registered to the MAIN site. 10-15 minutes is probably about right.

You can control the remote sites by how the "gatekeeper" list is set up in each remote media gateway.
You need to check the MGC list ("show mgc list") in each media gateway. It would normally list the main site PE (Processor Ethernet) or CLANs first (CLANs are better if your ESS takes over your main site's G650s, if you have G650s at the main site), then the ESS (again, not needed if you are using CLANs), and lastly the local LSP's IP.
What really triggers the failover/failback is the remote gateway registration.
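
The priority order in the MGC list can be pictured with a small sketch. This is only a simplified model, not how the gateway firmware actually works: it ignores the gateway's search timers and recovery rules, and the addresses and the `is_reachable` check are hypothetical placeholders.

```python
from typing import Callable, Optional, Sequence

# Hypothetical addresses, ordered the way an MGC list is described above:
# main site first, then the ESS, then the local LSP.
MGC_LIST = ["10.0.0.10", "10.1.0.10", "10.2.0.10"]


def pick_controller(
    mgc_list: Sequence[str],
    is_reachable: Callable[[str], bool],
) -> Optional[str]:
    """Return the first reachable controller in priority order, or None."""
    for address in mgc_list:
        if is_reachable(address):
            return address
    return None


if __name__ == "__main__":
    # Simulate the main site being down: the gateway lands on the ESS.
    main_down = lambda address: address != "10.0.0.10"
    print(pick_controller(MGC_LIST, main_down))
```

The point of the sketch is only that the order of entries decides where a gateway ends up, so a wrong order in one gateway's list changes that site's failover behavior.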

You can also control the failback of the ESS; sometimes it is manual ("get forced takeover port-network x"), depending on how you have it set up.

Mitch

AVAYA Certified Expert

heathersuzanne · Feb 25, 2014

Hi Guys,

Thank you for the feedback! The timer is set to wait an hour before it will try to fail back - and that works perfectly. But once the process starts, I guess I just thought the phones would move back to the main hub site faster. After the hour is up and the network has been stable for an hour, the phones that had re-registered to the ESS or the LSPs start looking for their original 'home base' and it takes longer than expected. But from what you mention, this is normal....

Guess I can't have it all

Heather

smokinjoe2938 · Feb 25, 2014

As Mitch672 stated, if you are sure the WAN link is stable then just do a get forced takeover command.

acss sme acis sme acss cm 5.2.1 acss cm and cmm acss aura messaging.

heathersuzanne · Feb 25, 2014

The get forced takeover command works as well, and I don't have a problem with that...but once I type the command it takes 10-15 minutes for the phones to be completely registered back to the main site. That's what I was questioning. Thanks for all the feedback!
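
To put the 10-15 minutes in perspective, the figures in this thread can be turned into an average re-registration rate. This is just back-of-the-envelope arithmetic; it assumes, as a worst case, that all 4500 phones had failed over and that they return at a steady rate, neither of which is confirmed here.

```python
def registrations_per_second(phones: int, minutes: float) -> float:
    """Average re-registration rate if `phones` all return within `minutes`."""
    return phones / (minutes * 60)


if __name__ == "__main__":
    # 4500 phones and the 10-15 minute window come from the posts above.
    for minutes in (10, 15):
        rate = registrations_per_second(4500, minutes)
        print(f"{minutes} min -> {rate:.1f} registrations/s")
```

Under those assumptions the window works out to roughly 5 to 7.5 registrations per second on average across the whole system.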

4merAvaya · Feb 25, 2014

A word of caution (from experience): If your phones are registered to the primary server pair and you have a network issue, and the remote goes into stand-alone mode, the phones may stay registered but all of your remote trunks are in-service/idle at the remote site with no stations connected to the remote gateway.

You might want to consider registering to the LSP/ESS first, then the primary server.

Kevin

wpetilli · Feb 27, 2014

Your mg-recovery rule will dictate when the site will attempt to come back to the main. Once that flip back occurs, or when you do the get forced takeover command, that should force the phones back right away. I've noticed in the NR there's a field about TCP sockets. If that is flagged to y my phones took forever to fail over, but not to fail back.

jimbojimbo · Feb 27, 2014

There are a couple of different factors to look at.

First off, the actual phone type makes a difference, as does whether the phones are using CLAN or PE for registration, how the network region is set up, how the IP Options are configured, and what type of gateways you have.

If using Processor Ethernet, older phones/software (including all releases of IP Agent and IP Softphone) do not support Time-To-Service (TTS). The network region also needs to be set up for TTS to work correctly.

If using CLAN registration the phones will typically be set up to utilize local CLANs. If you have G650, MCC, or SCC gateways, the gateways must complete the reset and re-initialize the CLAN boards before registration can occur. LSP/H.248 gateway failback "should" be faster.

You note you are on CM5.2; however, I will assume you are actually on CM5.2.1 (supported release). The use of the AGL list can improve your recovery times. If you are not using the AGL list you can set up an SNMP string in the 46xxsettings.txt file, reboot the phone, and then run an SNMP walk or get on the phone to pull the AGL list to see how many registration points the phone has. Recovery can be slowed down by having too many entries in the AGL with long timers in the IP-Options. Once AGLs are correctly configured you can status the station and will get a page with the AGL list (very nice for troubleshooting). I typically suggest no more than 3 registration points from each configured network region.

Of course, if you are using CLANs for registration there is a limit to the number of simultaneous registration requests. Having all phones at all sites using the same MCIPADD string in the DHCP scope can also slow things down. How you set up the Network Regions has a big impact. The number of H.248 gateways registering to each CLAN also has an impact. I typically do not suggest more than 8 gateways registered to any CLAN. If everyone is fighting to register to the same CLAN you will definitely see longer delays.

Many customers don't take into account CLAN loading when adding new phones or gateways. You should keep a spreadsheet with the MGC lists from each H.248 gateway. Also run the 'status socket' command on a frequent basis to see how your CLANs are loaded. If you are over 200 active sockets it's time to take a closer look, especially when you start to determine capacity for component failure such as the loss of a port network.
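
If the MGC lists and socket counts are kept in a spreadsheet, the two rules of thumb above (no more than 8 gateways per CLAN, a closer look above 200 active sockets) can be checked with a short script. This is a rough sketch, not an Avaya tool: the data layout and the gateway and CLAN names are hypothetical, the numbers would have to be copied in by hand, and it assumes each gateway normally registers to the first entry in its MGC list.

```python
from collections import Counter
from typing import Dict, List

MAX_GATEWAYS_PER_CLAN = 8      # rule of thumb from the post above
SOCKET_REVIEW_THRESHOLD = 200  # "time to take a closer look"


def clans_over_gateway_limit(
    mgc_lists: Dict[str, List[str]],
    limit: int = MAX_GATEWAYS_PER_CLAN,
) -> Dict[str, int]:
    """Count gateways per primary MGC entry; return the entries over `limit`.

    Assumption: a gateway normally registers to the first entry in its list.
    """
    counts = Counter(entries[0] for entries in mgc_lists.values() if entries)
    return {clan: n for clan, n in counts.items() if n > limit}


def clans_over_socket_threshold(
    active_sockets: Dict[str, int],
    threshold: int = SOCKET_REVIEW_THRESHOLD,
) -> List[str]:
    """Return the CLANs whose active socket count exceeds `threshold`."""
    return sorted(clan for clan, n in active_sockets.items() if n > threshold)


if __name__ == "__main__":
    # Hypothetical data: nine gateways all pointing at the same CLAN first.
    mgc_lists = {f"gw{i}": ["clan-a", "ess", "lsp"] for i in range(9)}
    sockets = {"clan-a": 250, "clan-b": 120}
    print(clans_over_gateway_limit(mgc_lists))
    print(clans_over_socket_threshold(sockets))
```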

Hope this gives you some things to look at.

wpetilli · Feb 27, 2014

If you have that many phones homing back to the same CM you should really be using PROCR. The CLANs have a ceiling, and the top end number isn't recommended.

heathersuzanne · Feb 27, 2014

This is all really great information! Thank you all for your advice! I will be looking into these suggestions and comparing with my system setup tomorrow afternoon, and I'll let you all know what I find. Perhaps there is a way to improve things after all....
Thanks

Heather
