SIP Connectivity to ESS server and failback

mattKnight · Aug 23, 2017

Hi

I've had a couple of issues where one of our sites (Site B) has had a WAN issue and failed to ESS mode. The gateways are purposefully defined with no recovery rule - and we accept the inherent requirement to intervene to force a recovery from ESS mode.

SIP trunks are supplied locally to Site B via an SBC and Session Manager, and calls delivered on these trunks must only be answered by agents at Site B.

When the WAN between Site B and core (Site A) is down, all appears good. However, once the WAN has recovered but before ESS mode has been manually recovered, all calls inbound on trunks to Site B fail.

What I believe we have here is a conflict between dynamic call routing and manual recovery from ESS. That is, when the WAN recovers, the Site B Session Manager routes calls via the entity link to Core at Site A, but Site A has no media resources or handsets registered to handle any of those calls successfully.

At the moment, the entity link is configured using System Manager local host name resolution (LHNR), with a single name resolving to the Core and ESS addresses, and the ESS having the larger priority value. Is this the right approach, or should we reconfigure to use "hard" entity links, each targeted at a single IP address, and then use a routing policy with the ESS ranked highest (but usually returning 500 Service down) and Core ranked lower?

Any other suggestions?

ACM 6.3, Session Manager 6.3

Take Care

Matt
I have always wished that my computer would be as easy to use as my telephone.
My wish has come true. I no longer know how to use my telephone.

kyle555 · Aug 23, 2017

You've got a really good handle on it. On mine, the ESS has the lower number, so it's tried first, and calls go to the core CM otherwise.

The choice is: if one thing goes ESS because it can't speak to core, do you want to keep as much as you can on the core and isolate the one finicky branch, or keep everything as cohesive as possible on the ESS? Your answer probably depends on how much AES and call center is involved.

Remember, those primary/total search timers make it so that there's always a theoretical WAN outage that will kick ESS live. If primary search is 3, transition point 1, total search 30, with mgc list core,ess,lsp, then an outage of 3:01-3:59 or 6:01-6:59 etc will always kick ESS live.

If your order was inverted in LHNR, then the ESS kicking in would steal all call processing SM delivers to CM - be it SIP sets for application sequencing or SIP trunks.
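
To make the ordering point concrete, here is a rough Python sketch of priority-ordered target selection. It is only a toy model, not Session Manager's actual logic: the `LhnrEntry` class, the `pick_target` function, the host names and the priority values are all made up for illustration, and the only behaviour borrowed from the thread is that the entry with the lower number is tried first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LhnrEntry:
    host: str       # hypothetical label for a CM server
    priority: int   # lower value is tried first, as described in the thread


def pick_target(entries, accepts):
    """Return the first host, in ascending priority order, that accepts the call."""
    for entry in sorted(entries, key=lambda e: e.priority):
        if accepts(entry.host):
            return entry.host
    return None


# Assumed state: WAN is back, but Site B is still running on the ESS,
# so both servers answer SIP from Session Manager's point of view.
def both_up(host):
    return host in {"core", "ess"}


# Assumed state: normal operation, the ESS is not active and refuses calls.
def ess_idle(host):
    return host == "core"


core_first = [LhnrEntry("core", 100), LhnrEntry("ess", 200)]
ess_first = [LhnrEntry("ess", 100), LhnrEntry("core", 200)]

print(pick_target(core_first, both_up))   # core (Site B calls land where no agents are)
print(pick_target(ess_first, both_up))    # ess
print(pick_target(ess_first, ess_idle))   # core
```

In this model, putting the ESS first only changes the outcome while the ESS is actually active, which is the trade-off being discussed.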

Depending on how complex your environment is, maybe you can split that hair with port-based routing + LHNR. Say, "siteACMentity core:5060,ESS:5060", and the dial patterns for Site A come in down that routing policy, while Site B has entity "siteBCMentity" with links on 6060 to core/ESS but with the ESS as first choice.
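
The port-based split could be pictured roughly like this in Python. Again, this is a sketch under assumptions rather than real System Manager configuration: the entity names and ports come from the post above, but the dial pattern prefixes, the `route` function and the `accepting` set (the servers currently willing to take calls) are invented for the example.

```python
# Two CM entities as described above: same servers, different port and order.
ENTITY_LINKS = {
    "siteACMentity": {"port": 5060, "lhnr_order": ["core", "ess"]},
    "siteBCMentity": {"port": 6060, "lhnr_order": ["ess", "core"]},
}

# Made-up dial pattern prefixes; real patterns depend on the dial plan.
DIAL_PATTERNS = {"555": "siteACMentity", "666": "siteBCMentity"}


def route(dialed, accepting):
    """Return (entity, host, port) for a dialed number, or None if no pattern matches."""
    for prefix, entity in DIAL_PATTERNS.items():
        if dialed.startswith(prefix):
            link = ENTITY_LINKS[entity]
            for host in link["lhnr_order"]:
                if host in accepting:
                    return entity, host, link["port"]
            return entity, None, link["port"]  # pattern matched, nobody accepting
    return None


# WAN restored, ESS still active: Site A stays on core, Site B stays on the ESS.
print(route("5551234", {"core", "ess"}))  # ('siteACMentity', 'core', 5060)
print(route("6661234", {"core", "ess"}))  # ('siteBCMentity', 'ess', 6060)
# ESS idle again after manual recovery: Site B falls through to core.
print(route("6661234", {"core"}))         # ('siteBCMentity', 'core', 6060)
```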

mattKnight · Aug 23, 2017

Thanks Kyle

[quote kyle555]
The choice is 'if one thing goes ESS because it can't speak to core' do you want to keep as much as you can on the core and isolate the 1 finnicky branch or, keep everything as cohesive as possible on the ESS? Your answer probably depends how much AES and call center is involved.
[/quote]

All the offices are self-contained business units, i.e. Site A doesn't answer Site B calls and vice versa. So our choice is that if / when Site B becomes isolated, the ESS takes over for the entirety of Site B and remains in that state until manually failed back. Site A would continue on the core. This is a call centre environment and there are AES and adjuncts in play.

The LHNR priority is probably the best way to go, but I need to run through the ramifications in our environment and at least discuss it with the maintainer.

Take Care

Matt

kyle555 · Aug 23, 2017

Sorry. There are no good answers.

If they're all self contained, give'em all their own CM.

If you're TLS signaling, CM supports only 16 unique source/dest IP:PORT pairs. So, having 10 sites with this requirement, with 10 IP and port combinations to LHNR to any one first, means you use 10 of them toward one SM, and you're out of capacity to do it between two SMs.

Depending on why your calls fail to answer (say, intentionally having no announcements in the Site A vectors for B's calls, or DIDs that hit hunt groups directly with no Site B members in them), maybe you can reroute. I'm just thinking out loud, but say you had ports 5060 and 6060. Suppose the intercept/unavailable treatment in Site A was like "go to step 99 if media-gateway 22 <> registered", and step 99 routes to SM with steering code 6060 prepended, or to failover.you.com. Anything SM gets for failover.you.com then goes to the entity link on 6060, which LHNRs through your ESSs first and procr last.

It falls flat with 2 sites in ESS mode unless you scale out your trunking accordingly, which probably isn't pretty. Either way, if you've got call center and failover mitigation to deal with, the conditional "if this gateway is registered or not" was made for you!
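
As a thinking-out-loud aid, the conditional reroute could be modelled like this in Python. This is not vector syntax or Session Manager configuration, just a sketch: gateway 22 and steering code 6060 are taken from the post above, while the function names, the link labels and the returned tuples are hypothetical. It also assumes no ordinary dialed number starts with the steering code.

```python
STEERING_CODE = "6060"   # prepended by the vector step that reroutes via SM
SITE_B_GATEWAY = 22      # gateway whose registration the vector tests


def site_a_treatment(called, registered_gateways):
    """Core CM side: reroute Site B calls when Site B's gateway is not registered here."""
    if SITE_B_GATEWAY not in registered_gateways:
        return "route-to-sm", STEERING_CODE + called
    return "handle-on-core", called


def sm_route(digits):
    """SM side: steering-coded calls use the 6060 link (ESS first, procr last)."""
    if digits.startswith(STEERING_CODE):
        return "failover-link-6060", digits[len(STEERING_CODE):]
    return "normal-link-5060", digits


# Site B is still on the ESS, so gateway 22 is not registered to core.
action, digits = site_a_treatment("4001", registered_gateways={1, 2})
print(action, digits)    # route-to-sm 60604001
print(sm_route(digits))  # ('failover-link-6060', '4001')
```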

mattKnight · Aug 29, 2017

Thanks Kyle - Food for thought!

Take Care

Matt
