32 second disconnect on SIP trunks for only one member of the Coverage Answer Group 2

Status
Not open for further replies.

jl3073

Vendor
Nov 19, 2007
7
0
0
US
Sorry guys, but reaching out in desperation. I'm the AE (not a tech) with an unhappy customer.

TL;DR version: one extension (only) in a CAG disconnects after 32 seconds of a SIP call. The other four work normally.

Details.
Environment is CM R6.1, Session Manager, System Manager, Portwell ASBC. AT&T SIP Trunks, approx 100.
DID directed to a Virtual extension. This Virtual Extension goes to Coverage Answer Group 9.
Group 9 has only five (5) members, all with J179s, firmware converted to H.323 and aliased as 9611G.
Four of these members have no problem at all in answering calls.
If the fifth member (3288) answers, the call is disconnected after 32 seconds. (Note: 3288 was originally the only person on that team and had no problems. This cropped up only after we built and deployed the Coverage Answer Group.)
Wireshark in front of the SBC shows we get an ACK from AT&T on calls answered by the other four extensions, with no issues.
Wireshark does NOT see an ACK from AT&T if extension 3288 answers, so the call is disconnected at 32 seconds. AT&T insists they are sending it, though.
We added a sixth member for grins and took 3288 out. Works fine.
Put 3288 back in again and the sixth member disconnects at 32 seconds as well, but the other four remain working.
Moved 3288 to the #1 position in the group. Still disconnects.
Internal calls to this virtual extension test OK. No disconnects, as you might expect.
DID calls to 3288 work fine. No disconnect.
NOC technician has done a stare and compare on all phones and sees no differences between them. We've also groomed the naming convention for consistency. Didn't help.
And AT&T appears to have no interest in a joint troubleshooting session. We've shared the Wireshark results and they don't want to even see it.
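
For anyone wondering where the 32 seconds comes from: it matches the RFC 3261 rule that the answering side retransmits its 200 OK (starting at T1, doubling up to T2) and gives up if no ACK has arrived after 64 × T1. With the RFC defaults of T1 = 500 ms and T2 = 4 s, that is 32 seconds. A minimal Python sketch of that arithmetic, assuming CM uses the RFC default timers (not confirmed anywhere in this thread):

```python
# RFC 3261 default timer values, in seconds.
T1 = 0.5   # round-trip time estimate
T2 = 4.0   # maximum retransmit interval

def ok_retransmit_schedule(t1=T1, t2=T2):
    """Return (retransmit times, give-up deadline) for an un-ACKed 200 OK."""
    deadline = 64 * t1          # 32 s with the default T1
    times = []
    elapsed = 0.0
    interval = t1
    while elapsed + interval < deadline:
        elapsed += interval
        times.append(elapsed)
        interval = min(interval * 2, t2)   # double, capped at T2
    return times, deadline

if __name__ == "__main__":
    times, deadline = ok_retransmit_schedule()
    print("200 OK retransmitted at (s):", times)
    print("give up waiting for ACK at (s):", deadline)
```

If the capture shows the 200 OK being resent on roughly this schedule before the BYE, that would support the missing-ACK explanation.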


If anyone has ideas on where to look next, I'll pass them along to my team.

Thanks!
 
Use dadmin login in CM to get the UID of the extension:

Command: display internal-data ext-map 3288

In this order, remove and add station 3288 to change internal UID, then test again:
remove station 3288
add station xxxx
add station 3288


A great teacher, does not provide answers, but methods to teach others "How and where to find the answers"

bsh

45 years Bell, AT&T, Lucent, Avaya
Tier 3 for 35 years and counting
http://bshtele.com
 
AvayaTier3. thanks! I had floated that yesterday and just checked with customer and my tech. They tried it this morning, but no joy.
I did find that, in playing around with it, they changed the member in position 3 to the customer's telecom guy, taking the original P3 member out, and left position 4 BLANK. So the members are in positions 1, 2, 3, (skip 4), 5 and 6. For some reason, all is now working. Odd. So I told him to take himself out of the group and put the original member back in position 3, but still leave position 4 blank (just like the working scenario). No logical reason, but it's one change at a time to see what works.
Tackling that in the morning.
 
It might be a dumb question, but is 3288 on a different subnet/IP range than the rest of the phones? I'm wondering if it's on a subnet that isn't in the System Manager location or the ip-network-map.
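
A quick way to sanity-check that, once you have the phone's IP and the ranges from the ip-network-map form, is a small script like the one below. This is only a sketch: the ranges and region numbers are made-up placeholders (documentation address blocks), not values from this system.

```python
import ipaddress

# Hypothetical ranges standing in for ip-network-map entries:
# CIDR range -> network region number (both invented for illustration).
NETWORK_MAP = {
    "192.0.2.0/24": 1,
    "198.51.100.0/24": 2,
}

def find_region(phone_ip, network_map=NETWORK_MAP):
    """Return the region whose range contains phone_ip, or None if unmapped."""
    addr = ipaddress.ip_address(phone_ip)
    for cidr, region in network_map.items():
        if addr in ipaddress.ip_network(cidr):
            return region
    return None

if __name__ == "__main__":
    for ip in ("192.0.2.40", "203.0.113.7"):
        print(ip, "->", find_region(ip))
```

A phone that comes back as unmapped would be the one to look at more closely.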
 
If you see the 200 OK of a bad call going to AT&T, and it looks (somehow) identical to the 200 OK of a good call, I wonder where the ACK of the bad call is going.

I’d ask AT&T for the pcap of a bad call and look for the 200 OK. There is a chance they didn’t even look at it. And if, for some reason, the 200 OK has extra headers only for this extension, it might be that the message is too big for the MTU and they never got the 200 OK. That would explain why there is no ACK. And without the ACK, CM will indeed disconnect.
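
To put a number on the MTU idea, you can paste the 200 OK from the capture into a small script and compare its size with what fits in one packet. A rough sketch, assuming UDP over IPv4 on the carrier leg and a standard 1500-byte Ethernet MTU (the thread does not say which transport the AT&T side uses, so treat both as assumptions):

```python
IP_HEADER = 20    # IPv4 header without options, in bytes
UDP_HEADER = 8    # UDP header, in bytes

def udp_size_report(sip_message, mtu=1500):
    """Report whether a SIP message fits in a single unfragmented UDP datagram."""
    payload = len(sip_message.encode("utf-8"))
    limit = mtu - IP_HEADER - UDP_HEADER
    return {
        "payload_bytes": payload,
        "max_unfragmented_payload": limit,
        "would_fragment": payload > limit,
    }

if __name__ == "__main__":
    # Replace this placeholder with the full 200 OK text from the pcap.
    sample = "SIP/2.0 200 OK\r\nContent-Length: 0\r\n\r\n"
    print(udp_size_report(sample))
```

Running it on both a good and a bad 200 OK shows whether the extra header pushes only the bad one over the limit.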

Can you also ask the customer for a pcap from their firewall and enable debugging on the SBCE? You need to make sure the ACK really never arrives at the SBCE and, if it does arrive, see whether it is being dropped.

Some other thoughts

Did you try building a new CAG instead of CAG9?
Did you compare private and public numbering on CM and perhaps adaptations on SM?

Freelance Certified Avaya Aura Engineer

 
separate idea to do the same thing a different way

Use a terminating extension group - it's an extension with up to 6 other extensions as the termination.

 
G van Hamburg.
If you see the 200 OK of a bad call going to AT&T, and it looks (somehow) identical to the 200 OK of a good call, I wonder where the ACK of the bad call is going.

I’d ask AT&T for the pcap of a bad call and look for the 200 OK. There is a chance they didn’t even look at it. JIM: Actually, on Friday, we scheduled a call for 11:00 AM today to do just that.

And if, for some reason, the 200 OK has extra headers only for this extension, it might be that the message is too big for the MTU and they never got the 200 OK. That would explain why there is no ACK. And without the ACK, CM will indeed disconnect.
JIM: My Tier 4 engineer got involved Friday. He hinted at the same thing, but feels it's AT&T's job to tell us. One thing we found in testing: the call is dropped ONLY if it comes in on line 1 of the phone. If line 1 is tied up and the call rolls to line 2, it stays connected. Did a stare and compare, and the only difference is in the Contact line of the message header that starts with the user's name.
Line 1 calls that fail have this appended to the string: ;+avaya-cm-keep-mpro=no
Good calls do not have that string at the end. My engineer says this would not create this issue, but I suspect the issue is not the feature being turned on, it's the length of this line in the header. TBD.
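
The stare and compare on the Contact line can be made less error-prone with a few lines of Python. This is only a sketch: the two header strings below are illustrative stand-ins (the phone number is invented), so paste in the real lines from the good and bad captures.

```python
def contact_params(contact_header):
    """Return the header parameters that follow the closing '>' of the URI."""
    _, _, tail = contact_header.partition(">")
    return [p.strip() for p in tail.split(";") if p.strip()]

def compare_contacts(good, bad):
    """Show which parameters differ and how much longer the bad header is."""
    good_params = set(contact_params(good))
    bad_params = set(contact_params(bad))
    return {
        "only_in_bad": sorted(bad_params - good_params),
        "only_in_good": sorted(good_params - bad_params),
        "length_difference": len(bad) - len(good),
    }

if __name__ == "__main__":
    # Placeholder headers for illustration; replace with lines from the pcap.
    good = 'Contact: "SIP User" <sip:+15555550100@10.8.14.10:5060;transport=tcp>'
    bad = good + ";+avaya-cm-keep-mpro=no"
    print(compare_contacts(good, bad))
```

The length difference is small in this made-up case, which is worth weighing against the "header too long" theory.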

Can you also ask the customer for a pcap from their firewall and enable debugging on the SBCE? You need to make sure the ACK really never arrives at the SBCE and, if it does arrive, see whether it is being dropped.
JIM: No firewall. We've checked the Portwell. All is good. But the pcap from the AT&T router should show us where the issue is. We absolutely are sending the 200 OK to AT&T. So I'm suspecting something like the above, the message header being too big, especially since the issue is tied to only one extension and his name shows up in this header.

Some other thoughts

Did you try building a new CAG instead of CAG9? JIM: we did. did not help.

Did you compare private and public numbering on CM and perhaps adaptations on SM? JIM: Have not, but my Tier 4 guy is convinced it's not an issue with CM.

JIM: Will update the thread after the call today @ 11
 
KYLE555. I thought a TEG only handled one call at a time? I have floated the idea of just building a Hunt Group. We've got Elite licenses. But if the issue is the header for 3288, not sure if experience would be any different. Will know more after call at 11:00
 
This could be an issue: 'Line 1 calls that fail has this appended to the string. ;+avaya-cm-keep-mpro=no'

The contact header format is probably something like this:
Contact: "SIP User" <sip:+xxxxxxxxxxxxxxx@10.8.14.10:5060;transport=tcp;maddr=xxx.xxx.xxx.xxx>;+avaya-cm-keep-mpro=no

Some service providers think this is a compliance issue with RFC 3840 because the Contact header parameter has no quote marks, and they fail the call.
Long story short, there is no RFC 3840 compliance issue here. So if the far end cannot parse this request, it is a problem on the far end, as this is allowed according to RFC 3261.

To avoid the discussion, I suggest using a SigMa script to modify the Contact header, take names out of headers (From, To, Contact) and remove all unneeded headers too.
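
To show what that clean-up should produce, here is a Python illustration of the intended result. It is not SigMa syntax and does not run on the SBCE; the function names and the sample header are invented purely to make the before/after concrete.

```python
import re

NAME_HEADERS = ("from", "to", "contact")

def strip_display_name(header_line):
    """Remove a quoted or bare display name in front of the <uri>."""
    name, sep, value = header_line.partition(":")
    if name.strip().lower() not in NAME_HEADERS:
        return header_line          # leave every other header untouched
    value = re.sub(r'^\s*(?:"[^"]*"|[^<"]*?)\s*<', " <", value)
    return name + sep + value

def drop_param(header_line, param="+avaya-cm-keep-mpro"):
    """Remove one ';param' or ';param=value' from a header line."""
    return re.sub(r";" + re.escape(param) + r"(?:=[^;]*)?", "", header_line)

if __name__ == "__main__":
    # Illustrative header only; the phone number is a placeholder.
    line = ('Contact: "SIP User" <sip:+15555550100@10.8.14.10:5060;'
            'transport=tcp>;+avaya-cm-keep-mpro=no')
    print(drop_param(strip_display_name(line)))
```

Comparing the SBCE's outbound trace against this kind of expected output is a simple way to confirm the real script did its job.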

Freelance Certified Avaya Aura Engineer

 
Update, such as it is.
Yesterday:
AT&T Tier 2 has acknowledged he wasn't seeing ACK.
AT&T Tier 3 says, "Ribbon is seeing the ACK being passed back to the server. So we need more logs to prove and or disprove." My assumption is that this refers to the path between the AT&T servers and the on-site EdgeMarc router, since Tier 2 has said he doesn't see the ACK.

Today: in testing, they realized that there is a scenario where the call stays up past 32 seconds. The IT guy used his own cell phone to launch test calls, and he has EC500 turned on. So when he launches a test call to the CAG from his cell phone, CM sees the caller as INTERNAL and the call stays up. For calls from normal PSTN/cellular numbers it's business as usual and the call drops after 32 seconds.

will update once Tier 3 finishes their testing.
 
Hmm, seems like it! And to me, it looks as if a lot of unneeded headers pass to the SIP provider too. You did say you have an Avaya SBCE, right? Is there any SigMa script on the server configuration toward the SIP provider? Can you show me that script?

Freelance Certified Avaya Aura Engineer

 
and, updated.
Customer just received this from AT&T. Good to know.
I just received notification from the vendor (Ribbon) on your open issue. Per the vendor, “The official word from engineering is that the call exceeds the limit on max number of Ringing messages with different To tags. The limit is 5 in 15.8.x code. The limit was increased substantially in 16.x versions. The workaround would be what the customer already discovered, limit answer pools to 5 extensions.”

Followed up. 15.8.x is supported. 16.x is not GA yet.
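
If you want to check Ribbon's explanation against your own capture, you can count how many different To tags show up in the 180 Ringing messages for one call. A minimal sketch, assuming you export the SIP responses for a single call as raw text; whether each ringing group member really produces its own To tag here is something the pcap would have to confirm.

```python
import re

RIBBON_LIMIT = 5   # limit quoted by Ribbon for 15.8.x code

# Matches the tag parameter on a To header (long form "To" or compact "t").
TO_TAG = re.compile(r"^(?:To|t)\s*:.*?;\s*tag=([^;\s>]+)",
                    re.IGNORECASE | re.MULTILINE)

def distinct_ringing_to_tags(responses):
    """Collect the distinct To tags seen on 180 Ringing responses."""
    tags = set()
    for raw in responses:
        status_line = raw.lstrip().split("\r\n", 1)[0]
        if not status_line.startswith("SIP/2.0 180"):
            continue                      # ignore non-ringing responses
        match = TO_TAG.search(raw)
        if match:
            tags.add(match.group(1))
    return tags

def exceeds_limit(responses, limit=RIBBON_LIMIT):
    """True when one call carries more distinct ringing To tags than the limit."""
    return len(distinct_ringing_to_tags(responses)) > limit
```

Feed it the responses from a failing call and from a working one; if only the failing call goes over five, that lines up with what Ribbon engineering reported.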

Trying to decide between pressuring Ribbon to provide a "patch" and investigating a different method of call distribution that won't violate the limit of five.

I still think something else is going on, since the issue seems to be tied to ext 3288, but TBD. Good news is AT&T and Ribbon have acknowledged something in their network.
 