Temporary Failure on the Phone

RobertoGritar · Oct 1, 2013

I've been having an issue with dropped calls that happens randomly during the day. I think it is a network problem, because when I run the Real-Time Monitoring Tool I can see the termination cause code "TEMPORARY FAILURE" in the real-time data. I understand that this means a disconnection between the IP phone and CUCM. I've been checking the network: there is no high CPU, and there are no output queue drops, CRC errors, or interface resets. No ports are blocked on my firewall and there are no restrictions on voice traffic.

Any idea of how I can attack this issue?

trvlr1 · Oct 2, 2013

Is this across many/all phones or just a few/one?
When you drop the call is the phone resetting?
If not, it may be the lines you are using that are having the issue (T-1/PRI/POTS etc.). If this is the case, have the Telco check them out and monitor them over a period of time. If you do continue to have dropped calls, make a note of the time and of the called and calling numbers, to help them narrow down any issues.
If the phone is resetting, network techs should be able to monitor the state of the data circuit(s) as well as the internal network.
The other question is, are these dropped calls all to one location? If so, the issue may be on their end.
Hope this helps!

MitelInMyBlood · Oct 4, 2013

Temporary failure (also sometimes seen on the display as "UCM Down") is a loss of the path between the instrument and the call manager. Ordinarily this would not be call-affecting unless the path back to the CUCM shares a common path with the parties on the call.

You are likely correct in assuming this is a network issue, although it manifests itself on the phones rather than on the computers. (Computers are very forgiving of momentary data loss. By comparison, VOIP phones are very UN-forgiving.)

We ran into a similar problem a couple of years back that nearly drove us nuts and had end-users thinking the new phone system was a POS. Ours turned out to be caused by an over-aggressive setting of the ARP timers, re-ARP'ing the network at 5 minute intervals because someone thought it was a good idea to do that. That person is no longer in our employ...

Original MUG/NAMU Charter Member

RobertoGritar · Oct 7, 2013

Hi!

Of course the phone is resetting when the call drops, and sometimes when there is no call, and a lot of phones on different distribution switches have the same issue. We tested the cabling with a Fluke and found no problem, and we checked with a power cube, so it is not a PoE issue. We changed the speed to 100 full and it seemed to be fixed, but in truth it is not. We checked the ARP timers and they are at the default value (5 min).

Any other ideas are completely welcome.

Regards!

whykap · Oct 7, 2013

So you are running power via Cisco power supplies and not switch PoE?
I've seen time and time again users bumping them and resetting the phones. You mentioned a lot of phones are having this issue. Is it the same phones again and again? The only reason I ask is because you mentioned power cubes.

RobertoGritar · Oct 8, 2013

Hi whykap!

I mentioned power cubes because I used them to isolate the issue, thinking it was a PoE problem. But the telephones are still losing connectivity with CUCM.

Regards!

trvlr1 · Oct 8, 2013

Do you have QoS set up on the whole network, including any connecting circuits?

RobertoGritar · Oct 8, 2013

Yes, indeed! By the way, I'm now reading about the keepalive function; if you know something about it, please let me know.

Regards!

gnrslash4life · Oct 10, 2013

I've seen a bug in CatOS that dealt with port security that did what you're talking about.

RobertoGritar · Oct 10, 2013

Hi gnrslash4life!

Do you have any information about this bug? After all I have done, what you suggest makes sense.

MitelInMyBlood · Oct 10, 2013

The Cisco default ARP timeout is 240 minutes. If you are re-ARPing the network at 5 minute intervals, this can cause (and has caused) the exact issue you are experiencing.

We had this exact problem 2½ years ago and it doggone near drove us nuts. Problems began about a week or so after upgrading from 8.0 to 8.2. The problem only seemed to manifest itself during the "busy hours" of the day, i.e., starting around 9 AM until around 10:30, then again in the afternoon, starting around 1:30 PM until around 3 PM.
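To check whether drops cluster in busy hours like this, the reset or drop times noted by users can be counted per hour of day. This is only a minimal sketch: the timestamp format and the sample values are made up for illustration, and `drops_per_hour` is a hypothetical helper, not part of any Cisco tool.

```python
from collections import Counter
from datetime import datetime


def drops_per_hour(timestamps):
    """Count events per hour of day from 'YYYY-MM-DD HH:MM' strings."""
    hours = (datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps)
    counts = Counter(hours)
    # Return a plain dict ordered by hour so the busy periods stand out.
    return dict(sorted(counts.items()))


# Hypothetical sample data: times at which users reported a dropped call.
SAMPLE_DROPS = [
    "2013-10-07 09:12",
    "2013-10-07 09:48",
    "2013-10-07 13:35",
    "2013-10-07 16:02",
    "2013-10-08 09:30",
]

if __name__ == "__main__":
    for hour, count in drops_per_hour(SAMPLE_DROPS).items():
        print(f"{hour:02d}:00  {'#' * count}")
```

If most events fall in the same one or two hours across several days, that points toward load on the network rather than a fault in individual phones.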

It was not affecting the entire network, but only certain floors of the (9-story, 550,000 s.f.) building. Our management was on the verge of demanding that we roll-back to the prior software load, even though by then we had been fighting the problem for a full and continuous 3 weeks.

Management, not understanding how VOIP phones work, swore it had to be a phone issue. Network professionals and Cisco TAC & our local CCIE/Voice swore it was a network problem.

We had 6 laptops set up in various wiring closets capturing packet traces. We could actually see packets drop when the problem occurred.
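One way to quantify drops from captures like these, assuming the traces include the RTP audio stream and the sequence numbers have already been exported by a capture tool, is to look for gaps in the 16-bit RTP sequence numbers. The function below is a rough sketch of that idea; `count_rtp_loss` is a hypothetical name, and it treats large backward jumps as reordering rather than loss.

```python
def count_rtp_loss(seq_numbers):
    """Estimate lost packets from RTP sequence numbers in arrival order.

    RTP sequence numbers are 16 bits wide and wrap from 65535 to 0,
    so differences are taken modulo 65536.
    """
    lost = 0
    for prev, cur in zip(seq_numbers, seq_numbers[1:]):
        gap = (cur - prev) % 65536
        if gap == 0:
            continue  # duplicate packet, nothing missing
        if gap < 32768:
            # A forward jump of N means N - 1 packets never arrived.
            lost += gap - 1
        # Jumps of 32768 or more are treated as late/reordered packets.
    return lost


if __name__ == "__main__":
    # Hypothetical stream: 65535 is missing at the wrap, then 2 and 3.
    print(count_rtp_loss([65533, 65534, 0, 1, 4]))
```

Running the same count on captures from different wiring closets gives a rough picture of where along the path the packets disappear.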

During those periods when the problem was manifesting itself, users were also complaining of long bursts of choppy audio and/or lots of distortion. Frequently if the caller would stay on the line the call quality would sometimes clear up.

On internal station-to-station calls, one instrument would display "Fail" while the other instrument would display "UCM Down, Features disabled".

Overall it was literally a huge cluster-f*** of people pointing fingers and the higher-ups demanding immediate resolution.

To insulate our senior executives from the problem we temporarily connected a dedicated fiber between their wiring closet and the local segment where the CUCM was connected. That put their fire out, but didn't solve it for the rest of the building, although doing this did provide a lot of credence to the original diagnosis, that it was a network problem and not a phone system problem.

The issue was finally escalated to P1 and Cisco BACKBONE engineers were engaged. Both Cisco and the local VAR had people on site every day and late into the evening trying to identify the root cause. Finally TAC noticed that all of our ARP timers were set for 5 minutes. They recommended setting them back to default values (240 minutes) and suddenly "the problem" was gone.

As an epilogue to this, we do not believe that the ARP timer setting was the actual root cause, but more likely the "trigger" mechanism. At the time our CORE consisted of a pair of 6500's running ancient IOS that had not been rebooted in years (literally) because "the powers that be" would never allow a maintenance window. Today we now have a pair of 7K's in the CORE, but ARP timers still at the default 240 minute setting. For what it's worth, we have not had a recurrence of this problem in more than 2 years.

If your ARP timers are set for 5 minutes, I would recommend starting there.
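As a rough way to audit this across many interfaces, the sketch below scans saved `show interfaces` text for the "ARP Timeout hh:mm:ss" line that IOS prints for each interface and flags anything that is not the 240 minute default. The function name and the sample text are illustrative assumptions, and the output format may differ on other platforms or software versions.

```python
import re

DEFAULT_ARP_TIMEOUT_S = 240 * 60  # 240 minutes, the default cited above

# Interface header, e.g. "Vlan10 is up, line protocol is up"
IFACE_RE = re.compile(r"^(\S+) is .*line protocol is")
# Detail line, e.g. "  ARP type: ARPA, ARP Timeout 04:00:00"
ARP_RE = re.compile(r"ARP Timeout (\d+):(\d+):(\d+)")


def find_nondefault_arp_timeouts(show_interfaces_output):
    """Return {interface: timeout_seconds} for non-default ARP timeouts."""
    flagged = {}
    current = None
    for line in show_interfaces_output.splitlines():
        header = IFACE_RE.match(line)
        if header:
            current = header.group(1)
            continue
        arp = ARP_RE.search(line)
        if arp and current is not None:
            hours, minutes, seconds = (int(x) for x in arp.groups())
            timeout = hours * 3600 + minutes * 60 + seconds
            if timeout != DEFAULT_ARP_TIMEOUT_S:
                flagged[current] = timeout
    return flagged


# Hypothetical sample in the style of IOS "show interfaces" output.
SAMPLE = """\
Vlan10 is up, line protocol is up
  Encapsulation ARPA, loopback not set
  ARP type: ARPA, ARP Timeout 00:05:00
Vlan20 is up, line protocol is up
  Encapsulation ARPA, loopback not set
  ARP type: ARPA, ARP Timeout 04:00:00
"""

if __name__ == "__main__":
    for iface, secs in find_nondefault_arp_timeouts(SAMPLE).items():
        print(f"{iface}: ARP timeout {secs // 60} min (default is 240)")
```

Any interface this reports at 5 minutes would be a candidate for setting back to the default.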

HTH

Original MUG/NAMU Charter Member

gnrslash4life · Oct 15, 2013

RobertoGritar - I don't recall exactly which version the bug was in. However, it's fixed by turning off port security on the affected ports.
