OSCC Enterprise - Extensions going out of service

Status
Not open for further replies.

LoPath

Technical User
Aug 11, 2009
443
US
Currently running OSCC V8 R2.16.218 on HiPath 4000 V6. All agents are using OpenStage 40 HFA (all running V3 R0.40.0 HFA 171020). Once in a while an extension will go out of service and back in a couple of minutes later. This will happen several times over the course of a few hours. If someone is logged into the phone, it will kick them out. If they're on the phone, the connection will stay up, but OSCC "loses" them. The only fix actions I've seen that work are to either: 1) Rebuild the extension completely in the 4K or 2) Restart OSCC. If anyone knows of a real fix for this, that would be super!

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center, OpenScape Voice
 
Resolved Reported Problems/Symptoms in V3 R0.40.3
NA15766643 Phones go randomly out of service (after DHCP renew or restart) - solution - upgrade phones to V3 R0.40.3
 
Using static IP, but I'll keep my fingers crossed and give it a go! Thank you!

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center, OpenScape Voice
 
It would be worth checking the EVTLOG from the STMI. Download it from the STMI GUI. You might find the connection was dropped from the STMI end rather than the phone. Last released loadware for V6 was A2.004-013.
 
So I'm on the latest gateway loadware and latest phone firmware. Happened again this morning. Checking logs, didn't find anything that really struck me.

From OSCC:
Code:
Time        Error Type   Primary Error Code  Secondary Error Code  Module Name  Error Text
5/1/2018
6:37:45 AM  Warning      1056                6250                  ttesrvr.cpp  Extension 8044 (S) is out of service [1]
6:38:07 AM  Information  1072                6179                  ttesrvr.cpp  Extension 8044 (S) is back in service [1]


From the gateway card:
Code:
EventLogEntry from TC (tTC_TCP_CLI "05/01/2018 06:37:38.621801" TCConnection.cpp 266):
 EventType: Information
 EventCode: MSG_TC_INTERNAL_EVENT
 EventText: Closing TCP connection 17 in state testing to 172.17.9.144:1705 due Socket closed by client

From the phone's trace file:
Code:
***ERROR:*** Tue May 1 06:37:24 2018.868
PhysicalInterfaceService(973): ./src/PhysicalInterfaceService.cpp:1343 PhysicalInterfaceService::setAllLampStatuses - failed to set lamp status on logical ID 23

***ERROR:*** Tue May 1 06:37:24 2018.868
PhysicalInterfaceService(973): ./src/PhysicalInterfaceService.cpp:1343 PhysicalInterfaceService::setAllLampStatuses - failed to set lamp status on logical ID 24

***ERROR:*** Tue May 1 06:37:25 2018.455
CSTA_Service(785): ./src/CSTAServiceCommsEventsHFA.cpp:2251 Feature Status event from Comms Service, unknown featureName: SecureCall!


***ERROR:*** Tue May 1 06:37:25 2018.971
CSTA_Service(785): ./src/CSTAServiceCommsEventsHFA.cpp:2251 Feature Status event from Comms Service, unknown featureName: AlternateCall!


***ERROR:*** Tue May 1 06:37:36 2018.765
DLSFacade(604): ./src/dls_fsm_interface.cpp:58 DLS_RetryServerWdogCreate(): It was called previously, Wdog canceled.

***ERROR:*** Tue May 1 06:37:36 2018.773
DLSFacade(604): ./src/dls_fsm_interface.cpp:144 DLS_FirmwareExpiryWdogCreate(): It was called previously, Wdog canceled.

***ERROR:*** Tue May 1 06:37:48 2018.797
ServiceDataManager(1095): ./src/ServiceDataManager.cpp:590 initialiseCacheItems hostname_config_group failed validation ""

***ERROR:*** Tue May 1 06:37:49 2018.857
DnsMgr(1095): ./src/DnsMgr.cpp:504 setDnsServer: setting dns-server-addr "192.168.23.100" failed, new value invalid

***ERROR:*** Tue May 1 06:37:49 2018.859
DnsMgr(1095): ./src/DnsMgr.cpp:504 setDnsServer: setting dns-server-addr2 "192.168.15.100" failed, new value invalid

***ERROR:*** Tue May 1 06:37:49 2018.981
MediaControlService(556): ./src/MediaControlHelper.cpp:199 VideoService not set!

***ERROR:*** Tue May 1 06:37:59 2018.041
PhysicalInterfaceService(896): ./src/PhysicalInterfaceService.cpp:1343 PhysicalInterfaceService::setAllLampStatuses - failed to set lamp status on logical ID 23

***ERROR:*** Tue May 1 06:37:59 2018.041
PhysicalInterfaceService(896): ./src/PhysicalInterfaceService.cpp:1343 PhysicalInterfaceService::setAllLampStatuses - failed to set lamp status on logical ID 24

***ERROR:*** Tue May 1 06:38:01 2018.878
ServiceDataManager(9479): ./src/ServiceDataManager.cpp:1294 manageableDataChanged unknown item "slk-modules-all"

***ERROR:*** Tue May 1 06:38:03 2018.356
CSTA_Service(785): ./src/CSTAServiceCommsEventsHFA.cpp:2251 Feature Status event from Comms Service, unknown featureName: SecureCall!


***ERROR:*** Tue May 1 06:38:03 2018.558
ServiceDataManager(555): ./src/ServiceDataManager.cpp:1294 manageableDataChanged unknown item "slk-modules-all"

***ERROR:*** Tue May 1 06:38:04 2018.300
Admin Phonelet(865): ./src/AdminPhonelet.cpp:1682 Failed to subscribe

***ERROR:*** Tue May 1 06:38:04 2018.648
CSTA_Service(785): ./src/CSTAServiceCommsEventsHFA.cpp:2251 Feature Status event from Comms Service, unknown featureName: AlternateCall!


***ERROR:*** Tue May 1 06:39:06 2018.848
DLSFacade(886): ./src/dls_fsm.cpp:2718 DLS_fault_recovery() prev.State = 3 last Event = 5

***ERROR:*** Tue May 1 06:47:37 2018.986
DLSFacade(886): ./src/dls_fsm.cpp:1005 dropping unexpected event TIMER_EVT_DLS_CHECK in state IDLE
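
For anyone wanting to line the three logs up, here is a minimal Python sketch that puts the timestamps quoted above on one timeline. It assumes all three clocks are roughly in sync (the thread does not confirm that) and that the trailing ".868" in the phone trace is milliseconds. The function names are made up for illustration; they are not part of any Unify tool.

```python
from datetime import datetime

# Timestamps copied from the three log excerpts above.
OSCC_DATE = "5/1/2018"
OSCC_OUT = "6:37:45 AM"
OSCC_BACK = "6:38:07 AM"
GATEWAY_CLOSE = "05/01/2018 06:37:38.621801"
PHONE_FIRST_ERROR = "***ERROR:*** Tue May 1 06:37:24 2018.868"


def parse_oscc(date_text, time_text):
    """OSCC error log: separate date row plus 12-hour time column."""
    return datetime.strptime(f"{date_text} {time_text}", "%m/%d/%Y %I:%M:%S %p")


def parse_gateway(text):
    """Gateway event log: MM/DD/YYYY HH:MM:SS.microseconds."""
    return datetime.strptime(text, "%m/%d/%Y %H:%M:%S.%f")


def parse_phone(line):
    """Phone trace: '***ERROR:*** Tue May 1 06:37:24 2018.868'.

    Assumption: the digits after the final dot are milliseconds.
    """
    stamp = line.split("***ERROR:***", 1)[1].strip()
    base, millis = stamp.rsplit(".", 1)
    parsed = datetime.strptime(base, "%a %b %d %H:%M:%S %Y")
    return parsed.replace(microsecond=int(millis) * 1000)


def build_timeline():
    """Return (label, timestamp) pairs sorted by time."""
    events = [
        ("phone: first logged error", parse_phone(PHONE_FIRST_ERROR)),
        ("gateway: TCP connection closed", parse_gateway(GATEWAY_CLOSE)),
        ("OSCC: extension out of service", parse_oscc(OSCC_DATE, OSCC_OUT)),
        ("OSCC: extension back in service", parse_oscc(OSCC_DATE, OSCC_BACK)),
    ]
    return sorted(events, key=lambda event: event[1])


if __name__ == "__main__":
    timeline = build_timeline()
    start = timeline[0][1]
    for label, ts in timeline:
        offset = (ts - start).total_seconds()
        print(f"+{offset:7.3f}s  {label}")
```

If the clocks really are aligned, the ordering reads as phone errors first, then the gateway seeing the socket close, then OSCC marking the extension out of service for 22 seconds.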

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
Resolved Reported Problems/Symptoms in V3 R0.40.3

Did this get pulled? I'm only seeing V3 R0.40.0

Don Bruechert, Voice Comm Analyst II
CareTech Solutions @ Holy Family Memorial
Manitowoc, WI, USA
 
Not sure... I looked for it as well, but only found it on the Tonido site. :)

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
The release notes for .1, .2 and .3 are in the portal (that I have access to) but only .0 for the firmware. I'll download it off Tonido as well and look here before I install it anywhere. I have been having the same issue for years with phones randomly going "Telephony Down" (not actually rebooting) for about 50 seconds. When I look in HISTA it says "Red due to loss of layer 1 signal" and then eventually says it is back in service about 50 seconds later. If that happens to an agent phone it does bump them off OSCC.


Don Bruechert, Voice Comm Analyst II
CareTech Solutions @ Holy Family Memorial
Manitowoc, WI, USA
 
Yep, same here. It's frustrating. My only saving grace is that I plan to migrate these phones to OSV, hopefully by the end of this year. I did open up a ticket with our support folks, but I'm anticipating I'll be told "it's a network problem".

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
Does look more like a phone issue than STMI; the STMI just sees the connection close.

Difficult to progress on V6. The latest HFA loadware you are using will not be tested on V6 software. Always worth trying the latest but if it doesn't work, you could try going the other way and see if older firmware helps.
 
Like Don, I've seen this issue for years. The fix action has always been to rebuild the extension in the 4K. That's not an acceptable fix to me, since something is obviously broken. [bugeyed]

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
I feel your pain, but Unify would fix it (assuming this is a phone bug), given the opportunity to do so, if the phone was a supported phone running on supported software. As the 4K is on V6, it's been left a little bit late. I don't know anyone who would accept a delete and re-add as a solution; that's only ever a workaround, but the time to push it was when V6 was not M44. Maybe it is a network issue. Is the phone on the same network as the STMI or is it routed? Are you using QoS? etc. Could take monitor port traces from phone end and STMI end to rule out network. Lots to look at, but if you're moving to OSV probably not worth it. You could also try newer STMI loadware; the V7 loadware will load the card fine and if it's HG3530 only I would not expect a problem. If you have the same STMI driving IPDA also then you need to be careful with the versions and consider the NCUI loadware as well. If this problem happens regularly it would be worth a quick test on newer loadware which you know has been tested with your newer HFA loadware.
 
In my case I'm on V7 R2.23.7 I think - loadware 7. I'm home now so I can't look. In my case I have (am down to) 5 buildings. The mother ship is the hub and is at the hospital, 4 locally remote sites with AP shelves and APEs, one site has 2 shelves. All of the phones at each site are on the same VLAN as the hardware in their building, but each building's normal operandi is the link to the mother ship. Each site has its own external connections in case of link loss. I have the phones at V3 R0.23.3 right now. I recently bought Path Solutions TotalView and I am aware there are network issues, and TotalView tells me what I need to do to fix them, but the kiddies "know more" than I do and the boss is currently siding with them for the moment.

There was a problem in earlier versions of the software, such as the one DERT is using right now - I will use that to recover a phone but I can't leave it there because that is either the version where phones randomly brick, or it's the one where 245 days after the last time the phone boots it shuts off the LAN2 port and needs to be rebooted to last another 245 days. V3 R0.23.3 has been stable for me other than that the phones "wink" randomly, regardless of building, and I can't find a common denominator. When the phone is doing its thing the phone reports a red alarm from LOS, I believe it's level 1. If I remember tomorrow I'll post a log snip. The "normal" people don't know because they aren't usually looking at their phones, but the call center people certainly know!

Some of my "team" have been more petty lately and they are opening tickets for every phone that makes a computer lose its network connection now and I have no way to close them reliably. I was going to try pushing out this new software to a few of the problem ones just so I have an excuse to close the tickets - those phones might not see another incident for 6+ months so there is no way to evaluate the version. Tonido was offline so I wasn't able to get it.
I laid out over $30K to take the entire North American Service track for the 4K and the a$$holes still won't give me access to SWS as a self maintainer. I can usually get firmware with the portal access I have.

On the subject of the delete and rebuild I have a suggestion for you. I have problems with my OS40T phones, also randomly. I have the company logo on the left side of the display. An instant indication of the problem is that the logo will be missing off the screen. People can call the phone and you can answer it, but they cannot put it on hold, transfer or use any other function keys. I notified Black Box of this issue and also MBJ in TAC and also explained how I fix it (temporarily). They refused to look at the problem because I was not on V7R2 - so I went to V7R2 last year and I still have the problem. I was out of state for a couple weeks over Christmas and the main phone in ER went out - they called Black Box because we have a remote assist contract and they told them the phone had to be deleted and put back in and that would be left for me to do when I got back - no one bothered to call me. So here is my suggestion for how you might temporarily fix your problem without wasting all that time:

Bring up the phone in Assistant or however you want to do it. DO NOT change the board (unless the board is full). Change the PORT address of the PEN to a different PORT on the same board and save it. Give the system a minute to digest that and do its housekeeping, and then Re-Search for it to bring it up clean and change the PORT back to what it was and save it. In the case of an OS40T the phone will be right back to normal and it might be another 6 months before it flakes out again. I don't know if that will make a difference for an HFA phone, but the way I figure it there must be some kind of corruption on the addressing table, and when I change the port and then change it back it re-writes that data. Takes only a few minutes to try - especially if you have a side car and a whole bunch of programming to redo if you delete and recreate!


Don Bruechert, Voice Comm Analyst II
CareTech Solutions @ Holy Family Memorial
Manitowoc, WI, USA
 
I wanted to follow up with the good folks on this board. I talked with our network engineer and I think QoS may be to blame here. Not sure if blame is the right word, because our network is obviously congested, so maybe QoS will HELP us here. :)

Our phones were originally set up like this:
Layer 3 signaling: 26 – AF31
Layer 3 Voice: 46 – EF
Layer 2 signaling: 3
Layer 2 Voice: 5
Layer 2 Default: 0

Our Network Engineer suggested changing the layer 3 signaling from AF31 to CS3. He stumbled upon a blog post with this tidbit of info:
Cisco used to recommend setting voice signaling to AF 31 (DSCP 26). They thought that tagging voice control this way was appropriate, because it allowed for the same class 3 as the 802.1p value, while still setting a drop precedence (1), which is what the strength of DSCP is about. But they were criticized because people were saying that systems using IP precedence instead of DSCP could not really use the drop precedence (value 1) and just the IP precedence PHB part (3), so using AF31 was not the best choice ever.

After years of hearing that complaint, Cisco released new good practices and new tables showing that in fact voice signaling should slowly be changed to DSCP 24, which is CS3 (so there is no drop precedence and a complete match between the DSCP value, the IP precedence value and the 802.1p priority level). That was a couple of years ago. But of course in between the same complaining customers had already implemented the AF31 tag, and started complaining again, this time about migration issues, stating that if the systems were fully DSCP it didn't really matter, etc. So today, the new recommendation is DSCP 24 for voice control, DSCP 26 being still largely accepted for migrated implementations.

With the help of the DLS server, I was able to change all 157 phones overnight. Just changing the phones made a noticeable impact, but the problem didn't completely go away. Now I need to change the gateway cards to match. Changing the gateway cards requires restarting them, so yours truly will be doing that tomorrow at 4AM.
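
The DSCP arithmetic in the quoted blog post can be checked with a short Python sketch. It uses the standard DiffServ encodings (AFxy = 8x + 2y, CSn = 8n, EF = 46, IP precedence = top three bits of the DSCP). The helper names are hypothetical and have nothing to do with DLS or the phone firmware.

```python
EF_DSCP = 46  # Expedited Forwarding, used for the voice payload above


def af_to_dscp(af_class, drop_precedence):
    """AFxy -> DSCP: class in the top 3 bits, drop precedence in the next 2."""
    if not 1 <= af_class <= 4 or not 1 <= drop_precedence <= 3:
        raise ValueError("AF class must be 1-4 and drop precedence 1-3")
    return 8 * af_class + 2 * drop_precedence


def cs_to_dscp(selector):
    """CSn -> DSCP: class selector in the top 3 bits, no drop precedence."""
    if not 0 <= selector <= 7:
        raise ValueError("class selector must be 0-7")
    return 8 * selector


def ip_precedence(dscp):
    """Legacy IP precedence is the top 3 bits of the 6-bit DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp >> 3


if __name__ == "__main__":
    for name, dscp in [
        ("AF31", af_to_dscp(3, 1)),
        ("CS3", cs_to_dscp(3)),
        ("EF", EF_DSCP),
    ]:
        print(f"{name}: DSCP {dscp}, IP precedence {ip_precedence(dscp)}")
```

AF31 and CS3 both map to IP precedence 3, matching the Layer 2 signaling value of 3 in the settings above, and EF maps to 5, matching Layer 2 voice. The only difference between AF31 and CS3 is the drop-precedence bits.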

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
Huh..... That's interesting, and I have DLS so I can try that, and I can roll it out on a floor where no one is at night so I can flip the board without too many people knowing.

I do believe my VLANs in the switches are set up for those same protocols though. Do those have to be changed as well? If so I might only have a snowball's chance in He11 of getting that done.....

How do you describe a "noticeable impact"?? Is it quantifiable in the user experience?

Thanks for sharing this!

Don Bruechert, Voice Comm Analyst II
CareTech Solutions @ Holy Family Memorial
Manitowoc, WI, USA
 
Our network guys said the switches were already configured to allow that QoS. So if that's your case, you should be golden. I'm no Cisco expert, but each port looks to have "auto qos voip trust" in the config.

Noticeable - before we were having in the ballpark of 10 extensions dropping every day. After changing the phones, it dropped to 1 or 2. I haven't had one drop yet since I changed the gateway cards about 4 hours ago, so I'm keeping my fingers crossed.

Also for a point of reference, restarting the card only took the phones down for about 2 minutes.

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
If I may ask, how did you set DLS to set your Layer 3 voice signalling to CS3? I only have an option for CS0 and then AF11 thru AF43, which I think represents DSCP 0, and then 10 through 46 for the AFxx options.

Don Bruechert, Voice Comm Analyst II
CareTech Solutions @ Holy Family Memorial
Manitowoc, WI, USA
 
It's in the drop-down for me listed as 24-CS3. So 24 should do it. So we're square, my phone settings are now:
Layer 3 Signaling: 24-CS3
Layer 3 Voice: 46-EF
Layer 2 Signaling: 3
Layer 2 Voice: 5
Layer 2 Default: 0

I changed the gateway cards in expert.

LoPath
Maintain HiPath 4000 V5 & V6, OpenScape Xpert V4, OpenScape Xpressions, OpenScape Contact Center V8, OpenScape Voice V9
 
You didn't set the card to 24 as well did you? Don't forget the STMIs want an 8 bit number.
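
A hedged sketch of what the "8 bit number" likely refers to, assuming it means the full ToS/DS byte of the IP header, where the 6-bit DSCP sits in the upper six bits. Whether the gateway card configuration expects that value in decimal or hex is not stated in this thread, so both are printed. The function name is made up for illustration.

```python
def dscp_to_tos_byte(dscp):
    """Shift the 6-bit DSCP into the upper six bits of the 8-bit ToS/DS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be 0-63")
    return dscp << 2


if __name__ == "__main__":
    for name, dscp in [("CS3", 24), ("AF31", 26), ("EF", 46)]:
        tos = dscp_to_tos_byte(dscp)
        print(f"{name}: DSCP {dscp} -> ToS byte {tos} ({tos:#04x})")
```

Under that assumption, CS3 (DSCP 24) becomes 96 (0x60), AF31 (DSCP 26) becomes 104 (0x68), and EF (DSCP 46) becomes 184 (0xb8), so entering a bare 24 on the card would not mean CS3.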
 
I don't have 24 as an option in my dropdown list. What version of DLS do you have?

Don Bruechert, Voice Comm Analyst II
CareTech Solutions @ Holy Family Memorial
Manitowoc, WI, USA
 