I'm having an issue with a Mitel on-prem system that just reared its head, and not even my Mitel partner seems to be able to track it down. All services in Director are green, and oddly I have no errors in the phone logs. Here's the background and symptoms:
MS Server 2019 Active Directory domain environment. All roles are clean. The domain is completely error free. Domain is contoso.local.
All servers (phone and domain) are virtualized in VMware 6.7. All virtualized servers are on VM version 15.
Phone HQ server is Windows Server 2016.
We've had the Mitel system for about 2 years. Until recently everything was local access only. There was a purchased certificate installed, named comm.contoso.com, securing communication between the HQ server, Connect, and the phones/CAS server. I have many users who assign phones on the fly because they move around a lot. Along comes COVID-19. I already have the purchased licenses for mobility, so we implement the edge, mobility, RAST, and TURN servers and add a new virtual trunk device. Also, because I'm implementing so many servers, I buy a wildcard certificate: *.contoso.com
Installation goes off without a hitch, I can provision phones over the internet, clients are all secured with the wildcard, and everything is working exactly as anticipated. Fast forward a week from implementation: Friday, everyone assigns their phones without a single issue. Saturday, I wake up and there's a help desk ticket. Users can't assign their phones at any station. I figured something hiccuped. I log in and restart the phone system and all its related switches. Everything comes back online, but the issue persists. Phones get "User Assignment - CAS connection failure".
Here's what we've tried so far:
[ol 1]
[li]Mitel partner says the FQDN field on the HQ platform equipment page in Director HAS to be the machine name. Prior to this, it was comm.contoso.com. The DNS CNAME for comm.contoso.com points to phone.contoso.local (the AD FQDN of the server). We changed comm.contoso.com to phone.contoso.com, reinstalled the wildcard cert, and added another CNAME to DNS to point phone.contoso.com to phone.contoso.local. This made no difference to either the broken or the working functionality of the system. Users still get the "User Assignment - CAS connection failure" message.[/li]
[li]Reset the keystore:
Stopped all ShoreTel services.
Backed up the /Shoreline Data/Keystore folder.
Deleted the /Shoreline Data/Keystore folder.
Restarted all ShoreTel services (this recreates the Keystore folder).
Restarted all switches.[/li]
[li]Ran MUTE CLEAR# on the phones as part of testing.[/li]
[li]I changed the FQDN field back to comm.contoso.com, which breaks the certificate, and lo and behold, my users can assign phones again. Connect works, but insecurely.[/li]
[/ol]
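Since the server's real name is phone.contoso.local and the wildcard is for *.contoso.com, one thing worth ruling out is a plain name mismatch. In ordinary TLS a client validates the name it asked for, not the target of a CNAME, and a wildcard covers exactly one leftmost label. Below is a minimal sketch of that matching rule. The function name is hypothetical, and whether the phones apply this check during User Assignment is an assumption on my part, not something I have confirmed in Mitel documentation.

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name (optionally starting with '*.') covers hostname.

    Follows the common rule that '*' stands in for exactly one leftmost label.
    """
    pattern_labels = pattern.lower().rstrip(".").split(".")
    host_labels = hostname.lower().rstrip(".").split(".")

    # A wildcard never spans multiple labels, so label counts must agree.
    if len(pattern_labels) != len(host_labels):
        return False
    # Reject empty labels such as ".contoso.com".
    if "" in host_labels:
        return False
    if pattern_labels[0] == "*":
        return pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels


# Names taken from the post; the wildcard is the purchased *.contoso.com cert.
for name in ("comm.contoso.com", "phone.contoso.com", "phone.contoso.local"):
    print(name, wildcard_matches("*.contoso.com", name))
```

If anything in the assignment path still addresses the server as phone.contoso.local (or by IP address), the wildcard would not cover it.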
This doesn't seem like a lot, but it's been a number of iterations of this testing. No matter what I try, when the certificate is in place, User Assignment fails. At this point, I'm looking for ideas as to what might cause ONLY User Assignment to fail. It sure seems like a DNS or certificate issue, but I don't understand what would make it happen suddenly like this. It looks as though something was cached and then expired a week after installation of all the mobility interfaces. Any suggestions to try are greatly appreciated at this point. Is there a hidden self-signed certificate for User Assignment being used in the background somewhere that I'm unaware of that might be expired?
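For checking expiry dates on whatever certificate a given listener presents, here is a minimal sketch using only the Python standard library. The helper names are hypothetical, and port 443 is an assumption; I don't know which port CAS uses for User Assignment. Note that `fetch_not_after` keeps verification on, so an untrusted or mismatched certificate raises `ssl.SSLCertVerificationError`, which is itself a useful clue.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter time (negative if already expired).

    not_after uses the format returned by getpeercert(), e.g. 'Jan  5 09:34:43 2018 GMT'.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    current = time.time() if now is None else now
    return (expires - current) / 86400.0


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect to host:port and return the presented certificate's notAfter string.

    Port 443 is an assumed default, not a known CAS port.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example (needs network access to the server, so it is left commented out):
# print(days_until_expiry(fetch_not_after("comm.contoso.com")))
```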