
System not running system SD primary software after non-mergeable SIP Trunk changes

Status
Not open for further replies.

Notabot

IS-IT--Management
Sep 13, 2024
4
Folks, really need your help here.

I’m facing a major and unbelievable problem where non-mergeable SIP Trunk changes corrupt the SD card and cause the system to throw “System not running system SD primary software”. I’ve done a lot of testing hoping there was a chance of resolving it on my own, but also wanted to gather as much info as possible before asking for help. This error has been reproduced on 3x different systems (one of them live for 5 years with no issues), see ‘story’ and table below.

The basic gist is it’s happening on V2 and V2A cabinets on different R11 releases (11.0.4, 11.1.2 & 11.1.3). Recreating the SD card clears the error, until another non-mergeable change is made. Different computers (Windows 10 and 11) with different SD card readers were tried with the various Manager versions used in recreating the card. One system’s SD card had been running for 5 years without ever being removed from the cabinet.

| Control Unit Name | Control Unit Version | Software Ver Originally Installed | Software Ver Changed | SD Card | Notes |
|---|---|---|---|---|---|
| CU-A | IP 500 V2A PCS-02 | 11.1.3.2.0_Build_6 | 11.1.3.1.0 build 34 | Micron 8GB October 2016 - PCS18 | New system, locally configured |
| CU-B | IP 500 V2 | 11.0.4.7.0 Build 6 | 11.1.3.1.0 build 34 | Micron 8GB July 2018 | Live system running since 2019, remote city |
| CU-C | IP 500 V2A PCS-04 | 11.1.3.0.0 Build.23 | 11.1.2.4.0 Build 18 | SanDisk 8GB July 2022 - PCS22 | New system, locally configured |

FULL STORY

The first system (call it CU-A), which I’m trying to ready for a new branch, is an IP500 V2A PCS02 at R11.1.3.2 with the blasted high pitch whine issue. It was running fine for a few weeks until I started to make changes to the SIP trunks, which weren’t registering. After a non-mergeable change, the unit rebooted for 15 minutes and returned the dreaded ‘System not running system SD primary’.

Thinking it was the control unit or SD card, I tried registering the SIP trunks on a remote live system that was running off of the same ISP that was providing the SIP trunks, since their ITSP domain IP wasn’t resolving unless it was on their network. This remote system (call it CU-B) is an IP500 V2 and has been operating live for 5 years on R11.0.4.7. The SIP trunks registered, and inbound calls (to an AA) and outbound calls (forwarding a call from analog trunk to SIP trunk to cell) tested successfully. However, I couldn’t get external call forwarding to work purely off of the SIP trunks, so I made additional changes to troubleshoot, which eventually caused the same SD card error. I couldn’t believe it, especially on a live system, and my @$$ was on the line. Fortunately, the system was still running (my guess is it was running off the NVRAM config) and no one at the branch was the wiser the next day. I managed to ship an SD card reader same day and got someone to pop out the card, install R11.1.3.1.0 on a local machine at the branch and recreate the SD card. Knock on wood, they’ve been running for over a week now (yes, that’s how long I’ve been banging my head with this).

CU-B changes for the remote site were done thru another PC (the one used to manage the IPO for all our branches). I had updated the Manager application to 11.1.3.3 (July 2024 release), which was the same release I was using to manage CU-A. So my first thought was there was a major bug with the latest release. The other consideration was that maybe the SD cards were old (2016 and 2018) and coincidentally both failed.

So I got a third V2A control unit (call it CU-C) at PCS04 (no whine) with a brand new SD card, PCS 22 (watermarked July 2022). After making non-mergeable SIP trunk configuration changes that forced a reboot, the same thing happened: ‘System not running system SD primary software’. NOTE, this unit was not connected to any network except the management PC at 192.168.43.1 via the WAN port, just in case there was some malicious traffic coming from the provider’s SBC IP. Also, I take care to change all default passwords on user extension accounts (RemoteManager), disable Administrator, EnchTcpaService (until needed), etc. and restrict IP Routes to only the provider’s SBC and DNS.

This is getting really long, so gonna go point form…

Computers with clean OS re-install, AV and firewall disabled during SD card recreate and manager changes, smartscreen off (Avaya don’t like signing their software), no bitlocker…

DTE port readouts show the below just before it goes into long 15 minute reboot cycle and then shows the SD card error:
Creating SIP Trunk SIP Line (17) with proxy: ***BLANKED OUT***
SNMP::SNMP Server is disabled
.WATCHDOG TIMEOUT wdogmax 249990000l StartUp stack f07f35e8 f07f3700 f07f48a0 f02ab7b4 f02ac8bc f005c3bc f009d1d0 f001482c f0097d70 f0280cdc f0280c90 IP 500 V2A 11.1.2.4.0 build 18

Other notable readouts are:
System Primary is INVALID

However the below shows after first reboot of the system, but functioning normally
SMXFS: Check bad cluster replacement 0x3B280
SMXFS: Already marked as a bad cluster 0x3B280

ASBC. Second Watermark is Invalid
Valid Second Watermark

CHKDSK shows no bad sectors just after format/recreate from manager but then will show “32 KB in bad sectors” after inserting in control unit and changing from BASIC to STANDARD mode.


I’ve got more and will post as requested.

I searched for this issue as best as I could and would have thought I’d come across this given this has affected 3x control units, one running fine for 5 years...

No change thru manager should ever corrupt the SD card. This is insane.
 
Fairly certain I've isolated the problem, and it looks like it is the RTP port range change to 1024-65534 that busts it up.

From a default configuration with only the license file loaded, no SIP trunk lines added yet, just changing the RTP port range has caused the SD card error.

To ensure it's not caused by other non-mergeable reboot changes, I changed the DHCP mode from client to disabled to client to disabled (no RTP change) to force a reboot 4 times (broadcast IP in System Tab fields still set). No issues.

Can someone verify for me and test opening the RTP port range to 1024-65534 on an 11.1 system (better be a test system, otherwise you're in a world of hurt)?

Can't believe this hasn't been reported yet and occurs on all 11.0 and 11.1 systems..



 
I have no system available to test. But I have never changed the RTP range to such a wide one. Be aware that you need two ports for each parallel call. I guess you don't want to handle 30000 calls at once.
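To put a number on that, here is a rough Python sketch of the arithmetic, assuming (as stated above) two ports per parallel call and an inclusive port range. The function name is made up for illustration and is not anything from Manager.

```python
def max_concurrent_calls(port_min: int, port_max: int, ports_per_call: int = 2) -> int:
    """Return how many parallel calls an inclusive port range can carry."""
    if port_min > port_max:
        raise ValueError("port_min must not exceed port_max")
    total_ports = port_max - port_min + 1  # both ends of the range are usable
    return total_ports // ports_per_call


# The range from the original post
print(max_concurrent_calls(1024, 65534))   # 32255
# A narrow illustrative range of 101 ports
print(max_concurrent_calls(60000, 60100))  # 50
```

So the wide range allows for roughly 32000 simultaneous calls, far beyond what a single control unit would ever carry.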
 
As @derfloh points out, you're probably the first person to set the max RTP so high. Dangerously close to 65536, so it's probably some binary memory overflow glitch that you've found.

The Manager/Web Manager docs do say 65530 is the maximum supported, so the error here is that Manager/Web Manager should not allow you to set a higher value.

Yes, Avaya should have caught that, but meanwhile the workaround is to stop doing it (stop hitting yourself in the face).
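Until Manager enforces that limit itself, a range can be sanity-checked by hand before saving. A minimal Python sketch follows. The 65530 ceiling is the figure quoted from the docs above; the 1024 floor is only an assumption taken from the range used in this thread, and the function is hypothetical, not part of any Avaya tool.

```python
DOCUMENTED_MAX_PORT = 65530  # maximum quoted from the Manager docs in this thread
ASSUMED_MIN_PORT = 1024      # assumption: lowest value used in this thread


def check_rtp_range(port_min: int, port_max: int) -> list[str]:
    """Return a list of problems found with the proposed RTP port range."""
    problems = []
    if port_min >= port_max:
        problems.append("minimum must be lower than maximum")
    if port_min < ASSUMED_MIN_PORT:
        problems.append(f"minimum {port_min} is below {ASSUMED_MIN_PORT}")
    if port_max > DOCUMENTED_MAX_PORT:
        problems.append(
            f"maximum {port_max} exceeds documented limit {DOCUMENTED_MAX_PORT}"
        )
    return problems


print(check_rtp_range(1024, 65534))  # flags the maximum
print(check_rtp_range(1024, 65530))  # [] (within these two bounds)
```

An empty list only means the range passes these two bounds; it says nothing about whether the system behaves well with it.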
 
Have you tried with the default port range? I don't think your provider uses such a big range.
 
@Madil
You only define what port range you use in your IPO. The provider tells you what ports they want you to send your RTP packets to. It can be that they use such a big port range. But you don't really have to know it (except if you want to nail it down in the firewall).
 
Thank you for your replies.

The RTP change as the cause was not apparent based on the “System not running system SD primary software” error.

The first assumptions were bad SD card, control unit or the Windows machine used to re-create the SD card.

I'm aware the RTP range is ridiculously large, but this was at the direction of the SIP provider (a major Canadian carrier), per their setup documentation. They keep on referring to their setup doc, instead of providing meaningful assistance when running into any issues.

Yes, the obvious workaround is "to stop doing it".

I will advise the SIP provider that the community feedback confirms such a large range is inappropriate.

As requested, can someone please verify this bug for me by opening the LAN1 RTP port range to 1024-65534?
 
Again… they use that wide RTP range. What you can configure is the range that IPO uses. Their information is just for you to ensure that you have your firewall towards the needed provider ports open.
 
"As requested, can someone please verify this bug for me by opening the LAN1 RTP port range to 1024-65534?" - NO. You've clearly proven it's a problem for the IP Office (though when you go beyond the stated maximum setting, it's not entirely fair to call it a problem when the system doesn't react well). But there's no need for the rest of us to join in and bang our own heads against the wall. Tell the SIP provider that you've set the IP Office to its maximum range (1024 to 65530).
 
@sizbut "Tell the SIP provider that you've set the IP Office to its maximum range (1024 to 65530)"
Have you tested this range to ensure it does not corrupt the SD card similarly?

Further, your suggestion violates manager's condition "RTP Port range should not overlap 50751-50850" , which clearly you didn't know and nor did I or the doc...

You also misunderstood @derfloh's explanation that the wide RTP range is for the ingress by the provider thru the firewall, not the IP Office, which should be narrow. The Manager help file states "that only port numbers between 49152 and 65535 are used, that being the range defined by the Internet Assigned Numbers Authority (IANA) for dynamic usage", contradicting its own checks and default lower range... Obviously I'm not an expert and still learning, but to me, there's a lot of inconsistency.
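To make those two conditions concrete, here is a small Python sketch that tests a candidate range against the 50751-50850 range named in the Manager warning and against the IANA dynamic range quoted from the help file. The helper names are hypothetical, and this only mirrors the checks as described in this thread, not how Manager implements them.

```python
RESERVED_RANGE = (50751, 50850)  # range named in the Manager warning above
IANA_DYNAMIC = (49152, 65535)    # dynamic range quoted from the help file


def ranges_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two inclusive port ranges share at least one port."""
    return a[0] <= b[1] and b[0] <= a[1]


def is_inside(inner: tuple[int, int], outer: tuple[int, int]) -> bool:
    """True if the inner range lies entirely within the outer range."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


wide = (1024, 65530)     # the "maximum range" suggested above
narrow = (60000, 60100)  # an illustrative narrow range

print(ranges_overlap(wide, RESERVED_RANGE))    # True
print(ranges_overlap(narrow, RESERVED_RANGE))  # False
print(is_inside(wide, IANA_DYNAMIC))           # False
print(is_inside(narrow, IANA_DYNAMIC))         # True
```

The wide range trips both checks, while a narrow range high in the dynamic range trips neither.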


Hopefully my finding helps anyone who runs across the "System not running system SD primary software" error and lets them know that it can be caused by a configuration issue.
 
"Have you tested this range to ensure it does not corrupt the SD card similarly?" - Repeat. NO. My systems are running perfectly fine on a sensible port range for the expected call capacity.

Yes, having such a wide range will cause overlaps with other potential uses by the IP Office.

And don't get me started on the difference between "should not" (a recommendation) and "must not" (an absolute) - read some SIP RFCs and you'll find they are well defined terms.


Name the "major Canadian carrier" - there will more than likely be someone else here who uses them and can say what port range they used, or even Avaya application notes for that carrier. I don't see any reason to protect their identity.
 
"The wide RTP range is for the ingress by the provider through the firewall"

No, the ports the provider uses are for the EGRESS through the firewall. Inbound (ingress) ports are the ones the IP Office uses (40675-50754 UDP). Simple mistake, but it can have a significant impact if the provider ports don't overlap the IP Office ports.
 
It's not a problem to have ports 60000-60100 even if the provider uses 50000-59999.
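A toy Python model of that point, for anyone who finds the direction confusing. It assumes a firewall that filters on destination port only, which is a simplification, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRange:
    low: int
    high: int

    def __contains__(self, port: int) -> bool:
        return self.low <= port <= self.high


def packet_allowed(direction: str, dst_port: int,
                   ipo_range: PortRange, provider_range: PortRange) -> bool:
    """Simplified firewall: match only on the destination port."""
    if direction == "inbound":    # provider -> IP Office
        return dst_port in ipo_range
    if direction == "outbound":   # IP Office -> provider
        return dst_port in provider_range
    raise ValueError(f"unknown direction: {direction}")


ipo = PortRange(60000, 60100)       # range configured on the IP Office
provider = PortRange(50000, 59999)  # range the provider says it uses

print(packet_allowed("inbound", 60010, ipo, provider))   # True
print(packet_allowed("outbound", 51234, ipo, provider))  # True
print(packet_allowed("inbound", 51234, ipo, provider))   # False
```

The two ranges never need to overlap: each side only has to accept traffic arriving on its own ports.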
 