E1 PRI's dropping (reloading) unexplained

Status
Not open for further replies.

MitelInMyBlood

Technical User
Apr 14, 2005
1,990
US
SX2K running 34.2.7.4
E1 (Euro) PRI (MC269AA) running QSIG periodically resetting (card reloading).
Have already tried:
- 3 new MC269AA cards behave similarly
- Physically moved the MC269AA to a different DSU cabinet (eliminating fiber, FIMs, backplane & power issues)
- Problem occurs on both planes every few hours, typically 8~10 hours apart, sometimes more frequently
- Another T1 (PSTN) PRI in the next slot over (different links) is unaffected.

Upstream side of these E1 PRI circuits is an Etrali Turret System (trade floor). I've been fighting this for 2 days and am growing weary, as is the client. Here's a dump from the SOFTWARE log:
Code:
2011-NOV-13 07:22:32 SOFTWARE    B/Active             -Warning-  Detm7     #1151
MSGmgr : Message link from plane 1 to plane 0 for controller 32
is scanning due to cause value 3
 
 
2011-NOV-13 05:25:53 SOFTWARE    B/Active             -Warning-  Detm7     #1150
DAM audit recovered trk swid 397
     at cab = 8 shelf = 1 slot = 4 circ = 60
     in state 5 owned by non_existent
     process hex 51DC application 1

Maintenance logs show the card was taken out of service.
It does recover on its own, but I'm beginning to wonder if the problem could be upstream on the Etrali system.

Ideas welcome. Please read what I've tried before making suggestions. Thanks

Original MUG/NAMU Charter Member
 
Here is the corresponding MAINTENANCE log:
Code:
2011-NOV-13 07:22:32 MAINTENANCE B/Active             -Info-     Detm7     #0858
The link from MC215AD Main Controller IIIE at location 01 1 07 Active
         to   MC269Ax Universal E1 at location 08 1 04
is now in SCANNING mode.
 
2011-NOV-13 07:22:32 MAINTENANCE B/Active             -Info-     Detm7     #0857
MC269Ax Universal E1 at location 08 1 04 removed

PCM Totals show no PCM links having problems
MESS SUB shows nothing you wouldn't expect, i.e., the link is in SCAN mode while the card is reloading

DBMS STATE indicates no database errors.
System is switching activity nightly, with no abnormalities observed.


Original MUG/NAMU Charter Member
 
I've seen something similar on 3300s with a PRI link carrying a huge number of errors. Do you have a way of doing a bit error test on the PRI link?

What are the DTSTAT readings for the PLID over a 24-hour period?
 
Have you tried changing the hybrid connector on the DSU? I have had these fail in the past.
I take it you have changed the cables?
Also, you said you have changed the cards. Does the firmware on the card match the software on the 2K? You may have to either upgrade or downgrade to match.
Lastly, what is your clock source? Is it a reliable source, e.g. BT DASS etc.?

Share what you know - Learn what you don't
 
For Bobcheese:
(this span actually dropped at 9:02 this morning w/no indication of D-channel issues.)

Code:
Universal E1          8 1  4  1
Link is Available
         duty                             bit
         cycle       framing              error
Time     (%)         losses     slips     rate
------------------------------------------------
 9:19    100            0           0         0
 9:00    100            0           0         0
 8:00    100            0           0         0
 7:00    100            0           0         0
 6:00    100            0           0         0
 5:00    100            0           0         0
 4:00    100            0           0         0
 3:00    100            0           0         0
 2:00    100            0           0         0
 1:00    100            0           0         0
 0:00    100            0           0         0
23:00    100            0           0         0
22:00    100            0           0         0
21:00    100            0           0         0
20:00    100            0           0         0
19:00    100            0           0         0
18:00    100            0           0         0
17:00    100            0           0         0
16:00    100            0           0         0
15:00    100            0           0         0
14:00    100            0           0         0
13:00     97            4           6         0
12:00    100            0           0         0
11:00    100            0           0         0
10:00    100            0           0         0
------------------------------------------------
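
For anyone who wants to scan DTSTAT output like the table above without eyeballing every row, here is a minimal Python sketch. It assumes the five-column layout shown (time, duty cycle, framing losses, slips, bit error rate) with integer counters; `flag_bad_hours` is a hypothetical helper name, not part of any Mitel tool.

```python
def flag_bad_hours(dtstat_text):
    """Return the rows (time, duty %, framing losses, slips, BER) that are not clean."""
    bad = []
    for line in dtstat_text.splitlines():
        parts = line.split()
        # Data rows look like "13:00  97  4  6  0"; skip titles, headers and rules.
        if len(parts) != 5 or ":" not in parts[0]:
            continue
        try:
            duty, losses, slips, ber = (int(p) for p in parts[1:])
        except ValueError:
            # Assumption: counters are plain integers; ignore anything else.
            continue
        if duty < 100 or losses or slips or ber:
            bad.append((parts[0], duty, losses, slips, ber))
    return bad


SAMPLE = """\
Time     (%)         losses     slips     rate
------------------------------------------------
14:00    100            0           0         0
13:00     97            4           6         0
12:00    100            0           0         0
------------------------------------------------
"""

for row in flag_bad_hours(SAMPLE):
    print(row)
```

Run against the full table above, only the 13:00 row comes back (97% duty, 4 framing losses, 6 slips), and that is roughly 20 hours before the 9:02 drop.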

For Supernova99:

Look at my previous post: this was a total/complete hardware move from one DSU cabinet to another, thus eliminating the fibers, the FIMs, the DSU power supply and all hardware associated with the DSU node. Sorry if I didn't mention that the DSU interface was brand new as well. The MC269AA card is also new, and the configuration was copied to it with IMAT. We've also tried other MC269AA cards from inventory.

The card is not failing immediately. It runs fine for anywhere from 3 to 24 hours, then suddenly we see the following in the Maint log:
Code:
2011-NOV-14 09:04:52 MAINTENANCE A/Active             *Major*    Detm7     #0042
The link from MC215AD Main Controller IIIE at location 01 1 07 Inactive
         to   MC269AA Universal E1 at location 08 1 04 Active
is now OPEN.
 
2011-NOV-14 09:04:52 MAINTENANCE A/Active             *Major*    Detm7     #0041
The link from MC215AD Main Controller IIIE at location 01 1 02 Active
         to   MC269AA Universal E1 at location 08 1 04 Active
is now OPEN.
 
2011-NOV-14 09:04:52 MAINTENANCE A/Active             *Major*    Detm7     #0040
MC269AA Universal E1 at location 08 1 04 installed
 
2011-NOV-14 09:03:49 MAINTENANCE A/Active             *Major*    Detm7     #0039
---------------+--------+-------+-------+-------+---------+---------+----------
               |Total In|      Unavailable      | Alarm Threshold Percentages
   Category    | System | Total |    %  | Alarm |  MINOR  |  MAJOR  | CRITICAL
---------------+--------+-------+-------+-------+---------+---------+----------
Trunks         |   219  |    60 |  27 % | Major |    -    |    10 % |   100 % |
DSU  msg link  |    12  |     4 |  33 % | Major |    -    |    30 % |    -    |
---------------+--------+-------+-------+-------+---------+---------+----------
 
 
2011-NOV-14 09:03:49 MAINTENANCE A/Active             *Major*    Detm7     #0038
                                ------------
Current System Alarm :          |  MAJOR   |       viewed from Active
                                ------------
 
 
 
2011-NOV-14 09:03:42 MAINTENANCE A/Active             -Info-     Detm7     #0037
The link from MC215AD Main Controller IIIE at location 01 1 07 Inactive
         to   MC269Ax Universal E1 at location 08 1 04
is now in SCANNING mode.
 
2011-NOV-14 09:03:42 MAINTENANCE A/Active             -Info-     Detm7     #0036
The link from MC215AD Main Controller IIIE at location 01 1 02 Active
         to   MC269Ax Universal E1 at location 08 1 04
is now in SCANNING mode.
 
2011-NOV-14 09:03:42 MAINTENANCE A/Active             -Info-     Detm7     #0035
MC269Ax Universal E1 at location 08 1 04 removed

I believe that the upstream system (ETRALI Turret System) may be the culprit, but over the weekend we swapped the interface cards in that POS as well. Still occurring unpredictably.

Adding further fuel to the fire, there is a redundant path to the Turret system from another Mitel SX2000 (separate PBX) and that path is not failing, thus pointing back at the original PBX.
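
To put a number on how long each reload lasts, the "removed" and "installed" entries in a maintenance log dump like the one above can be paired up. The following is only a sketch: the regular expressions are guesses based on the entries shown in this thread, and `card_events` and `reload_gaps` are hypothetical helper names.

```python
import re
from datetime import datetime

MONTHS = {name: num for num, name in enumerate(
    "JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC".split(), start=1)}

# Entry header, e.g. "2011-NOV-14 09:03:42 MAINTENANCE A/Active ..."
HEADER = re.compile(r"^(\d{4})-([A-Z]{3})-(\d{2}) (\d{2}):(\d{2}):(\d{2})\s")
# Card event line, e.g. "MC269Ax Universal E1 at location 08 1 04 removed"
EVENT = re.compile(r"at location (\d+ \d+ \d+) (removed|installed)\s*$")


def card_events(log_text):
    """Yield (timestamp, location, event) for each removed/installed entry."""
    stamp = None
    for line in log_text.splitlines():
        head = HEADER.match(line)
        if head:
            year, mon, day, hh, mm, ss = head.groups()
            stamp = datetime(int(year), MONTHS[mon], int(day),
                             int(hh), int(mm), int(ss))
            continue
        event = EVENT.search(line)
        if event and stamp is not None:
            yield stamp, event.group(1), event.group(2)


def reload_gaps(log_text):
    """Pair each 'removed' with the next 'installed' at the same location."""
    pending = {}
    gaps = []
    # The log is printed newest first, so sort into time order before pairing.
    for stamp, location, event in sorted(card_events(log_text)):
        if event == "removed":
            pending[location] = stamp
        elif location in pending:
            start = pending.pop(location)
            gaps.append((location, start, (stamp - start).total_seconds()))
    return gaps


SAMPLE = """\
2011-NOV-14 09:04:52 MAINTENANCE A/Active             *Major*    Detm7     #0040
MC269AA Universal E1 at location 08 1 04 installed

2011-NOV-14 09:03:42 MAINTENANCE A/Active             -Info-     Detm7     #0035
MC269Ax Universal E1 at location 08 1 04 removed
"""

for location, start, seconds in reload_gaps(SAMPLE):
    print(location, start, seconds)
```

For the 14 November entries this gives a removed-to-installed gap of 70 seconds at location 08 1 04. That only measures the log entries; it says nothing about when traffic was actually passing again.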






Original MUG/NAMU Charter Member
 
It just dropped traffic again (a few minutes ago). FYI only, here is what Enterprise Manager is sending us:

Code:
TIME:Mon Nov 14 09:53:56 CST 2011

Alarm ID :	 277978
Network Element name:	 Detm7 IP Address:10.248.118.34 
Severity:		 Major 
Creation Time:		 Mon Nov 14 09:53:56 CST 2011 
Location:		  
Message: 
Alarm:  MAJOR (Level 3)
	 - Cat: DSU  msg link
	 - Avail resources: 12
	 - Unavail resources: 4
	 - Minor Thresh: 101
	 - Major Thresh: 30
	 - Critical Thresh: 101
TDM PBX: 2011 11 14, 09:53:55

The card DOES recover on its own after several minutes.
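
As a sanity check on why this shows up as a MAJOR alarm, the threshold arithmetic can be sketched in a few lines of Python. Two assumptions here are mine, not from Mitel documentation: that a level is raised once the unavailable percentage reaches its threshold, and that a threshold of 101 means "not set" (it lines up with the "-" entries in the earlier maintenance log table). `alarm_level` is a hypothetical helper name.

```python
def alarm_level(total, unavailable, minor=None, major=None, critical=None):
    """Return (level, percent unavailable) for one alarm category.

    Assumption: a level is raised once the unavailable percentage reaches
    its threshold, and a threshold above 100 (or None) means "not set".
    """
    percent = 100.0 * unavailable / total
    level = "None"
    for name, threshold in (("Minor", minor), ("Major", major),
                            ("Critical", critical)):
        if threshold is not None and threshold <= 100 and percent >= threshold:
            level = name
    return level, percent


# DSU msg link figures from the Enterprise Manager alarm above.
print(alarm_level(12, 4, minor=101, major=30, critical=101))
# Trunk figures from the earlier maintenance log table.
print(alarm_level(219, 60, major=10, critical=100))
```

With 4 of 12 DSU message links unavailable the figure is about 33%, just over the 30% major threshold, so a single E1 card reloading is enough to trip the alarm. The 60 unavailable trunks (about 27% of 219) clear the 10% major threshold as well.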


Original MUG/NAMU Charter Member
 

Never mind.
The Etrali Turret system has been shut down and replaced with Cisco.

This in fact was the long term plan but was accelerated due to the recent occurrence of the E1 spans dropping unexpectedly several times a day.

Original MUG/NAMU Charter Member
 
Wow that's harsh.

Sorry for your pain, both before and after.

**********************************************
What's most important is that you realise ... There is no spoon.
 
We broke our balls working to find a root cause and solution, and the end-user bore with us longer than you should reasonably expect them to. The only thing that didn't get swapped was the Main Control, and we left it alone because the issue was seen on both planes. We knew the iron was getting tired, and a replacement plan for the Turret System had been on the drawing board for the past year. The drives on the 2K are fast approaching 110,000 hours. This system presently has 4 separate main controls in the cluster with expanded per nodes (plural rackfuls of per nodes and DSUs) and is 100% -48 DC powered. Going to make someone a fine ship's anchor soon. :)

Original MUG/NAMU Charter Member
 