
500v2 8.0.44 reboot issue 4

Status
Not open for further replies.

Signo

IS-IT--Management
Oct 5, 2006
141
US
Hello everyone,

I have a system that has been in for about 3 months and has had constant issues. The customer calls and says that all the phones are stuck, either on a blank white screen or on a message like "please wait" or "one moment please"; I can't remember exactly which. It looks like a power issue to me. The system is on a battery backup, but that does not seem to matter. It's like it rebooted and got stuck. I can pull the battery backup off the outlet and the system stays up and runs fine.

The commercial power does go out regularly, and this only happens occasionally, but the customer is naturally upset that I have been out multiple times on a new system.

I think it is power related, but the system can't recover correctly from losing power. If you guys have some ideas I would love to hear them.

Thanks


 
So my customer called me this morning with all the phones stuck on please wait again. We had a little bit of rain and a crack of thunder or two, nothing I normally would worry about.

All the 9508s were stuck on please wait and the system never seemed to lose power. I am going to go read the monitor trace I had running; hopefully it caught something.

I am going to put on a 1416 and see whether it dies when the 9508s get stuck, or what its behavior is when this happens.

I have never seen anything in 15 years of this kind of work that is this sensitive.

Anyone think it's just the 9508s? I know there have been a bunch of issues with them; anyone think this is just another one? All the lockups that I see posted are random phones, not all at one time, so it's a different scenario.

Any ideas on this one?
 
I have to go and grab the logs later today. I let it run a little longer today as I expect another storm. Monitor did not show anything really. One of the employees was on the phone when it quit. She told me she heard a crack of thunder, then her call was gone and the phones were stuck on please wait. I can see the call end, and none of the phones show disconnecting, but they are all stuck on please wait. It's like the phones reset but the system never detected it, so it would not let them reconnect. I put a 1416 out there today to see how it acts. I will grab the trace on the way home today and post it.

Today we ran a complete new ground back to the main panel because we could not see the ground the whole way through the attics. Now we know it is bonded directly to the power ground.

We have lost so much on this job it's become an office joke that I am going to be getting a desk there.
 
I would bet that if the employee heard a crack of thunder before the "please wait", lightning hit the wiring going between the buildings and zapped the system. It may not have lost power, but that could cause the phones to reset.
 
Even within a few hundred yards and longish cable runs it can induce hundreds of volts into the cable :)

 
How many phones are in the remote building? I know it's not convenient, but is it possible to unplug those phones for a few days and see if the issue happens with that wiring taken out of the mix? Easier said than done of course, but it could narrow down your culprit pretty quickly.
 
That's what we think. You can't stop lightning, no matter how hard you try. But it's constant, and the Nortel it replaced never had this much trouble, though those were hardy suckers. Same wiring, just a direct swap, and no other systems in the area suffer this problem.

I know each case is different, but every storm makes me think something specific is drawing it towards the system. Going to check wiring all over tomorrow. Maybe something is lying on something electrical.

It's got me pulling my hair out. Luckily the customer is being understanding.
 
I may isolate that building over the weekend if the weather is going to be bad. We were thinking the same thing; process of elimination is all we can do.
 
Here is the log for the system when it quit. I removed some customer info and cut the two log files into one. There is a call active around 5 minutes before 8am, and the log shows the call as active, but the customer said the phones all went to please wait while on this call. In that time frame I don't see anything in the log that shows the phones quitting. The only thing I see is the system restarting after we instructed the customer to pull the power and plug it back in. It's a lot of info and I have never really read these.

If anyone knows more or sees anything please chime in.
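
For anyone else facing a long Monitor text log, here is a minimal Python sketch for pulling out just the lines around a restart. The marker strings are copied from the log excerpt posted further down in this thread; the names `MARKERS`, `find_events` and `SAMPLE` are made up for illustration, and other releases may word these lines differently.

```python
# Marker strings as they appear in the Monitor excerpt in this thread.
# Assumption: other IP Office releases may word these lines differently.
MARKERS = (
    "POWER FAILED",
    "contact lost",
    "contact made",
    "has been up and running",
)


def find_events(lines):
    """Return (line_number, marker, text) for every line containing a marker."""
    events = []
    for number, line in enumerate(lines, start=1):
        for marker in MARKERS:
            if marker in line:
                events.append((number, marker, line.strip()))
                break  # one marker per line is enough
    return events


# A few lines lifted from the log in this thread, used as sample input.
SAMPLE = """\
 488255499mS PRN: POWER FAILED!! TASKname Daemon f0005fd4
********** contact lost with 10.0.0.204 at 08:25:08 14/8/2012 - reselect = 3 **********
********** contact made with 10.0.0.204 at 08:26:20 14/8/2012 **********
********** System (10.0.0.204) has been up and running for 23secs(23736mS) **********
"""

if __name__ == "__main__":
    for number, marker, text in find_events(SAMPLE.splitlines()):
        print(f"{number:>4}  {marker:<24} {text}")
```

To run it against a real capture, read the file with `open(path).read().splitlines()` and pass the result to `find_events`. It only narrows down where to look; it does not say why the system restarted.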
 
 http://www.mediafire.com/view/?3b00be3169ihbbq
I'm gonna go with electrical ghost in the machine as the reason on this one. There should be all kinds of protection set up on the cabling for the remote building extensions, digital and/or analog, on both sides of the termination, with both primary and secondary protection on the wiring, and all of it grounded properly. All that protection would have needed to be replaced if the building took a direct shot of lightning that fried everything. I'm assuming this has all been done? You could look at putting a Zero Surge on the IP Office processor (expensive, but it's good for stormy areas with sensitive equipment). Then again, it could be ANOTHER newly discovered 9508 'feature' from Avaya. What release and firmware are the IP500 and the 9508s using? Good luck! :)
 
It is clear to me:

Code:
********** SysMonitor v10.0 (44) [connected to 10.0.0.204 (Sample)] **********
 487676749mS PRN: Monitor Status  IP 500 V2 8.0(44)
 487676749mS PRN: LAW=U PRI=0, BRI=0, ALOG=8, ADSL=0 VCOMP=20, MDM=0, WAN=0, MODU=0 LANM=0 CkSRC=0 VMAIL=1(VER=2 TYP=3) CALLS=0(TOT=957)
 487794641mS RES: Tue 14/8/2012 08:17:15 FreeMem=61424224(2) CMMsg=6 (8) Buff=5200 953 1000 7463 5 Links=4297
 487794642mS RES2: IP 500 V2 8.0(44) Tasks=43 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=45 Poll=0 Ready=0 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=14 TAPI=0 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
 487800641mS RES: Tue 14/8/2012 08:17:21 FreeMem=61425088(2) CMMsg=6 (8) Buff=5200 954 1000 7463 5 Links=4299
 487800641mS RES2: IP 500 V2 8.0(44) Tasks=43 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=46 Poll=0 Ready=0 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=14 TAPI=0 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
 488097141mS RES: Tue 14/8/2012 08:22:17 FreeMem=61420772(2) CMMsg=6 (8) Buff=5200 953 1000 7463 5 Links=4272
 488097141mS RES2: IP 500 V2 8.0(44) Tasks=43 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=45 Poll=0 Ready=0 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=14 TAPI=0 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
 488103141mS RES: Tue 14/8/2012 08:22:24 FreeMem=61425088(2) CMMsg=6 (8) Buff=5200 954 1000 7463 5 Links=4299
 488103141mS RES2: IP 500 V2 8.0(44) Tasks=43 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=46 Poll=0 Ready=0 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=14 TAPI=0 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
 [b]488255499mS PRN: WARNING:
 488255499mS PRN: POWER FAILED!! TASKname Daemon f0005fd4 fff03124 f00cd230 00000000 f02d6ce4 f015685c f0168950 f02e5e9c f02e5e48[/b]
 
 488255499mS PRN: 
 488255499mS PRN: WARNING:
 488255499mS PRN:  
 488255499mS PRN: 
 488255499mS PRN: WARNING:
 488255499mS PRN: interrupt disabled, 488255499
 488255499mS PRN: 

********** contact lost with 10.0.0.204 at 08:25:08 14/8/2012 - reselect = 3 **********
******************************************************************

********** SysMonitor v10.0 (44) **********

********** contact made with 10.0.0.204 at 08:26:20 14/8/2012 **********

********** System (10.0.0.204) has been up and running for 23secs(23736mS) **********

********** Warning: TEXT File Logging selected **********


********** Warning: TEXT Logging to File STARTED on 14/8/2012 08:26:20 **********
     23736mS PRN: Monitor Started IP=10.0.0.89 IP 500 V2 8.0(44) Sample
                  (IP Office: Supports Unicode, System Locale is enu)
     23737mS PRN: LAW=U PRI=0, BRI=0, ALOG=8, ADSL=0 VCOMP=20, MDM=0, WAN=0, MODU=0 LANM=0 CkSRC=0 VMAIL=0(VER=2 TYP=3) CALLS=0(TOT=3)
     24158mS RES: Tue 14/8/2012 08:26:20 FreeMem=64730372(2) CMMsg=3 (5) Buff=5200 966 1000 7463 5 Links=8816
     24159mS RES2: IP 500 V2 8.0(44) Tasks=37 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=48 Poll=0 Ready=1 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=14 TAPI=0 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
     24680mS PRN: Config Write Wake Up
     24681mS FILESYS: FileSysTask: Started
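
To check when the POWER FAILED line actually happened in wall-clock terms, the mS counter can be converted using the nearest RES line, since RES lines carry both the counter and a date and time. This is only a rough sketch: the names `wall_clock`, `ANCHOR` and `EVENT` are made up for illustration, and it assumes the counter ticks in real milliseconds and that the RES timestamp is trustworthy to the second.

```python
import re
from datetime import datetime, timedelta

# A RES line carries both the mS counter and a wall-clock stamp (day/month/year).
RES_LINE = re.compile(r"(\d+)mS RES: \w+ (\d+/\d+/\d+ \d+:\d+:\d+)")
# Any Monitor line starts with the mS counter.
COUNTER = re.compile(r"(\d+)mS")


def wall_clock(anchor_line, event_line):
    """Estimate the wall-clock time of event_line from a nearby RES line."""
    anchor = RES_LINE.search(anchor_line)
    if anchor is None:
        raise ValueError("anchor line is not a RES line")
    event = COUNTER.search(event_line)
    if event is None:
        raise ValueError("event line has no mS counter")
    anchor_ms = int(anchor.group(1))
    anchor_time = datetime.strptime(anchor.group(2), "%d/%m/%Y %H:%M:%S")
    # Offset the anchor time by the difference between the two counters.
    return anchor_time + timedelta(milliseconds=int(event.group(1)) - anchor_ms)


# Both lines are taken from the log above (trailing fields trimmed).
ANCHOR = " 488103141mS RES: Tue 14/8/2012 08:22:24 FreeMem=61425088(2)"
EVENT = " 488255499mS PRN: POWER FAILED!! TASKname Daemon"

if __name__ == "__main__":
    print(wall_clock(ANCHOR, EVENT))
```

With the two lines above, the power fail works out to about 08:24:56, roughly 12 seconds before Monitor reports contact lost at 08:25:08. Whether the two clocks agree that closely is an assumption.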


BAZINGA!

I'm not insane, my mother had me tested!

 
That is the power fail from the customer pulling the power to reset it. That part is very clear to me as well, unless you are thinking something else.

There is protection on both ends also. The grounding of the chassis was not proper, but it has been fixed since. We had a decent storm roll through for about an hour tonight and I got no emails from the system, but last time I didn't get any notification either, I just had stuck phones. It's 2:30 am here now, but I am going straight there in the morning to check it. I have a 1416 on it to see how it reacts if the system freezes up again.
 
Had a little joy from the site last night: it stormed a little and the system made it through. Only time will tell now.
 
We purchased an IP Office Basic system in June of 2012 and finally resolved reboot issues in October. The answer for us was to replace the chassis (shelf) and, most importantly, the bad SD card. We fought all the same issues you encountered and thought it was power related. We believe there may be bad SD cards out there, because the config looked good all the way up to 3rd level Avaya support.
 
The 8.0(46) notes said it corrected an 8.0(44) system reboot issue... just sayin'
 
I had a spontaneous reboot on 8.0.44 lately while I had Monitor open.
I upgraded it to 8.0.46 just to be sure :)


BAZINGA!

I'm not insane, my mother had me tested!

 
I have not heard from this customer since about mid September. I have email notifications turned on and have only seen one reboot, in the early AM, when I think their electrician was making some electrical changes.

We don't even mention the name of this customer around our office. You know how it is: you say "man, I haven't heard from that guy in a while" and they call the next day lol.

Most of the issue for me went away when it was discovered that the system was not properly bonded to the electrical ground. It's still not perfect as far as the bonding goes, but it is much better than it was.

If I have to go back for any issues I might upgrade. Then again, it will probably just introduce some new bug, so I might not.

I came over from Nortel stuff and I wish Avaya would have just left those phones behind and stayed focused on their own products.

On a side note I would like to thank everyone on this forum for all the great suggestions and advice you dispatch, it is very much appreciated. No matter how long I work in tech I have always found my most valuable tool to be the ability to talk and rationalize things out with other techs. Places like this bring all that knowledge together and make it possible.

Thank You
 