Random restarts

Status
Not open for further replies.

Alfalis

Technical User
Oct 15, 2012
250
DE
Hi everyone,

One of my customers is complaining about random restarts every 2-3 hours or so.

System Status shows a "<WATCHDOG> addr=00000000 d=0 pc=f032b060 error". Does anyone have an idea what that means?

Here is a tiny bit of the monitor log that might help:


101332mS CD: CALL: 0.1031.0 BState=Ringing Cut=3 Music=0.0 Aend="Ippy C(325)" (9.18) Bend="Meyers Roland(151)" [Line 17] (267.4) CalledNum=151 (Meyers Roland) CallingNum=325 (Ippy C) Internal=0 Time=56 AState=Ringing
101333mS CMMap: PCG::MapBChan pcp[178]b1r0 cp_b f54a5b8c other_cp_b 0 type CGTypeSimple
101333mS CMMap: a=9.18 b=0.0 PCGS CPReserveCodec (pcp[198]b0r1) true
101334mS CMMap: PCG::MapBChan pcp[198]b0r1 cp_b f567d97c other_cp_b f54a5b8c type CGTypeSimple
101334mS CMMap: a=9.18 b=0.1 M1
101334mS CMMap: PlatformConnectionAudioSAP::ConnectVoice pcp[198]b0r1 Configure 0.1
101334mS CMMap: PlatformConnectionAudioSAP::ConnectVoice pcp[198]b0r1 ConnectIndication 0.1
102395mS H323Evt: RTP(50t): 192.168.128.10/49154 192.168.109.10/49176 CODEC=Alaw64K(4) PKTSZ=160 RFC2833=off AGE=1062 SENT=50 (avg size=160) RECV=53 (avg size=160)
104842mS RES: Thu 17/1/2013 10:24:42 FreeMem=53127288(1) CMMsg=6 (6) Buff=5200 948 1000 7351 5 Links=17098
104842mS RES2: IP 500 V2 8.0(42) Tasks=54 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=61 Poll=0 Ready=0 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=19 TAPI=0 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
106070mS PRN: Read hold music HoldMusic.wav from memory card
106195mS PRN: WalkToWAV - Success
106229mS PRN: Begin Stack Trace, Task=Daemon taskaddr=f5b02150
106229mS PRN: findfunc f02e4fe8 f02e91dc f02e95f4 f0ffd60c f0ffcae0 f0ff9bfc f0ff3dc4 f00bee50
106229mS PRN: findfunc f02d6d10 f015685c f0168950 f02e5e9c f02e5e48
106229mS PRN: End Stack Trace
106229mS PRN: f02e9a54 00000000 00000000 00000000 00000000 00000000 00000000 00000000
106229mS PRN: IP 500 V2 8.0(42)
106229mS PRN: OSBuf::Alloc size=16400 TRUNCATED

********** contact lost with 192.168.128.10 at 10:24:59 17/1/2013 - reselect = 2 **********
******************************************************************

********** SysMonitor v10.0 (42) **********

********** contact made with 192.168.128.10 at 10:25:51 17/1/2013 **********


Any help is really appreciated; if you need more information, just ask.

Many thanks in advance for your help!

Greetings from Germany :)
 
What kind of IPO is it?
I would upgrade it to 8.0.51 and see what happens.
If it still reboots that often, I would go through the programming and look for conflicting settings.


BAZINGA!

I'm not insane, my mother had me tested!

 
@CarGoSki: No, I didn't. I copied all files from the SD card to my laptop before formatting it (in case anything went wrong; you never know) and put the exact same file (holdmusic.wav) back onto the SD card afterwards, so no files were really changed.



The restarts started a few days ago, and at the time System Status showed a problem with the SD card. After formatting and recreating the SD card, everything seemed to work fine again (no more errors in System Status), but 5 hours later the IPO started restarting every few hours again.


A former colleague of mine (he left the company a few months ago) told me he had the same problem with that IPO before, but at the time shutting the IPO down and restarting it fixed the problem.

Greetings from Germany :)
 
What brand is this card, and what is its PCS level?
Perhaps it is time for a dongle swap.


BAZINGA!

I'm not insane, my mother had me tested!

 
tlpeter offers good advice.
SD card issues are a pain and can be tricky to solve.
In the past I have had to recreate the card, blank out the NVRAM config, and then copy config.cfg back to the SD card to resolve them.


Assuming that reading holdmusic.wav is the last thing in the traces before it reboots, I would change MOH (music on hold) to external and reboot first.
I always look at the last events in the traces before a failure as a guide to its cause.
In your case holdmusic.wav is read, then poof.
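Looking at the last events before each failure can be automated when the SysMonitor output is saved to a text file. The sketch below is only an illustration: the function name `events_before_crash` and the default window of five events are arbitrary choices, not part of any Avaya tool, and it assumes every crash in the file starts with a "Begin Stack Trace" line as in the excerpts above.

```python
import re
from typing import List

# Line that opens a crash stack trace, e.g.
# "106229mS PRN: Begin Stack Trace, Task=Daemon taskaddr=f5b02150"
TRACE_START = re.compile(r"^\d+mS PRN: Begin Stack Trace")
# Any timestamped SysMonitor line ("<ticks>mS <FILTER>: ...")
EVENT = re.compile(r"^\d+mS ")


def events_before_crash(lines: List[str], window: int = 5) -> List[List[str]]:
    """For every stack trace, return the last `window` timestamped events
    that precede it. Continuation lines without a timestamp are skipped."""
    results: List[List[str]] = []
    recent: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if TRACE_START.match(line):
            results.append(recent[-window:])
            recent = []  # start collecting afresh for the next crash
        elif EVENT.match(line):
            recent.append(line)
    return results


if __name__ == "__main__":
    excerpt = [
        "104842mS RES: Thu 17/1/2013 10:24:42 FreeMem=53127288(1)",
        "106070mS PRN: Read hold music HoldMusic.wav from memory card",
        "106195mS PRN: WalkToWAV - Success",
        "106229mS PRN: Begin Stack Trace, Task=Daemon taskaddr=f5b02150",
    ]
    for events in events_before_crash(excerpt, window=2):
        print("\n".join(events))
```

Run against a full day of monitor output, this shows whether the hold music read really precedes every crash or only this one.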

 
CarGoSki, that is what I thought too. There are some scrappy first-batch SD cards that go corrupt, but they do not cause these watchdog errors.


BAZINGA!

I'm not insane, my mother had me tested!

 
Here is another log snippet from just minutes before the first one I posted; maybe it helps. Usually there are about 2-3 hours between restarts.


8696235mS CMCallEvt: 6.8.1 400 Q931 Trunk:6 CHAN=1: StateChange: END=A CMCSConnected->CMCSCompleted
8696237mS CMLineTx: v=17
CMFacility
Line: type=IPLine 17 Call: lid=0 id=2245 in=0
IE CMIEFastStartInfoData (6) 2 item(s)
8696238mS CMLOGGING: CALL:2013/01/1710:11,00:10:51,008,0015256740331,I,194,665098194,Hofmannmobil,,,0,,""n/a,0
8696239mS CD: CALL: 6.8.1 BState=Connected Cut=2 Music=0.0 Aend="Line 6" (3.5) Bend="Nass Hauke(194)" [Line 17] (267.2) CalledNum=194 (Nass Hauke) CallingNum=0015256740331 (Hofmann mobil) Internal=0 Time=660820 AState=Idle
8696239mS CD: CALL: 6.8.1 Deleted
8696239mS CMCallEvt: 6.8.1 -1 Q931 Trunk:6 CHAN=1: StateChange: END=X CMCSCompleted->CMCSDelete
8696240mS CMLineTx: v=17
CMReleaseComp
Line: type=IPLine 17 Call: lid=0 id=2245 in=0
Cause=16, Normal call clearing
8696240mS CMCallEvt: 0.2245.0 -1 H323TrunkEP: StateChange: END=X CMCSConnected->CMCSDelete
8696241mS CMCallEvt: 0.2245.0 -1 BaseEP: DELETE CMEndpoint f48f862c TOTAL NOW=9 CALL_LIST=4
8696241mS CMCallEvt: END CALL:400 (f48fadc4)
8696242mS CMCallEvt: 6.8.1 -1 BaseEP: DELETE CMEndpoint f4a7e500 TOTAL NOW=8 CALL_LIST=4
8696243mS CMMap: PCG::UnmapBChan pcp[99]b1r0 cp_b f5623418 other_cp_b f567b028
8696243mS CMMap: a=0.17 b=3.5 M0
8696244mS CMMap: PCG::UnmapBChan pcp[738]b0r1 cp_b 0 other_cp_b 0
8696244mS H323Evt: RTP(END): 192.168.128.10/49152 192.168.109.10/49156 CODEC=Alaw64K(4) PKTSZ=160 RFC2833=off AGE=660767 SENT=33031 RECV=33026 RTdelay=0 jitter=0 loss=0 remotejitter=0 remoteloss=0
8696245mS CMMap: a=0.17 b=0.0 Mapper::FreeCodec freed CMRTVocoder resource busy 3, total 32
8696248mS H323Evt: v=0 stacknum=17 State, new=ReleaseReq, old=Active id=2245
8696248mS H323Evt: v=0 stacknum=17 State, new=NullState, old=ReleaseReq id=2245
8703293mS RES: Thu 17/1/2013 10:22:14 FreeMem=52302052(1) CMMsg=11 (11) Buff=5200 946 1000 7351 2 Links=15743
8703293mS RES2: IP 500 V2 8.0(42) Tasks=53 RTEngine=0 CMRTEngine=0 ExRTEngine=0 Timer=62 Poll=0 Ready=0 CMReady=0 CMQueue=0 VPNNQueue=0 Monitor=1 SSA=0 TCP=22 TAPI=1 ASC=1 SYS=MNTD OPT=UMNT SDSPD=2034
8707108mS H323Evt: Shared tcp socket for line 18 disconnected
8708175mS PRN: Begin Stack Trace, Task=Daemon taskaddr=f5b02150
8708175mS PRN: findfunc f02e4fe8 f02e91dc f02e95f4 f0ffd60c f0ffcae0 f0ff9bfc f0ff3dc4 f00bee50
8708175mS PRN: findfunc f02d6d10 f015685c f0168950 f02e5e9c f02e5e48
8708175mS PRN: End Stack Trace
8708175mS PRN: f02e9a54 00000000 00000000 00000000 00000000 00000000 00000000 00000000
8708175mS PRN: IP 500 V2 8.0(42)
8708175mS PRN: OSBuf::Alloc size=16401 TRUNCATED

********** contact lost with 192.168.128.10 at 10:22:33 17/1/2013 - reselect = 1 **********
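The two excerpts can also be compared mechanically. Both stack traces list exactly the same findfunc addresses, so the Daemon task appears to fail in the same code path each time, although that alone does not identify the cause. The sketch below also assumes that the mS prefix counts from system start; that assumption is consistent with the RES lines, since 104842 mS before 10:24:42 puts the restart at about 10:22:57, shortly after contact was lost at 10:22:33. The helper names are hypothetical.

```python
import re
from typing import List

# "8708175mS PRN: findfunc f02e4fe8 f02e91dc ..." -> the hex addresses
FINDFUNC = re.compile(r"PRN: findfunc ((?:[0-9a-f]{8}\s*)+)$")
# Leading tick count of a SysMonitor line
TICKS = re.compile(r"^(\d+)mS ")


def trace_addresses(lines: List[str]) -> List[str]:
    """Collect the addresses from the findfunc lines of one stack trace."""
    addrs: List[str] = []
    for line in lines:
        m = FINDFUNC.search(line.strip())
        if m:
            addrs.extend(m.group(1).split())
    return addrs


def uptime(line: str) -> str:
    """Convert the mS prefix to H:MM:SS, assuming the counter starts at boot."""
    m = TICKS.match(line)
    if not m:
        raise ValueError("no mS timestamp: " + line)
    seconds = int(m.group(1)) // 1000
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


if __name__ == "__main__":
    first = [
        "106229mS PRN: findfunc f02e4fe8 f02e91dc f02e95f4 f0ffd60c f0ffcae0 f0ff9bfc f0ff3dc4 f00bee50",
        "106229mS PRN: findfunc f02d6d10 f015685c f0168950 f02e5e9c f02e5e48",
    ]
    second = [
        "8708175mS PRN: findfunc f02e4fe8 f02e91dc f02e95f4 f0ffd60c f0ffcae0 f0ff9bfc f0ff3dc4 f00bee50",
        "8708175mS PRN: findfunc f02d6d10 f015685c f0168950 f02e5e9c f02e5e48",
    ]
    print(trace_addresses(first) == trace_addresses(second))  # True
    print(uptime(second[0]))  # 2:25:08
    print(uptime(first[0]))   # 0:01:46
```

Under that assumption, the earlier crash came after about 2 h 25 min of uptime, which matches the reported interval, while the later one came less than two minutes after the restart, right after the hold music was read.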

Greetings from Germany :)
 
There is an H.323 line, and I think it is causing the reboots.
Are you absolutely 100% sure that there are no duplicate names and numbers on this SCN?

BAZINGA!

I'm not insane, my mother had me tested!

 
tlpeter wrote: "There is an H.323 line, and I think it is causing the reboots.
Are you absolutely 100% sure that there are no duplicate names and numbers on this SCN?"


No, I'm not. The system worked fine until last week, when part of my customer's company moved to another location (it is a 3-way SCN, and some users moved from site A to site B).
We have had this problem ever since (the local administrator handled the relocation and the IPO configuration), but I haven't yet had time to check the configuration changes the customer's admin made.

He may very well have made a mistake and left some duplicates somewhere (when deleting users at one location and adding them at the other).


What possible duplicates could be causing this reboot issue? Duplicate users? Or H.323 lines? Any suggestion is welcome :)

Greetings from Germany :)
 
Open System Status and go to the Directory section.
It will show any duplicates there.
The trace shows the shared TCP socket for H.323 line 18 disconnecting, so I think the site with that trunk is causing the issue.


BAZINGA!

I'm not insane, my mother had me tested!

 
@tlpeter: So are you talking about duplicate users or some sort of IP address conflicts?

Additional information in case it helps: each of the 3 sites/IPOs has two H.323 lines connecting it to the other two, so we have a kind of "triangle" connection.

Greetings from Germany :)
 
I see calls on IP line 17 and a disconnection on line 18.

You say you are configured for mesh networking in the IPO, but is the site IP trunking also configured for mesh?
If not, you are better off configuring a star topology and removing the mesh programming.
Also be sure to follow the SCN rules carefully: no duplicate names or extension numbers across all three sites.
Also be sure that each IP line has a different number across all three sites.
This means that IP line 17 can only exist at one location.
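The rules above (no name or extension number used at more than one site, and each IP line number used at only one site) can be cross-checked once each system's users, groups, extensions, and IP lines are listed per site. A minimal sketch: the `Site` structure, the site labels, and the example entries are all hypothetical, and how the lists are exported from Manager is left open. Manager's own SCN validation remains the primary check.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Site:
    """Hypothetical per-site export of the items that must be unique in the SCN."""
    name: str
    names: Set[str] = field(default_factory=set)     # user and group names
    numbers: Set[str] = field(default_factory=set)   # extension and group numbers
    ip_lines: Set[int] = field(default_factory=set)  # H.323 IP line numbers


def scn_conflicts(sites: List[Site]) -> Dict[str, Dict[str, List[str]]]:
    """Return every name, number, or IP line number found at more than one site,
    mapped to the sites where it appears (in input order)."""
    owners: Dict[str, Dict[str, List[str]]] = {
        "names": defaultdict(list),
        "numbers": defaultdict(list),
        "ip_lines": defaultdict(list),
    }
    for site in sites:
        for kind, found in owners.items():
            for item in getattr(site, kind):
                found[str(item)].append(site.name)
    # Keep only the entries that occur at two or more sites.
    return {
        kind: {item: where for item, where in found.items() if len(where) > 1}
        for kind, found in owners.items()
    }


if __name__ == "__main__":
    sites = [
        Site("A", {"Alice"}, {"151"}, {17, 18}),
        Site("B", {"Alice", "Bob"}, {"151", "194"}, {19, 20}),
        Site("C", {"Carol"}, {"325"}, {17, 21}),
    ]
    for kind, found in scn_conflicts(sites).items():
        for item, where in found.items():
            print(f"{kind}: {item} appears at sites {', '.join(where)}")
```

A user deleted at one site but still present at another would show up here as a name or number listed under two sites.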

 
It can be anything.
It is up to you to identify the problem :)
A duplicate IP address could be the problem, but I doubt that.
Never assume, though; make sure it is OK.
My guess is that a user/group name or number is causing this problem.


BAZINGA!

I'm not insane, my mother had me tested!

 
@CarGoSki: Thanks for your answer :)
But I don't think that's the problem: my ex-colleague (the one I mentioned before, who has several more years of experience than me ^^) installed the three IPOs, and they all worked fine for about a year. The problem only came up last week, when the local admin moved users from one IPO to another, and I'm fairly sure he was not careless enough to touch any settings other than the users and extensions.

Greetings from Germany :)
 
Use SCN discovery and make sure validation is turned on in Manager. Connect to all 3 systems. It may take a while to load, but the error pane will show user and group conflicts.
 
@1043: Thanks, I will definitely try that! :)

Greetings from Germany :)
 