Passport 8300 Random Fail-Over problem

Googer (Technical User, US)
Apr 30, 2004

We have two brand-new Passport 8300 switches fully loaded with 8324GTX (10/100/1000) cards, running version 2.0.0.1 of the code. Our servers connect to these switches at a mix of 100 Mbps full duplex and 1000 Mbps full duplex. Both switches are uplinked to a Split MLT core of two Passport 8600 switches. The problem is that the Passport 8300s occasionally fail over from one switch fabric to the other for no apparent reason. This happens no matter which switch fabric is primary. I have tried replacing the switch fabric modules on one of these switches, and the problem still occurs. I have also replaced multiple Ethernet cards on this switch with no result. The only thing I see in the logs is that the switch has put the active switch fabric card into warm standby and that the standby card has come online as the primary. Of course, this knocks all my servers offline for 60-120 seconds and sets off every monitoring alarm we have. I have been working with Nortel support, but so far they have been no help whatsoever. Has anyone else seen this problem? Does anyone know if there is a fix for this?

P.S. We cannot upgrade to the newly released 2.1 version of the code because it is not covered by the warranty.

Any help would be appreciated.

Googer
 
There is a bug that causes silent resets; one of them should have been fixed in the latest release.

But there is also a way to fix this reset by changing some core values in the kernel.

I have the details if you need them.


pb
 
We are stable at this point. We did get the 2.1 version of the software, but Nortel also applied a memory fix that involved setting a variable in privileged mode. This seems to have fixed the problem.

Thanks,

Googer
 
PederChr,

Is that for the 8600 or the 8300? We have a Nortel engineer on site, and last night he had a CPU failover happen while he was sitting there. If you have any additional info on these core settings, please let us know. When these CPUs fail over for no reason, the issue takes down our whole data network and our Nortel CSE1000 VoIP phone system. It forces the phones to re-register, which means any call in progress gets cut off. People just don't expect their phones to "reboot". Our users are pissed off...
 

This was for the 8300. Are you having problems with the 8600?


The latest PP8300 bug was in a service that is supposed to check memory utilization, so we made that routine inactive. (If anyone needs a case reference, I have it.)



By the way, what code are you running?
 
We currently have two 8600s, and two 8300s are coming in to take the directly attached load off the 8600s. Our 8600s have had four random CPU failovers, including the one last night with a Nortel engineer sitting right in front of it. We have to get our core network stable, and Nortel agreed to bring in the 8300s to offload everything from the 8600s, so that the CPUs can flap and SMLT can do its job until they get some kind of handle on the CPU issue. PederChr, do you have 8600s tied together with IST trunks? If so, do you have issues with them?
 
Is the CPU an 8690 with upgraded memory?
I know there has been a CSB on the memory upgrade.

We don't have any problems with the 8600 today, but we run 3.7.1, and not with LACP.

If you have problems with the 8300, the case number is 040907-80733.
 
PederChr,

When you say you have no problems with the 8600, do you have more than one, and are they tied together via IST? My hunch is that the CPU swaps are IST-related, and our onsite Nortel guy thinks so too.
 