Alarming Number Of Passport 8100 Crashes

Status
Not open for further replies.

dane775 (Technical User, CA)
Oct 28, 2004
We have a customer who has been experiencing a large number of Passport 8100 crashes and partial crashes over the last 6-8 months. They're running 3.2.2.2 code, and most of the switches have 64M of RAM. We're in the process of upgrading to 3.5.5.0 with 256M of RAM but have only done a couple so far.

The boxes all have two 8190 CPUs, an 8148TX card in slot 1, and an 8108GB in slot 7. Some have the remaining slots filled with 8148TXs; others have no more than four cards (slots 1, 5, 6, and 7).

The symptoms at these sites have roughly followed one of two patterns: CPU lockups, where the whole switch goes down, or disappearing cards, which usually doesn't take down the whole switch.

We've had several cases with anywhere from 4 to 10 cards in a chassis where the CLI or JDM shows only the CPUs present. In some cases we're accessing the switch through cards that aren't showing up. That problem is usually fixed by reseating or replacing the card in slot 1. (Slot 1 or 2 is the timing source for the chassis, and I'm beginning to think there may be issues with 8148s providing timing.)

The CPU lockups are usually fixed by switching over to the standby CPU. Sometimes we have to reboot or power-cycle the switch. A day, two days, or a week later, the other CPU will lock up. We've had situations where we've swapped both CPUs and still get lockups.

We've looked for viruses and run "very verbose" CPU traces and Sniffer traces, but we haven't spotted any suspicious traffic patterns. The forwarding databases aren't huge, and CPU utilization is staying low. We've been round and round with Nortel, but nothing has been isolated. In one case they had us swap a chassis, and a week later it crashed again.

Is anyone else experiencing similar problems with the hardware and code we're running? Or with problems on PP8100s that are difficult to isolate in general? Any ideas?
 
What about your power input? Check the power receptacles for tight connections, and VERIFY that your neutral and ground are good all the way to the box. What about the PSUs in the 8100: what is their load versus capacity?
Remember, these are switching power supplies. Poor load planning and capacity have bitten many a sysadmin.
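To make the load-versus-capacity question concrete, here is a minimal Python sketch of a chassis power-budget check. The function name, the module labels, and every wattage are hypothetical placeholders, not Nortel specifications; real figures would come from the hardware documentation. It also assumes N+1 redundancy, meaning the chassis should still carry its full load with one supply failed.

```python
def psu_load_check(module_watts, psu_watts, psu_count, redundant=True):
    """Return (total_load, usable_capacity, ok) for a chassis power budget.

    module_watts: dict of module label -> draw in watts (hypothetical values).
    psu_watts:    rated output of one supply (hypothetical value).
    """
    total = sum(module_watts.values())
    # With N+1 redundancy the chassis must survive losing one supply,
    # so only (psu_count - 1) supplies count toward usable capacity.
    usable_supplies = max(psu_count - 1, 0) if redundant else psu_count
    capacity = usable_supplies * psu_watts
    return total, capacity, total <= capacity


# Hypothetical figures for illustration only (not vendor specifications).
modules = {"cpu_a": 100, "cpu_b": 100, "slot1": 120, "slot7": 90}
total, capacity, ok = psu_load_check(modules, psu_watts=300, psu_count=2)
print(total, capacity, ok)  # 410 300 False
```

In this made-up example the two supplies together could carry the load, but a single supply could not, so losing one would leave the chassis underpowered. That is the kind of gap a load plan should catch.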

In 30+ years of field service, I've found that many a problem can be laid at
the doorstep of facilities issues.
NOTE: Use a LICENSED electrician familiar with FIPS standards (Federal Information Processing Standards).

Rick Harris
SC Dept of Motor Vehicles
Network Operations
 

That's excellent advice and something we were just considering during a recent impromptu tech session.

We were considering going the cheap route: putting AC monitors online and basically fumbling about ourselves for a while to see if we could figure it out.

I think the severity of this problem dictates we spend a few bucks and get an electrician out there.

Thanks much...
 
We have 2 Passport 8600's that are exhibiting similar symptoms. Nortel has not been much help. We have replaced all 4 switch fabrics with no difference. They say the "problems may be traffic based", but I doubt it.
 
Nortel's Rapid Response team paid us a visit!

After examination they found problems with Spanning Tree being turned on where it should not have been.

They saw our old NetBIOS VLANs (which we used to need with Windows 3.x and 95); these create LOTS of broadcasts, which the 8600 does not like.
They found an old SNA VLAN that we didn't need anymore.
They upgraded our code to 3.3.5.
They reviewed our VRRP configuration and separated it into its own VLAN.

Since these things were identified and corrected, we have had NO lockups! It seems like a lot of little things add up on the Passports to cause them trouble. Once they were reconfigured, all was well. We also found that excessive broadcasts are not our friends!
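One way to put a number on "excessive broadcasts" is to sample each port's inbound broadcast counter twice and compute a per-second rate. The minimal Python sketch below assumes the counts come from something like the IF-MIB ifInBroadcastPkts object, a 32-bit counter that can wrap between polls. The polling itself, the port labels, the sample values, and the 500 pps threshold are all hypothetical and not taken from this thread.

```python
def broadcast_rate(prev_count, curr_count, interval_s, counter_bits=32):
    """Broadcast packets per second between two counter samples.

    Modular subtraction handles a single wrap of the counter
    (e.g. a 32-bit SNMP Counter32 rolling over between polls).
    """
    modulus = 2 ** counter_bits
    delta = (curr_count - prev_count) % modulus
    return delta / interval_s


def flag_ports(samples, interval_s, threshold_pps):
    """samples: {port: (prev, curr)}; return ports whose rate exceeds threshold."""
    return sorted(
        port
        for port, (prev, curr) in samples.items()
        if broadcast_rate(prev, curr, interval_s) > threshold_pps
    )


# Hypothetical samples taken 30 seconds apart; port "1/2" wrapped its counter.
samples = {
    "1/1": (1000, 31000),        # 30000 packets / 30 s = 1000 pps
    "1/2": (4294967000, 704),    # wrapped: 1000 packets / 30 s ~ 33 pps
    "7/1": (50, 50),             # idle
}
print(flag_ports(samples, interval_s=30, threshold_pps=500))  # ['1/1']
```

The modular subtraction matters: a naive `curr - prev` on port "1/2" would produce a large negative number instead of the roughly 33 pps the port actually saw.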
 

Thanks for the response; I got notified by email. *smile*

Broadcast storms can certainly cause problems, but the Passports shouldn't lock up because of them. The CPU has to get involved any time there are broadcasts, so it can certainly benefit from a cleaned-up network. I still think some of the things I've seen are hardware/code deficiencies; we've looked very closely at the traffic. Protection against excessive broadcasts swamping the CPU was added in code releases after 3.2.2.2. That probably had nearly as much, if not as much, to do with stopping the lockups as the network cleanup did.

Thanks again, still fighting issues.
 