Alarming Number Of Passport 8100 Crashes

dane775 · Jan 19, 2005

We have a customer who has been experiencing a large number of Passport 8100 crashes & partial crashes over the last 6-8 months. They're running 3.2.2.2 code and most of them have 64M of RAM. We're in the process of upgrading to 3.5.5.0/256M RAM but have only done a couple so far.

The boxes all have two 8190 CPUs, an 8148TX card in slot 1 and an 8108GB in slot 7. Some are loaded the rest of the way with 8148TXs...others have no more than 4 cards (slots 1,5,6,7).

The symptoms at these sites have "roughly" followed one of two patterns: CPU lockups - whole switch is down...or disappearing cards - doesn't usually kill the whole switch.

We've had several cases with anywhere from 4-10 cards in a chassis and when you look at it via the CLI or JDM it only shows the CPUs present. In some cases we're accessing the switch through cards that aren't showing up. That problem is usually fixed by reseating or replacing the card in slot 1...(Slot 1 or 2 is the timing source for the chassis. I'm beginning to think there may be issues with 8148s providing timing).

The CPU lockups are usually fixed by switching over to the standby. Sometimes we have to boot or power clear the switch. A day...two days...a week later the other CPU will lock up. We've had situations where we've swapped both CPUs and still get lockups.

We've looked for viruses, done "very verbose" CPU traces and Sniffer traces...we haven't spotted any suspicious traffic patterns. The forwarding databases aren't huge...CPU utilization's staying low. We've been round and round with Nortel but nothing's been isolated. In one case they had us swap a chassis and a week later it crashed again.

Is anyone else experiencing similar problems with the type of hardware/code we're running? Or just problems that are difficult to isolate with PP8100s? Any ideas??

netmanrick · Jan 20, 2005

What about your power input? Check the power receptacles for tight connections; VERIFY your neutral and ground are good all the way to the box. What about the PSUs in the 8100? Load/capacity?
Remember these are switching power supplies. Poor load planning/capacity has bitten many a sysadmin.

In 30+ years of field service I've seen many a problem that could be laid at the doorstep of facilities.
NOTE: Use a LICENSED electrician familiar with FIPS (Federal Information Processing Standards).

Rick Harris
SC Dept of Motor Vehicles
Network Operations

dane775 · Jan 20, 2005

That's excellent advice and something we were just considering during a recent impromptu tech session.

We were considering going the cheap route and putting AC monitors on line. Basically fumble about ourselves for a while to see if we could figure it out.

I think the severity of this problem dictates we spend a few bucks and get an electrician out there.

Thanks much...

jdoebling · Jan 28, 2005

We have 2 Passport 8600s that are exhibiting similar symptoms. Nortel has not been much help. We have replaced all 4 switch fabrics with no difference. They say the "problems may be traffic based", but I doubt it.

jdoebling · Mar 10, 2005

Nortel's Rapid Response team paid us a visit!

After examination they found problems with Spanning Tree being turned on where it should not have been.

They saw our old NetBIOS VLANs (which we used to need with Windows 3.x and 95) - these create LOTS of broadcasts that the 8600 does not like.
They found an old SNA VLAN that we didn't need anymore.
They upgraded our code to 3.3.5.
They checked our VRRP configuration to separate it into its own VLAN.

Since these things were identified and corrected, we have had NO lockups! It seems like a lot of little things add up on the Passports to cause them trouble. Once they were re-configured, all is well. We also found that excessive broadcasts are not our friends!
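For anyone wanting to put a number on "excessive broadcasts" before and after a cleanup like this, here is a rough sketch in Python. It only does the arithmetic on two broadcast-packet counter readings per VLAN or port; how you collect the readings (SNMP, port statistics, a sniffer) is not shown. The VLAN names, the 500 pps threshold and the 32-bit counter width are assumptions for illustration, not Nortel recommendations.

```python
def broadcast_rate(prev_count, curr_count, interval_s, counter_bits=32):
    """Broadcast packets per second between two counter readings.

    Assumes the counter wrapped at most once during the interval.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    modulus = 1 << counter_bits
    # Modulo arithmetic absorbs a single counter wrap.
    delta = (curr_count - prev_count) % modulus
    return delta / interval_s


def flag_noisy_vlans(samples, interval_s, threshold_pps):
    """Return the names whose broadcast rate exceeds threshold_pps.

    samples maps a (hypothetical) VLAN or port name to a
    (previous_reading, current_reading) tuple.
    """
    noisy = []
    for name, (prev_count, curr_count) in samples.items():
        if broadcast_rate(prev_count, curr_count, interval_s) > threshold_pps:
            noisy.append(name)
    return sorted(noisy)


if __name__ == "__main__":
    # Made-up readings taken 60 seconds apart.
    readings = {"netbios": (0, 60000), "data": (0, 300)}
    print(flag_noisy_vlans(readings, interval_s=60, threshold_pps=500))
```

Taking the same readings at the same time of day before and after removing a VLAN gives a like-for-like comparison, which is more useful than any fixed threshold.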

dane775 · Mar 10, 2005

Thanks for the response, got notified by email *smile*

Broadcast storms can certainly cause problems, but the Passports shouldn't lock up because of them. The CPU has to get involved any time there are broadcasts and can certainly benefit from a cleaned up network. I still think some of the things I've seen are hardware/code deficiencies; we've looked too closely at the traffic. CPU protection against excessive broadcasts swamping the CPU was added after 3.2.2.2 code. That probably had almost as much, if not as much, to do with stopping the lockups.

Thanks again, still fighting issues.
