
PP8600 problem - fiber blade stops responding

Status
Not open for further replies.

pjevnisek

IS-IT--Management
Sep 8, 2005
63
CA
I've been having a problem with the fiber blade on the PP8600, which stops responding for no apparent reason. This first happened a couple of months ago. It didn't happen again until about a week ago, and since then it has happened about 3 times. When I check the logs I get an error message saying "HW ERROR FAD Mis-Align Detected, SWIP Reset Status=8 ..." Does anyone have any idea what is causing this and/or how to fix it? Your input would be much appreciated.
 
SWIP resets are not that big of a deal and are likely not the cause of your fiber card not responding. However, a reset of the card could cause the SWIP reset errors, or the SWIP reset errors could simply occur when the blade stops responding.

If your card is being taken off line, though, you will want to look for this in the log: "ERROR Task=tChasServ Reset TmuxFailed on slotX Card taken off-line". If you see that, then the card could definitely be defective and need to be replaced.
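If you have the log saved as text, a rough way to separate the harmless resets from the "card taken off-line" case is to match on the message fragments quoted in this thread. This is only a sketch: the function names are made up for illustration, and it assumes nothing about the rest of the log line format beyond those quoted fragments.

```python
def classify_log_line(line: str) -> str:
    """Return 'card_offline', 'swip_reset', or 'other' for one log line.

    Matching is case-insensitive and uses only substrings quoted in the
    thread; the surrounding log line layout is not assumed.
    """
    text = line.lower()
    if "card taken off-line" in text:
        return "card_offline"
    if "fad mis-align" in text or "reset swip" in text:
        return "swip_reset"
    return "other"


def summarize(lines):
    """Count how many lines fall into each category."""
    counts = {"card_offline": 0, "swip_reset": 0, "other": 0}
    for line in lines:
        counts[classify_log_line(line)] += 1
    return counts
```

A nonzero "card_offline" count is the case worth chasing as a possible hardware fault; "swip_reset" entries alone are usually not.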

Here is an explanation of the errors.

Each I/O module is connected via a high-speed back plane bus to a Switch Fabric on the CPU SF module. All ingress and egress traffic, even if it's contained on the same I/O module port, passes across the high-speed back plane bus through the Switch Fabric. To guard against data bit errors, the CPU software continuously monitors the data integrity between the I/O modules and the CPU Switch Fabric.

If an anomaly/error is detected, it could propagate a data error into the Switch Fabric, which could compromise the integrity of the egress traffic. In the event an anomaly/error is detected between an I/O module and Switch Fabric, the CPU software closely monitors all egress traffic for the next six seconds. If no errors are detected, the software will continue on as normal.

However, if an error is detected, the software will reset the SWIP (the Switch Fabric ASIC) and all TapMux ASICs (Application-Specific Integrated Circuits), and will log: "WARNING Task=tChasServ FAD Mis-Align detected, SWIP Reset Status=8" and/or "HwCheck: Fad CRR Failed, Reset swip". The software will then continue on as normal. If a TapMux does not reset properly (i.e., a hardware failure), the software will take that particular I/O module off line.

In this case, even though the card is physically installed, it is no longer connected to the Switch Fabric. This means the card is disconnected from the back plane and will be out of service until you reset it. The software will log: "ERROR Task=tChasServ Reset TmuxFailed on slotX Card taken off-line" and then continue on as normal. In summary, this type of error is generated by software when a data bit error is detected between an I/O module and the CPU Switch Fabric. It is quite common and should not be an issue unless it is happening often (multiple times per day, often enough to impair the function of the switch) and is not related to someone removing a blade.
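To make the sequence above easier to follow, here is a toy model of the decision flow as described in this post. It is not the switch's actual software: the names (handle_anomaly, Outcome, tapmux_reset_ok) are invented for illustration, and times are plain seconds.

```python
from enum import Enum

# From the description above: egress traffic is watched for six seconds
# after an anomaly is seen between an I/O module and the Switch Fabric.
MONITOR_WINDOW_S = 6.0


class Outcome(Enum):
    CONTINUE = "no further errors, continue as normal"
    RESET = "reset SWIP and TapMux ASICs, log FAD Mis-Align, continue"
    CARD_OFFLINE = "TapMux reset failed, I/O module taken off-line"


def handle_anomaly(anomaly_time, error_times, tapmux_reset_ok=True):
    """Model what happens after an anomaly at `anomaly_time` (seconds).

    `error_times` are the times of later egress errors; `tapmux_reset_ok`
    is a hypothetical flag standing in for whether the TapMux reset works.
    """
    window_end = anomaly_time + MONITOR_WINDOW_S
    error_in_window = any(anomaly_time < t <= window_end for t in error_times)
    if not error_in_window:
        return Outcome.CONTINUE
    if tapmux_reset_ok:
        return Outcome.RESET
    return Outcome.CARD_OFFLINE
```

Only the last outcome corresponds to the "Card taken off-line" log entry; the middle one is the ordinary SWIP reset that the original poster is seeing.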

Like all Ethernet devices, the 8600 monitors the integrity of the data passing through it. All traffic goes through the Switch Fabric, sometimes called the "HyperPhy Link." When multiple problems are detected over a six-second period, it resets the SWIP (the Switch Fabric ASIC) and the TapMux ASICs, giving the "FAD Mis-Align" error. An occasional reset has a negligible effect on network performance, but a lot of them could have a noticeable one. As a rule of thumb, 10 resets in one day, or 3 in a two-minute period, indicate a problem that could need troubleshooting.
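If you pull the times of the FAD Mis-Align entries out of the log, the rule of thumb above (10 in a day, or 3 in two minutes) can be checked with a sliding window. A minimal sketch, assuming you already have the timestamps as Python datetime objects; the function names are hypothetical, and treating the window boundary as inclusive is my own choice.

```python
from datetime import datetime, timedelta


def exceeds_threshold(timestamps, limit, window):
    """True if any span of length `window` holds at least `limit` events."""
    ts = sorted(timestamps)
    start = 0
    for end, t in enumerate(ts):
        # Drop events from the left until the span fits inside the window.
        while t - ts[start] > window:
            start += 1
        if end - start + 1 >= limit:
            return True
    return False


def needs_troubleshooting(timestamps):
    """Apply both thresholds from the thread: 10 per day or 3 per 2 minutes."""
    return (exceeds_threshold(timestamps, 10, timedelta(days=1))
            or exceeds_threshold(timestamps, 3, timedelta(minutes=2)))
```

For example, three resets logged within 90 seconds of each other would trip the two-minute check, while three resets spread across an afternoon would not.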
 