EAPS - Fail-timer-exp flag set. Domain state: Complete

Status
Not open for further replies.

mikeld

Technical User
Feb 8, 2006
2
DE
Regardless of whether the switch runs XOS version 12.4.1.7 v1241b7-patch1-7 or EW version 7.8e.2.1, at certain times, mainly between 12:00 and 1:00 pm, I get hundreds of messages in the syslogs like:

09/27/2010 12:47:32.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag cleared. Domain state: Complete
09/27/2010 12:47:32.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete
09/27/2010 12:47:28.86 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag cleared. Domain state: Complete
09/27/2010 12:47:24.82 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete
09/27/2010 12:47:17.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag cleared. Domain state: Complete
09/27/2010 12:47:17.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete

Nearly all EAPS masters across 24 EAPS rings are affected. My investigation shows the following:

None of the EAPS rings has a configuration error.
There are no rx or tx errors on the uplinks.
Network performance is not affected.
The interference is known to come from one particular VLAN.
It happens only on weekdays and during office hours.
I found nothing conclusive with Wireshark, but I'm not a Wireshark expert.
I found some references to the IPv6 Neighbor Discovery Protocol, and there are some Windows 7 PCs in that VLAN.
My guess is that something is loading the CPU of the EAPS masters, but how can I catch it?
Does anyone have a suggestion for how I can get this under control with EW, XOS, Wireshark, or anything else?
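
One way to narrow down when the flag flaps is to tally the syslog entries by domain, weekday, and hour, then compare the busy buckets against traffic in the suspect VLAN. The following is a minimal sketch, not a tested tool: it assumes the log lines look exactly like the excerpt above and that the date is in MM/DD/YYYY form, and the function name `tally_fail_timer_events` is made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line format, taken from the excerpt above:
# 09/27/2010 12:47:32.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete
LINE_RE = re.compile(
    r"^(?P<ts>\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\.\d+\s+"
    r"<Info:EAPS\.DmnInfo>\s+EAPSD\s+(?P<domain>\S+)\s+-\s+"
    r"Fail-timer-exp flag (?P<action>set|cleared)\."
)

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def tally_fail_timer_events(lines):
    """Count 'flag set' events per (domain, weekday index, hour)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m is None or m.group("action") != "set":
            continue  # skip unrelated lines and 'cleared' events
        # Assumption: timestamps are MM/DD/YYYY, as 09/27/2010 suggests.
        ts = datetime.strptime(m.group("ts"), "%m/%d/%Y %H:%M:%S")
        counts[(m.group("domain"), ts.weekday(), ts.hour)] += 1
    return counts


sample = [
    "09/27/2010 12:47:32.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag cleared. Domain state: Complete",
    "09/27/2010 12:47:32.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete",
    "09/27/2010 12:47:28.86 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag cleared. Domain state: Complete",
    "09/27/2010 12:47:24.82 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete",
    "09/27/2010 12:47:17.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag cleared. Domain state: Complete",
    "09/27/2010 12:47:17.81 <Info:EAPS.DmnInfo> EAPSD DSK33 - Fail-timer-exp flag set. Domain state: Complete",
]

counts = tally_fail_timer_events(sample)
for (domain, weekday, hour), n in sorted(counts.items()):
    print(f"{domain} {DAYS[weekday]} {hour:02d}:00  {n} fail-timer expirations")
```

Feeding it a full day of syslog from each master would show whether all rings flap in the same hour, which would point at a shared cause such as the suspect VLAN rather than a single ring.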

Regards, mikeld
 
Mike,

It's possible that either the Master switch's CPU is getting pegged by something, or EAPS control packets are not being prioritized properly. If I had to guess, your network is getting congested at that time.

From EXOS 12.4 Concepts Guide:

Note: Increasing the failtime value increases the time it takes to detect a ring break using the polling timers, but it can also reduce the possibility of incorrectly declaring a failure when the network is congested.

The default fail-timer is 3 seconds. You cannot make it shorter, but you can increase it if your network experiences heavy congestion.

configure eaps <name> failtime <seconds> <milliseconds>

I would try bumping it up to 5 seconds. All that does is make EAPS take longer to declare a failure.


A few questions:

1. Are you mixing EXOS and EWare based switches in this configuration? Yes or No

2. If Yes to #1, who is the Master? EXOS switch or Eware switch?

3. What is the switch model for the Master switch?

Check mainly on the ExtremeWare switches. On ExtremeWare, the control VLAN should be manually assigned QoS profile QP8 so that it gets priority. This is only necessary on ExtremeWare switches, but it would not hurt to configure it on EXOS switches as well.

For example, if your control VLAN for a domain is Control4001:

config vlan Control4001 qosprofile QP8

Again, this is only necessary on ExtremeWare switches, but I usually set it on both ExtremeWare and EXOS. In EXOS the CPU is smart enough to prioritize control packets over background data. I would check the ExtremeWare switches and set the QoS profile to QP8 on your control VLANs. Congestion plus not setting this could be the crux of your problem.
 