S8800 Duplex Servers Automatically Interchanging

Status
Not open for further replies.

num025

Technical User
Sep 8, 2008
195
Hi guys, we have been experiencing constant automatic interchanges between a pair of S8800 media servers running CM 5.2.1. Below are screenshots of the errors:

errorsn.jpg


restarts.jpg


We've also checked the system maintenance forms and automatic interchange is disabled there. So far we're not really sure what's triggering these servers to interchange so any suggestions would be helpful.
 
Thanks bsh, I'll try and get one of those.
 
Checked the servers and they are already at the latest Dec 2010 load of the patch (SP6 I think). Will try and downgrade it to SP5.
 
Still no luck. We even tried rebuilding everything from scratch, but the servers still interchange automatically at the same intervals.
 
Ugh - I'm scheduled to upgrade in 10 days to CM 5.2.1 with a paired S8800. Please keep us informed as to what you find out!

Susan
Muffins are just ugly cupcakes.
 
I've been on CM 5.2.1 SP6 for a while now (since 12/18) and have experienced 2 unexpected interchanges, seemingly once a month and always at roughly the same time. Almost like it's a maintenance process. Overall I've been very happy with how stable it is, except for this issue. I keep thinking it's a fluke. It's likely unrelated to num025's issue, but I thought I'd tag on to this post to see if anybody else has the same issue.

Num025 - check your speed and duplex settings on all interfaces... check the network side to see what it is set to and make sure they match.

Initialized        4  no   Standby  12/28 01:16
Internal Request   1  no   Standby  12/28 01:16
Internal Request   1  no   Standby  12/28 01:17
Interchange        1  no   Active   12/28 01:21
Initialized        4  no   Standby  02/06 01:16
Internal Request   1  no   Standby  02/06 01:17
Internal Request   1  no   Standby  02/06 01:17
Interchange-Craft  4  yes  Active   02/06 01:36
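
If anyone wants to check their own restart log for the same "always at roughly the same time" pattern, here is a rough Python sketch that parses the lines above. The field names (cause, action, escalated, mode, date, time) are my guess at what the columns mean, not something taken from the output itself, and the function names are made up for illustration.

```python
from collections import namedtuple

# Field names are assumptions about what each column means.
Event = namedtuple("Event", "cause action escalated mode date time")

LOG = """\
Initialized 4 no Standby 12/28 01:16
Internal Request 1 no Standby 12/28 01:16
Internal Request 1 no Standby 12/28 01:17
Interchange 1 no Active 12/28 01:21
Initialized 4 no Standby 02/06 01:16
Internal Request 1 no Standby 02/06 01:17
Internal Request 1 no Standby 02/06 01:17
Interchange-Craft 4 yes Active 02/06 01:36
"""


def parse_log(text):
    """Split each line into an Event; the last five tokens are fixed fields."""
    events = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) < 6:
            continue  # skip blank or truncated lines
        cause = " ".join(tokens[:-5])  # the cause may contain spaces
        events.append(Event(cause, *tokens[-5:]))
    return events


def minutes_past_midnight(hhmm):
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def start_time_spread(events):
    """Minutes between the earliest and latest 'Initialized' time of day."""
    starts = [minutes_past_midnight(e.time)
              for e in events if e.cause == "Initialized"]
    return max(starts) - min(starts)


events = parse_log(LOG)
print("Spread of start times (minutes):", start_time_spread(events))
```

A spread close to zero across several dates points toward something scheduled rather than a random fault.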

Wildcard
 
Judging by the PCD string in the Arbiter messages in the OP, it appears your servers are having trouble talking to some of your IPSIs. Do you have any remote port networks? Are you seeing pkt-int alarms for the same times?
 
Check the manufacture date of both servers. If they were manufactured before 11th May 2010, see PCN 1716B and PSN3170.

I had an S8800 ESS cluster that would randomly interchange/reboot for no reason. Luckily it reset while I was standing in front of the server, and I noticed on the Light Path Diagnostic Panel on the front of the server that the NMI and PCI LEDs were on.

Anyway we got the modification kit for the Dual-port Gigabit NIC, installed this and installed the S8800 firmware update and UEFI tool but this didn't help. The servers still continued to reset.

In the end I replaced both S8800s and made sure the manufacture date was after the 11th of May 2010. I updated the firmware and the UEFI tool on the replacement servers. Then I installed CM 5.2.1 SP6.

So far so good, we've not had a server reset or interchange in over a month.

Funnily enough, my main CM S8800 pair and AES S8800 server were also manufactured before the 11th of May 2010, and I've not had an issue with these servers resetting!
 
Wow, I never thought a bunch of us had been experiencing these kinds of resets. From what I remember, we have manually set the speed and duplex on every local IPSI to 100Mbps/Full. However, I'm not so sure that's true of some remote IPSIs overseas, so we'll have to verify that first.

The hardware manufacturing date glimma brought up is an interesting take on this issue, we'll have to verify that too to be sure.

All in all, thanks a lot for the contributions, guys! We hope we can find a fix for this problem.

 
Susan, it looks like we are both going over to 5.2 around the same time. I cut to new 8800s and CM 5.2 service pack 6 Thursday night. If I find anything out about this before we cut, I'll post it for everyone.

When is the last time you helped someone, just because you were able to?

For the best response to a question, read faq690-6594


 
Apparently there were some IPSIs that were programmed but not yet connected to the control network. After removing those, the system became more stable, with no more interchanges for the past day. We'll keep monitoring, but so far it looks like that solved the issue.
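
For anyone who wants to run the same check, the comparison boils down to "administered but not connected". A minimal Python sketch, assuming you have already collected the two lists by hand (from the CM administration and from the network side); the IPSI names and the function name are hypothetical:

```python
def unconnected_ipsis(administered, connected):
    """Return IPSIs that are programmed in CM but absent from the network list."""
    return sorted(set(administered) - set(connected))


# Hypothetical names, for illustration only.
administered = ["ipsi-01a", "ipsi-02a", "ipsi-03a", "ipsi-04a"]
connected = ["ipsi-01a", "ipsi-02a"]

for name in unconnected_ipsis(administered, connected):
    print("Programmed but not connected:", name)
```

Anything it prints is a candidate to remove or busy out until the hardware is actually on the control network.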
 
Glad to hear things are looking better. FYI Avaya have finally acknowledged auto negotiate as a valid setting, so IPSI ports don't have to be hard coded to 100/full any more (just make sure the data switch port is configured the same as the IPSI).
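
If you have a lot of IPSIs, one way to sanity-check that both ends match is to write down the setting for each IPSI and its switch port and compare them. This is only a sketch: the names and the "auto" / "100/full" strings are placeholders I made up, and you would still have to gather the real values yourself.

```python
def normalize(setting):
    """Lower-case and trim a setting string; pass None through unchanged."""
    return setting.strip().lower() if setting is not None else None


def link_mismatches(ipsi_ports, switch_ports):
    """List (name, ipsi_setting, switch_setting) where the two ends differ.

    A switch_setting of None means no switch port was recorded for that IPSI.
    """
    problems = []
    for name, ipsi_setting in ipsi_ports.items():
        switch_setting = switch_ports.get(name)
        if normalize(ipsi_setting) != normalize(switch_setting):
            problems.append((name, ipsi_setting, switch_setting))
    return problems


# Hypothetical inventory, for illustration only.
ipsi_ports = {"ipsi-01a": "auto", "ipsi-02a": "100/full", "ipsi-03a": "100/Full"}
switch_ports = {"ipsi-01a": "auto", "ipsi-02a": "auto", "ipsi-03a": "100/full"}

for name, ipsi_end, switch_end in link_mismatches(ipsi_ports, switch_ports):
    print(f"{name}: IPSI={ipsi_end} switch={switch_end}")
```

A hard-coded end facing an auto-negotiating end is the classic mismatch this is meant to catch.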

For those worried about the NMI reset issue, check out the latest PCN:


Bottom line is that it can still happen on newer servers (even when the extra components to secure the board have been installed). Anyone preparing to go live with S8800s should definitely check that the board is secure and that moving/unplugging cables in eth3/4 doesn't cause a reboot. If this issue is present it can also mess up template installation where a dedicated NIC is configured (e.g. MBT with Media Services).
 