Does anyone here have experience dealing with lane lockup issues on 8600 R or RS blades? This past week I hit my second lane lockup on an 8634GTRS blade, affecting the same lane/group of ports that was affected the first time.
I had a case opened when the first lockup occurred, but learned that some of the lane-lockup prevention measures developed in the 4.x code stream had not been incorporated into the 5.x stream - which, of course, I have to run in order to use the RS blades in this chassis. Searching the knowledgebase turns up lots of lane lockup issues. And since this is a core switch in my network, recovering from a lockup is not fun for anyone involved - this switch touches everything: voice, data, video.
Of course, this past week's issue happened when I was on vacation; I won't bore you with the gory details, but it wasn't pretty.
We've been promised a new 5.x release in December, but it doesn't actually fix the problem - last I heard, NT STILL doesn't know what the cause is; the release is just supposed to do a graceful reset of the locked lane. I can't really wait that long, and I can't be available 24x7x365 until this crappy issue is fixed.
I believe I did have one lane lockup back on 4.0.x code with R blades, but other than that the 8600 has been a really stable switch, so this lockup (the second in two months, on the same switch, affecting the same ports) is both scary and infuriating, turning the "ho-ho-ho" season into "ho-ho-hum".
The partner we use says he actually proved that a certain multicast packet passing through the 8600 could reproduce the lane lockup. Given that the affected lane (the blade's ports are divided into two lanes of 12 - 4 GigE/8 SFP combo each - plus the 2 10Gig ports, and the affected ones are the left-hand group) only has 3 ports active, and I know what is on them, has anyone come up with any cause/effect analysis? One is an MS SQL database server; another is an image-processing server; and the third is an uplink to a 5520 for normal workstations. The workstations have other uplinks as well, which makes me discount that one as a likely source.
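If the multicast trigger theory holds up, it might be worth mirroring one of the three active ports to a spare port and capturing only multicast frames, so there's something concrete to hand NT the next time the lane locks up. A minimal capture sketch, assuming a Linux box with tcpdump hanging off the mirror port (eth0 is just an example interface name):

   tcpdump -i eth0 -s 0 -w lane-mcast.pcap ether multicast

The "ether multicast" filter keeps the capture down to multicast/flooded frames only; if those ports carry a lot of multicast, tcpdump's -C option can cap the file size so the box doesn't fill up while you wait for the next event.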
One of the things I've been considering is moving these items to other ports; the caveat is that the problem might simply move along with them and lock up other lanes, taking out more critical services on the other 12 combo ports or, heaven forbid, 24 ports on a 48-port Gig blade. I'm also not sure that turning these into MLT trunks, or using adapter teaming on the servers, will actually help, since I believe the 8600 (or the other end) still sees the MAC addresses of the switch ports, which would still leave some things unavailable. Again, when this happens in the middle of a workday, the most important thing is getting the business working again, not gathering diagnostics.
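For the 5520 uplink specifically, one idea I keep coming back to: if it becomes an MLT, put the member ports on different lanes (or different blades), so a single lane lockup only drops one member and the trunk stays up. A rough sketch, from memory of the classic Passport 8600 CLI - the MLT ID and port numbers are examples only, so verify the syntax against the 5.x docs before trusting it:

   config mlt 2 create
   config mlt 2 add ports 1/1,1/13
   config mlt 2 perform-tagging enable

(1/1 and 1/13 are chosen only to land on opposite lanes of the blade.) A matching MLT/LACP trunk would be needed on the 5520 side, and this obviously doesn't prevent the lockup - it just limits the blast radius to one member link.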
Note that I do not want to turn this into a "bash Nortel" session, but want to see if anyone has suggestions that we can all learn from in working out this issue.