CS1000 HA over HSP and estimated downtime.

Status
Not open for further replies.

optman10

Technical User
Sep 10, 2013
35
MA
Hi all

I have configured CS1000 HA over the HSP link, with MGC 0 0 with Core 0 and MGC 0 1 with Core 1, and everything works well. During testing, when we manually reboot Core 0 or unplug its ELAN cable, Core 1 only takes control after 2 to 4 minutes, and all the MGCs reboot as well. Is this normal?
The second test was powering off the entire gateway 0 0 (which contains Core 0), and we got the same result.
When Core 0 comes back up, it doesn't take control automatically. It becomes standby and Core 1 stays active. Is it possible to switch back from Core 1 to Core 0 automatically, without manual intervention?

Does the HSP maintain redundancy at 99.99% availability, without downtime?

I'm a bit confused, so I'd appreciate your recommendations and ideas.

Thanks & best regards
 
I think 2 to 4 minutes is pretty realistic considering everything that is going on. When Core 0 goes out of service, Core 1 obviously takes over. When Core 0 comes back online, Core 1 will still be active and will stay that way until it is switched manually or during the midnight routine. Yes, the HSP maintains redundancy. As for 99.99%, I'm not sure. I'm also not sure the MGCs rebooting is normal. I don't think they should actually reboot unless an Alternate Call Server is configured. With an HA system there really should be no need for an Alternate Call Server, but it can be done with Geographic Redundancy.
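To put the 99.99% question into numbers, an availability target converts directly into a yearly downtime budget. The Python sketch below is plain arithmetic, assuming a 365-day year; it says nothing about how the CS1000 itself behaves. At 99.99% the budget is about 52.56 minutes per year, so roughly 13 takeovers of 4 minutes each would use it up.

```python
# Downtime budget implied by an availability target.
# Pure arithmetic; nothing here is specific to the CS1000.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of allowed downtime per year for a given availability percentage."""
    return MINUTES_PER_YEAR * (100.0 - availability_pct) / 100.0


def outages_within_budget(availability_pct: float, outage_minutes: float) -> int:
    """How many outages of a given length fit inside the yearly budget."""
    return int(downtime_budget_minutes(availability_pct) // outage_minutes)


if __name__ == "__main__":
    for target in (99.9, 99.99, 99.999):
        print(f"{target}% -> {downtime_budget_minutes(target):.2f} min/year")
    # Worst case from the test in the original post: a 4-minute takeover.
    print(outages_within_budget(99.99, 4))
```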
 
Thanks for your reply, I get it now:

Here's a description of the switchover states:

Graceful Switchover
In normal operation the health count of each CPU should be equivalent. In the case where the active CPU detects that the redundant CPU has better health, a graceful switchover is invoked. In this process, almost the entire memory image from the active CPU is copied over to the memory of the redundant CPU. The redundant CPU resumes the operations left off from the active CPU after going through a post-switchover procedure. This post-switchover procedure includes sending out a gratuitous ARP message to the IP world for informing where the active IP ELAN address is located. This CPU becomes the active side.
The previously active side invokes a warm start after the copying operation is completed. After the warm start, it becomes the redundant side.
During a graceful switchover, there is usually no impact on calls already in progress. New calls are briefly blocked, for roughly 6-8 seconds depending on the configuration.
Graceful switchover may be invoked manually using the SCPU command in overlay 135.
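The health-count rule above can be sketched as a tiny Python model. It is hypothetical: the `Cpu` class, its integer `health` field, and the function names are illustrative, not Avaya/Nortel code, and the memory copy, gratuitous ARP, and warm start are reduced to a simple role swap. With equal health counts the model never switches, which is consistent with the earlier reply that Core 1 stays active after Core 0 recovers until it is switched manually (SCPU) or by the midnight routine.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Cpu:
    name: str
    health: int   # hypothetical health count; higher means healthier
    active: bool


def should_graceful_switchover(active: Cpu, redundant: Cpu) -> bool:
    """Switch only when the redundant CPU is strictly healthier than the active one."""
    return redundant.health > active.health


def graceful_switchover(active: Cpu, redundant: Cpu) -> Tuple[Cpu, Cpu]:
    """Return (new_active, new_redundant).

    The real system copies the memory image, announces the ELAN address with a
    gratuitous ARP, and warm starts the old active side; this model only swaps roles.
    """
    if not should_graceful_switchover(active, redundant):
        return active, redundant
    redundant.active, active.active = True, False
    return redundant, active


core0 = Cpu("core0", health=10, active=True)
core1 = Cpu("core1", health=12, active=False)
new_active, new_redundant = graceful_switchover(core0, core1)
print(new_active.name)  # core1
```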

Ungraceful Switchover
When it is decided that the active side is inoperable (e.g. power or processor failure, watchdog timeout, exceptions), the **redundant side warm starts** and takes over control. The switchover does not occur immediately: when the redundant side detects loss of heartbeat, it must wait long enough to be sure that the active side is not simply performing a warm start (INI). The timer used to invoke the ungraceful switchover is on the order of 56 seconds.
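The hold-off behavior can be sketched as a simple timer check on the redundant side. This is a hypothetical Python model: `redundant_side_action` and its inputs are illustrative, and the 56-second default is simply the figure quoted above. A gap in heartbeats shorter than the hold-off is treated as a possible warm start (INI) on the active side, not as a failure.

```python
UNGRACEFUL_HOLDOFF_S = 56.0  # "on the order of 56 seconds" per the quoted text


def redundant_side_action(last_heartbeat_s: float, now_s: float,
                          holdoff_s: float = UNGRACEFUL_HOLDOFF_S) -> str:
    """Decide what the redundant side does after heartbeats stop arriving."""
    silence = now_s - last_heartbeat_s
    if silence < holdoff_s:
        # The active side may only be warm starting (INI); keep waiting.
        return "wait"
    # Silence has lasted the full hold-off: warm start and take control.
    return "take_over"


# Active side goes silent at t = 100 s; poll every 10 s.
for t in range(100, 171, 10):
    print(t, redundant_side_action(100.0, float(t)))
```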

Heartbeat
The two CPUs exchange heartbeats to determine whether the other CPU is reachable over the HSP. The heartbeat protocol also carries each CPU's health count. If the HSP is disconnected, the heartbeat protocol attempts to traverse the ELAN instead.
If the heartbeat cannot be exchanged between the two CPUs, meaning the connection over both the HSP and the ELAN is lost, the redundant CPU warm starts to become active after a certain period of time.
By optimizing timeout and threshold parameters used in retries of the heartbeat mechanism, ungraceful switchover trigger time is reduced to less than 15 seconds. The optimization in the timing leads to a change in the INI policy. When the active core warm starts, the inactive core also reboots, so no swapping of the cores takes place.
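The heartbeat path and the timeout/threshold trade-off can be sketched in Python as follows. Everything here is an assumption for illustration: links are modeled as callables that return True when a heartbeat is acknowledged, and the trigger time is modeled as the timeout multiplied by the number of consecutive misses tolerated. The 2-second timeout and threshold of 7 below are made-up values that only show how a trigger time under 15 seconds could arise; the real CS1000 parameters are not given in the text.

```python
from typing import Callable, List, Optional

Link = Callable[[], bool]  # hypothetical: True if the heartbeat was acknowledged


def send_heartbeat(hsp: Link, elan: Link) -> Optional[str]:
    """Try the HSP first and fall back to the ELAN; None means both failed."""
    if hsp():
        return "HSP"
    if elan():
        return "ELAN"
    return None


def peer_lost(results: List[Optional[str]], retry_threshold: int) -> bool:
    """True once `retry_threshold` consecutive heartbeats failed on both paths."""
    misses = 0
    for path in results:
        misses = misses + 1 if path is None else 0
        if misses >= retry_threshold:
            return True
    return False


def trigger_time_s(timeout_s: float, retry_threshold: int) -> float:
    """Worst-case time to declare the peer lost under this simplified model."""
    return timeout_s * retry_threshold


print(send_heartbeat(lambda: False, lambda: True))  # ELAN
print(trigger_time_s(2.0, 7))                        # 14.0
```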

So by unplugging the ELAN, the health count changed and you got a "graceful" switchover (6-8 seconds), because the heartbeat was still being carried over the HSP. When you powered down the active call server, you got an ungraceful switchover, which takes up to 56 seconds (according to the docs) and also invokes an INI on the offline side. If it also sysloaded, I'd guess something was wrong with the offline side. Make sure both CPUs are patched (patch in redundant mode), then test again; make sure you can boot off both cores and run off both cores without errors.
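The reasoning in this reply amounts to a small decision table: which failure leads to which kind of switchover. The Python sketch below encodes it as a hypothetical model. The `Failure` fields, the assumption that losing the active core's ELAN alone lowers its health, and the assumption that losing only the HSP leaves health unchanged are simplifications of the quoted description, not documented CS1000 logic. The timing strings repeat the figures quoted in this thread.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    active_cpu_up: bool  # is the active call server still running?
    hsp_up: bool         # is the HSP link between the two cores intact?
    elan_up: bool        # is the active core's ELAN connection intact?


EXPECTED_IMPACT = {
    "graceful": "new calls blocked for roughly 6-8 s",
    "ungraceful": "takeover after ~56 s (under 15 s with optimized timers)",
    "none": "no switchover",
}


def expected_switchover(f: Failure) -> str:
    heartbeat_ok = f.hsp_up or f.elan_up  # heartbeat falls back to the ELAN
    if not f.active_cpu_up or not heartbeat_ok:
        return "ungraceful"  # redundant side stops hearing the active side
    if not f.elan_up:
        return "graceful"    # health drops, heartbeat still carried over the HSP
    return "none"            # assumed: HSP loss alone does not change health


# The two tests from the original post:
for label, f in [("ELAN unplugged", Failure(True, True, False)),
                 ("gateway 0 0 powered off", Failure(False, False, False))]:
    kind = expected_switchover(f)
    print(f"{label}: {kind} ({EXPECTED_IMPACT[kind]})")
```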

This is all from the System Redundancy NTPs, in the Campus Redundancy section; the description is the same as for a non-campus redundant HA configuration.


I also fixed the MGC rebooting issue by adding redundant ELAN/TLAN connections to the MGCs (dual-homing).

Thanks & regards
 