Understanding HA

snootalope · Mar 11, 2009

OK, I'm not a full-blown VI3 shop, yet. My understanding of HA seems to change by the day.

My understanding of it right now is this: if I have three servers running in a VM cluster and one of the nodes fails, the VMs running on that physical node stay in a locked state until they are "powered back on" on another node. By "powered back on" I mean the OS running inside that VM never went down, only the VM did. So say one of the VMs that went down with the node was a terminal server. When the VM "powers back on" on a new node (HA at work), the terminal server is restored to exactly the state it was in when the VM powered off on the failing node. All the open applications and the state of those applications are right where they were when the VM powered off.

Is this right? If not, would someone PLEASE help me understand this, as I just don't think I can read any more whitepapers and blogs.

snootalope · Mar 11, 2009

Sorry, probably should have mentioned that all three ESX nodes are tied to an iSCSI SAN that holds the VMDKs.

ArizonaGeek · Mar 11, 2009

You are sort of correct. HA, DRS and VMotion all work hand in hand. This assumes you are running your VMs on a SAN with physical servers attached via iSCSI or Fibre Channel. Each physical server must be able to see and access every LUN on the SAN.

VMotion is the technology that allows the movement of a virtual machine from one physical machine to another. Basically, it doesn't actually move the VMDK files; it moves the pointer to those files on your SAN from one physical server to the other.

DRS, or Distributed Resource Scheduler, monitors your VMs for spikes in CPU, bandwidth or memory and will balance your resources for you. So, for instance, say you have two virtual database servers located on one physical server that is being hit pretty hard. DRS can say "Hey, this server is being slammed but I have one (or more) that isn't doing anything, so I am going to move one of the DB servers to that machine while it is spiking." And it will move it.

Well, it has different automation levels, from constantly adjusting on its own to only alerting you that your cluster is unbalanced and asking what you want it to do. It does whatever you have set up in the rules.
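To make the balancing idea above a bit more concrete, here is a rough Python sketch. It is only a toy model and not VMware's actual DRS algorithm: the `Host` class, the `threshold` value and the `"manual"`/`"automated"` mode names are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    """Hypothetical ESX host; vms maps VM name -> CPU load in percent of the host."""
    name: str
    vms: dict = field(default_factory=dict)

    @property
    def load(self):
        return sum(self.vms.values())


def recommend_move(hosts, threshold=20):
    """Return (vm, source, target), or None if the cluster is balanced enough."""
    busiest = max(hosts, key=lambda h: h.load)
    idlest = min(hosts, key=lambda h: h.load)
    gap = busiest.load - idlest.load
    if gap <= threshold or not busiest.vms:
        return None
    # Moving a VM with load x changes the gap to |gap - 2x|,
    # so the best candidate is the VM whose load is closest to gap / 2.
    vm = min(busiest.vms, key=lambda v: abs(busiest.vms[v] - gap / 2))
    if abs(gap - 2 * busiest.vms[vm]) >= gap:
        return None  # the move would not improve the balance
    return vm, busiest.name, idlest.name


def apply_drs(hosts, automation="manual", threshold=20):
    """In 'manual' mode only report the recommendation; otherwise perform the move."""
    rec = recommend_move(hosts, threshold)
    if rec is None:
        return "balanced"
    vm, src, dst = rec
    if automation == "manual":
        return f"recommend: move {vm} from {src} to {dst}"
    by_name = {h.name: h for h in hosts}
    by_name[dst].vms[vm] = by_name[src].vms.pop(vm)
    return f"moved {vm} from {src} to {dst}"


hosts = [Host("esx1", {"db1": 45, "db2": 40}), Host("esx2", {"web1": 10})]
print(apply_drs(hosts, automation="manual"))
print(apply_drs(hosts, automation="automated"))
print(apply_drs(hosts, automation="automated"))
```

In manual mode nothing moves and you only get the recommendation; in automated mode the VM is reassigned, and a second pass finds nothing left to do.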

HA, or High Availability, works in conjunction with DRS, but it looks for flat-out failures. So if you have a physical server that dies (hardware, memory, power supply, etc.) and flat turns itself off, HA will make sure your VMs are still available. Usually, your users will only notice a slight glitch as DRS and HA figure out where to bring up the VM. It can take anywhere from 30 seconds to 5 minutes depending on the rules you set up, or it can even wait for your interaction. This is where DRS and HA will give you problems if not set up correctly. Let's say you have 2 physical servers and 2 domain controllers set up as VMs, with a rule in DRS that says those two machines should never be on the same physical server. Well, if you lose a physical server, HA will not bring up that domain controller because of the rules set in DRS.

HA will only follow the rules set in DRS, so if a physical machine dies it will only bring up the machines it can based on those rules. I don't think ESX leaves the virtual machines off unless the rules keep it from moving the VM to another physical machine. If nothing prevents the VM from starting on another physical machine, it would bring up that VM in about a minute or two.
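The domain controller scenario above can be sketched in a few lines of Python. This models the behavior as described in this post, not confirmed VMware behavior; `plan_restarts` and its `enforce_rules` switch are hypothetical names invented for the example.

```python
def plan_restarts(failed_vms, surviving_hosts, anti_affinity, enforce_rules=True):
    """Pick a surviving host for each VM from the failed host.

    failed_vms:      list of VM names that were on the dead host
    surviving_hosts: dict of host name -> set of VM names already running there
    anti_affinity:   list of sets of VM names that must not share a host
    Returns a dict of VM name -> host name, or None if no host is allowed.
    """
    # Work on copies so the caller's inventory is left untouched.
    hosts = {h: set(vms) for h, vms in surviving_hosts.items()}
    placement = {}
    for vm in failed_vms:
        placement[vm] = None
        for host, running in hosts.items():
            # A conflict exists if a rule partner of this VM already runs here.
            conflict = any(
                vm in group and running & (group - {vm})
                for group in anti_affinity
            )
            if enforce_rules and conflict:
                continue
            running.add(vm)
            placement[vm] = host
            break
    return placement


surviving = {"esx2": {"dc2", "file1"}}
rules = [{"dc1", "dc2"}]  # keep the two domain controllers apart

print(plan_restarts(["dc1", "web1"], surviving, rules))
print(plan_restarts(["dc1", "web1"], surviving, rules, enforce_rules=False))
```

With the rule enforced, `dc1` has nowhere to go in a two-host cluster and stays down while `web1` restarts; with it relaxed, both come up on the surviving host.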

The short answer to your question is yes: when HA is activated, the VM will come back up as it was previous to the failure. Your users will only notice a glitch of a minute or two while HA moves the pointer to the other machine and brings it online. The caveat is that it will come up based on the rules you set up in HA and DRS, and only if the physical machine can handle the load.

I'm sure I gave you the War and Peace version of it, but I think you'll see that the answer is a bit more complicated than a yes or no. Or at least I hope that's what you got out of it.

Cheers
Rob

The answer is always "PEBKAC!"

ArizonaGeek · Mar 11, 2009

As I re-read my novel, I realized I forgot to bring up a point: HA can be set to ignore your DRS rules and bring up your VMs on a physical machine regardless of any constraints. That has its own pros and cons.

Cheers
Rob

nhidalgo · Mar 11, 2009

Let me add to what Rob said. Your VMs will boot up on another server if HA is set to monitor the heartbeat of the VMs. They will act as if the power had been pulled, just like physical machines: the current state and any unsaved data will be lost. This also depends on the placement of your VirtualCenter server. If that server is virtual and the host that server is on goes down, none of this will work. There are conflicting thoughts on making that server physical or virtual.

Nick

snootalope · Mar 12, 2009

Thank you both for chiming in. It all makes sense to me, yet there's a bit of conflict between your responses:

ArizonaGeek: "...the VM will come back up as it was previous to the failure"

nhidalgo: "...the current state and any unsaved data will be lost"

So, which is it? Assuming each node can see every LUN on the SAN, all networking is set up correctly and no rules block any VMs from running on the same node, will the VMs from the failed node reappear after a short time as if nothing happened, meaning no lost data and no restarting of services and applications?

ArizonaGeek · Mar 12, 2009

I am going on what I heard in my training class. As I understood it, the machine would come up exactly as it was when the physical server went down. Of course, it's been a year and a half since I took the class and my memory isn't what it used to be.

I can't say that for a fact because I am not about to kill one of my physical servers (which are all running production) to try it out. Any volunteers wanna give it a try, or anyone with actual experience? Am I wrong, is nhidalgo wrong, or is it somewhere in the middle?

Cheers
Rob

nhidalgo · Mar 12, 2009

Here is a bit from the VMware site.

" VMware software makes possible rapid and automated restart and failover without the cost or complexity of solutions used with physical infrastructure. Virtual machines are hardware-independent and can share physical resources, thus failover can be implemented without requiring dedicated, identical standby hardware and the added complexity of maintaining identical configurations.

For server failures, VMware High Availability (HA)—a component of VMware Infrastructure 3—ensures rapid, automated restart of virtual machines. VMware HA automatically and intelligently restarts affected virtual machines on other production servers. As a part of virtual infrastructure, VMware HA can be easily configured for a server without dependencies on operating system, applications, or physical hardware."

HA will restart the VM, just as if you had restarted a physical server. DRS will move it hot if you have VMotion licensed and enabled and both physical hosts are up and running.

Provogeek · Mar 18, 2009

The confusion I seem to see going on here is about the running state of the Virtual Machine itself.

When HA does its job of moving a Virtual Machine, the state of the machine is "Powered Off". Therefore, the Virtual Machine has to go through a boot sequence when HA moves it to a new ESX host after a failure on the Virtual Machine's original ESX host.

HA does not need vMotion or DRS to do its job, but it will work with DRS if you are licensed for it. Do not confuse the way vMotion works with how HA works. vMotion is for migrating machines in a "Powered On" running state.

vMotion works by copying the VM's memory from the source ESX server to the destination ESX server. Once the memory copy is completed, it will put a lock on the VM running on the source ESX server, copy the remaining memory blocks to the destination ESX server, release the VMDK on the source ESX server, link the destination ESX server to the VMDK, and send out a gratuitous ARP to the network switches to update their MAC address tables.
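The memory part of that sequence can be pictured with a small Python simulation. It is a simplified sketch of the copy-while-running idea, not VMware's implementation: the page counts, the `stop_threshold` cutoff and the random "dirtying" of pages are assumptions made for illustration, and the VMDK handoff and the ARP step are left out.

```python
import random


def live_migrate(source_memory, dirty_pages_per_round,
                 max_rounds=10, stop_threshold=8, seed=0):
    """Copy guest memory to the destination while the guest keeps running.

    source_memory: dict of page number -> contents, mutated as the guest runs
    Returns (destination_memory, number_of_copy_rounds_done_while_running).
    """
    rng = random.Random(seed)
    dest = {}
    to_copy = set(source_memory)  # first pass: every page
    rounds = 0
    while len(to_copy) > stop_threshold and rounds < max_rounds:
        for page in to_copy:
            dest[page] = source_memory[page]
        # The guest was still running during the copy and changed some pages.
        dirtied = rng.sample(sorted(source_memory), dirty_pages_per_round)
        for page in dirtied:
            source_memory[page] += 1
        to_copy = set(dirtied)
        rounds += 1
    # The VM is now locked (briefly paused) on the source, so nothing changes
    # any more and the remaining pages can be copied consistently.
    for page in to_copy:
        dest[page] = source_memory[page]
    return dest, rounds


memory = {page: 0 for page in range(100)}
dest, rounds = live_migrate(memory, dirty_pages_per_round=5)
print(dest == memory, rounds)
```

The point of the loop is that the pause only has to cover the last handful of changed pages, which is why users do not notice the move.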

HA only powers machines on from an off state after moving the link to the VMDK.
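The difference for the guest can be shown with a toy Python model. The `VM` class and both functions are hypothetical and only illustrate the point made above: `disk` stands for what is saved in the VMDK on shared storage, and `memory` stands for whatever was only in RAM on the host.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    host: str
    powered_on: bool = True
    disk: dict = field(default_factory=dict)    # persisted in the VMDK on the SAN
    memory: dict = field(default_factory=dict)  # exists only in the host's RAM


def vmotion(vm, target_host):
    """Live migration: the running state travels with the VM."""
    vm.host = target_host
    return vm


def ha_restart(vm, target_host):
    """HA failover: the old host is gone, so the VM cold-boots elsewhere."""
    vm.powered_on = False
    vm.memory = {}          # RAM contents died with the failed host
    vm.host = target_host   # the new host opens the same VMDK
    vm.powered_on = True    # guest OS goes through a normal boot
    return vm


ts = VM("ts1", "esx1",
        disk={"report.doc": "saved"},
        memory={"notepad": "unsaved text"})

vmotion(ts, "esx2")
print(ts.host, ts.memory)

ha_restart(ts, "esx3")
print(ts.host, ts.memory, ts.disk)
```

After the vMotion the open session is still there; after the HA restart only what was written to disk survives, the same as pulling the plug on a physical terminal server and booting it again.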

=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
Brent Schmidt Senior Network Engineer
Keep IT Simple

http://www.kiscc.com

Novell Platinum Partner Microsoft Gold Partner
VMWare Enterprise Partner Citrix Gold Partner
