
Exchange DAG failover issues

Status
Not open for further replies.

davy2k
Technical User, Mar 18, 2007, JP
Dear All,
We have installed an Exchange 2010 DAG with the following architecture:

AD and Exchange 2010 in Site A 172.16.1.0/24

AD and Exchange 2010 in Site B 172.16.101.0/24

172.16.1.0/24 and 172.16.101.0/24 subnets are for MAPI

192.168.1.0/24 and 192.168.101.0/24 are for replication

NIC 1 is connected via a SonicWall site-to-site VPN for MAPI.
NIC 2 is connected via a Vyatta site-to-site VPN for replication.
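To make the addressing above easier to check, here is a minimal Python sketch that maps an IP address to the DAG network it falls in, using the standard ipaddress module. The subnets come from the post; labels such as "MAPI Site A" are made-up descriptive names for this sketch, not the names Exchange assigns to DAG networks.

```python
import ipaddress

# Subnets from the architecture described above. The labels are
# illustrative names for this sketch only, not Exchange DAG network names.
DAG_NETWORKS = {
    "MAPI Site A": ipaddress.ip_network("172.16.1.0/24"),
    "MAPI Site B": ipaddress.ip_network("172.16.101.0/24"),
    "Replication Site A": ipaddress.ip_network("192.168.1.0/24"),
    "Replication Site B": ipaddress.ip_network("192.168.101.0/24"),
}


def classify(address: str) -> str:
    """Return the label of the subnet that contains `address`, or 'unknown'."""
    ip = ipaddress.ip_address(address)
    for label, network in DAG_NETWORKS.items():
        if ip in network:
            return label
    return "unknown"


if __name__ == "__main__":
    # The two DAG IP addresses that appear in the cluster validation report.
    for addr in ("172.16.1.12", "172.16.101.12"):
        print(addr, "->", classify(addr))
```

Running it against the DAG IP addresses from the validation report shows one address per MAPI subnet, which is what you would expect for a DAG stretched across two sites.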

Everything was working well until the Exchange 2010 server in the main site automatically rebooted after an update. The issue we are having is that when we fail back to the main site after the Exchange 2010 server in Site A comes back online, the database automatically switches back to Site B. Users' Outlook clients show that they are connected to the Site B Exchange server.
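One plausible reading of this symptom is that, whenever the Site A copy is not healthy, the only candidate left for activation is the Site B copy. The sketch below is a deliberately simplified model of that choice: it looks only at copy health and activation preference, whereas Exchange 2010's real best copy selection also weighs things like copy and replay queue lengths and content index state. The assumption that Node1 (Site A) has activation preference 1, and the status strings, are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatabaseCopy:
    server: str
    activation_preference: int  # 1 = most preferred (assumed values)
    status: str                 # e.g. "Healthy" or "Failed"


def pick_activation_target(copies: List[DatabaseCopy]) -> Optional[str]:
    """Simplified model: the healthy copy with the lowest preference number wins."""
    healthy = [c for c in copies if c.status == "Healthy"]
    if not healthy:
        return None  # nothing can be activated
    return min(healthy, key=lambda c: c.activation_preference).server


copies = [
    DatabaseCopy("Node1", 1, "Failed"),   # Site A copy not healthy
    DatabaseCopy("Node2", 2, "Healthy"),  # Site B copy
]
print(pick_activation_target(copies))
```

Under this model, a Site A copy that keeps dropping to Failed would keep sending the database back to Node2, regardless of the preference order.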

We ran a cluster validation and got the errors below; the remaining errors are from the cluster event log.

Your help is highly appreciated.

Thank you in advance.


Validation errors

Validate Resource Status
Validate that cluster resources are online, and list the cluster resources that are running in separate resource monitors.
Validating cluster resource Name: DAG.
Validating cluster resource File Share Witness (\\Node2.abc.com\DAG.abc.com) (\\Node2.abc.com\DAG.abc.com).
Validating cluster resource IP Address: 172.16.101.12.
This resource does not have all the nodes of the cluster listed as 'Possible Owners'. The group that this resource is a member of will not be able to come online on any node that is not listed as a 'Possible Owner'.
Validating cluster resource IP Address: 172.16.1.12.
This resource is marked with a state of 'Offline'. The functionality that this resource provides is not available while it is in the offline state. The resource may be put in this state by an administrator or program. It may also be a newly created resource which has not been put in the online state or the resource may be dependent on a resource that is not online. Resources can be brought online by choosing the 'Bring this resource online' action in Failover Cluster Manager.
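The two IP address results may be related. As far as I understand multi-subnet DAGs, the cluster has one IP Address resource per subnet, and only the one on the subnet of the node that currently owns the cluster core group is expected to be online; the other normally shows as Offline. The sketch below models that assumption. The node-to-subnet mapping (Node1 in Site A, Node2 in Site B) is taken from the thread; the function name is hypothetical.

```python
import ipaddress
from typing import Dict, List

# Mapping taken from the thread: Node1 is in Site A, Node2 (Server B) in Site B.
NODE_SUBNETS: Dict[str, str] = {
    "Node1": "172.16.1.0/24",
    "Node2": "172.16.101.0/24",
}
# DAG IP address resources listed in the validation report.
DAG_IPS: List[str] = ["172.16.1.12", "172.16.101.12"]


def expected_ip_states(owner_node: str) -> Dict[str, str]:
    """Model: only the DAG IP on the owning node's subnet is expected Online."""
    owner_net = ipaddress.ip_network(NODE_SUBNETS[owner_node])
    return {
        ip: "Online" if ipaddress.ip_address(ip) in owner_net else "Offline"
        for ip in DAG_IPS
    }


print(expected_ip_states("Node2"))
```

If Node2 owns the cluster, this model predicts 172.16.1.12 as Offline, which matches the validation output and would not by itself indicate a fault.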




Validate Software Update Levels
Validate that all tested servers have the same software updates installed.
Validating that all servers have the same software updates...


Software Updates missing on 'Node2.abc.com':
All software updates present


Software Updates missing on 'Node1.abc.com':
Hotfix ID    Description
KB2518295    Security Update



The servers do not all have the same software updates.
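The "Software Updates missing" check above boils down to a set comparison between nodes. This minimal sketch reproduces the report using the only hotfix ID it names; the installed-update sets are placeholders, since the real inventories are not shown, and how the validation wizard collects them is not modelled.

```python
from typing import Dict, Set


def missing_updates(installed: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each node, list update IDs present on some other node but not on it."""
    all_updates: Set[str] = set().union(*installed.values())
    return {node: all_updates - ids for node, ids in installed.items()}


# Placeholder inventories: only KB2518295 is known from the report.
installed = {
    "Node1.abc.com": set(),
    "Node2.abc.com": {"KB2518295"},
}
print(missing_updates(installed))
```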


Cluster event errors
Cluster node 'Node1' was removed from the active failover cluster membership. The Cluster service on this node may have stopped. This could also be due to the node having lost communication with other active nodes in the failover cluster. Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapters on this node. Also check for failures in any other network components to which the node is connected such as hubs, switches, or bridges.
 
Which server is node1.abc.com and which server is node2.abc.com?
If node2 is Server A, then I am leaning towards you removing the KB2518295 security update from it.

The other node is missing this update, so I need to know whether the node missing it is Server B. Since Server B is running fine, I would stick with my suggestion of removing that update from Server A.

If the nodes aren't what I am assuming, then I will revisit this question.

_______________________________________
Great knowledge can be obtained by mastering the Google algorithm.
 
Thank you, TechyMcSe2k, for your response.

Node2 is Server B. Also, I discovered that this issue started occurring after installing Symantec Backup Exec System Recovery with VSS enabled. What do you think?

We have disabled VSS but have not tested the failover yet.



 
Are you running Exchange 2010 SP1? We found that it improved the behaviour of DAGs and failover.

-------------------------------

If it doesn't leak oil it must be empty!!
 
I agree with NortonES2: add SP1 and Rollup 5, which is currently out, and see if that resolves your issue.

 
Thank you for your responses, NortonES2 and TechyMcSe2k. Below is the version of Exchange currently installed on the Node1 and Node2 Exchange servers.

It shows that SP1 is currently running on both servers.

We are planning to fail back to the primary site tonight (we have approval from management) to determine whether Symantec VSS is the culprit.

I will keep you guys posted and thank you once again.

[PS] C:\Windows\system32>Get-ExchangeServer | Format-Table Name, *Version*

Name    AdminDisplayVersion           ExchangeVersion
----    -------------------           ---------------
Node1   Version 14.1 (Build 218.15)   0.1 (8.0.535.0)
Node2   Version 14.1 (Build 218.15)   0.1 (8.0.535.0)
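The AdminDisplayVersion string above can be checked programmatically. As far as I know, Exchange 2010 RTM reports 14.0, SP1 reports 14.1 and SP2 14.2, and an update rollup raises the build number while leaving the 14.1 part unchanged. If that holds, "14.1 (Build 218.15)" suggests SP1 with no rollup applied yet, which is relevant to the Rollup 5 suggestion. The function names below are hypothetical; this is only a parsing sketch.

```python
import re
from typing import Tuple

VERSION_RE = re.compile(r"Version (\d+)\.(\d+) \(Build (\d+)\.(\d+)\)")


def parse_admin_display_version(text: str) -> Tuple[int, int, int, int]:
    """Turn 'Version 14.1 (Build 218.15)' into (14, 1, 218, 15)."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    major, minor, build, revision = (int(part) for part in match.groups())
    return major, minor, build, revision


def is_at_least_sp1(text: str) -> bool:
    """Assumes Exchange 2010 SP1 reports major.minor 14.1 (see note above)."""
    major, minor, _, _ = parse_admin_display_version(text)
    return (major, minor) >= (14, 1)


print(parse_admin_display_version("Version 14.1 (Build 218.15)"))
print(is_at_least_sp1("Version 14.1 (Build 218.15)"))
```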
 
Dear all,
When a backup is in progress, Node1 freezes and I get the error below.

We would like to use Backup Exec 2010 R3 for backups. Has anyone used it to back up Exchange in a DAG implementation?

"Incremental seeding of database Mailbox Database DB1\Node1 encountered a transient error. The database copy status will be set to Failed, and the operation will be retried. Error: Incremental reseed failed, but it can be retried later by resuming replication. Error: An error occurred while processing a request on server 'Node2'. Error: Couldn't open backup file handle for database 'Mailbox Database TKY01' to server 'Node2'. Hresult: 0x50d. Error: A database backup is already in progress. Please verify that no other seeding or incremental reseeding operations are started for this database, and then try the operation again by rerunning the Update-MailboxDatabaseCopy cmdlet."
 
Are you trying to back up the passive copy only? If so, have you configured the Preferred Server settings so that only the passive copy is backed up?


NOTE from BeAdmin_en.pdf Page 1138:
Back up from the passive copy only, using Preferred Server settings if possible (job fails if not available)
Lets you back up a passive copy of the database. If Backup Exec cannot access the passive copy, the job fails. In this case, neither the active nor the passive database is backed up. Select this option when you do not want to affect the performance of the active copy of the database. For Exchange Server 2010, Backup Exec selects the passive copy based on your selections in the Preferred Server settings.
Note: You must have the preferred server settings configured to use this option.
See “About preferred server configurations”
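The rules in that excerpt can be written as a small decision function. This is a sketch of the documented behaviour only, not Backup Exec's implementation; the class names, and the assumption that preferred servers are tried in list order, are mine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Copy:
    server: str
    is_active: bool
    available: bool


class BackupJobFailed(Exception):
    """Raised when neither the active nor the passive copy gets backed up."""


def choose_backup_copy(copies: List[Copy], preferred_servers: List[str]) -> Copy:
    """Model of 'passive copy only, using Preferred Server settings'."""
    if not preferred_servers:
        # The option requires preferred server settings to be configured.
        raise BackupJobFailed("preferred server settings are not configured")
    for server in preferred_servers:  # assumed: tried in list order
        for copy in copies:
            if copy.server == server and not copy.is_active and copy.available:
                return copy
    # Per the excerpt: the job fails and nothing is backed up.
    raise BackupJobFailed("no passive copy available on a preferred server")


copies = [Copy("Node1", is_active=True, available=True),
          Copy("Node2", is_active=False, available=True)]
print(choose_backup_copy(copies, ["Node2"]).server)
```

The point of the model is that this option never falls back to the active copy on Node1, so it should keep backup load off the active database.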

Also, there is a Symantec Backup Exec forum that might be able to help you better with this question:

 
Dear All,
Thank you for your response.

I disabled the Symantec System Recovery software and failed back to the primary site, and the failback succeeded. However, the DB showed as Mounted on the primary site and "Disconnected and Resynchronizing" on the secondary site. I resolved this by enabling replication on the MAPI network, and the DB became healthy immediately. I suspect the issue was connectivity on the replication network.
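That suspicion fits a simple model of DAG network selection: log shipping can only use networks that are up and have replication enabled, and, as I understand Exchange 2010's behaviour, dedicated replication networks are preferred over the MAPI network. If the Vyatta replication VPN was down and the MAPI network had replication disabled, there would be no usable path. This sketch is an assumption-laden model, not Exchange's actual selection logic; the network names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DagNetwork:
    name: str
    is_mapi: bool
    replication_enabled: bool
    is_up: bool


def pick_log_shipping_network(networks: List[DagNetwork]) -> Optional[str]:
    """Model: prefer an up, replication-enabled non-MAPI network, else MAPI."""
    usable = [n for n in networks if n.is_up and n.replication_enabled]
    if not usable:
        return None  # no path for log shipping, e.g. copy shows disconnected
    # False sorts before True, so dedicated replication networks come first.
    usable.sort(key=lambda n: n.is_mapi)
    return usable[0].name


networks = [
    DagNetwork("ReplicationNetwork", is_mapi=False, replication_enabled=True, is_up=False),
    DagNetwork("MapiNetwork", is_mapi=True, replication_enabled=False, is_up=True),
]
print(pick_log_shipping_network(networks))  # replication link down, MAPI disabled
networks[1].replication_enabled = True       # enable replication on MAPI network
print(pick_log_shipping_network(networks))
```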

My remaining question is why Outlook still points to the secondary server while the DB is active on the primary server.

I assumed that Outlook would automatically point to the server where the DB is active. Please advise.

Thank you in advance for your response.
 
I also noticed that Failover Cluster Manager shows the secondary server as the current host server.

Do I have to create an alternate witness server? The current witness server is on the AD server in the secondary site.
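Why the witness location matters: a two-member DAG uses the Node and File Share Majority quorum model, so there are three votes (Node1, Node2 and the file share witness), and more than half of them, i.e. two, must be reachable. The validation report lists the witness share on Node2.abc.com, while this post says the AD server; either way it is in Site B. This sketch counts votes per site to show what a broken site-to-site link would mean in that layout. The vote names are illustrative.

```python
from typing import Dict, Set


def has_quorum(reachable_voters: Set[str], all_voters: Set[str]) -> bool:
    """Majority rule: strictly more than half of all votes must be reachable."""
    return len(reachable_voters & all_voters) > len(all_voters) // 2


VOTERS = {"Node1", "Node2", "FileShareWitness"}

# Illustrative placement: the witness lives in Site B, as described in the thread.
SITES: Dict[str, Set[str]] = {
    "Site A": {"Node1"},
    "Site B": {"Node2", "FileShareWitness"},
}

for site, members in SITES.items():
    print(site, "keeps quorum:", has_quorum(members, VOTERS))
```

With this placement, Site B holds two of three votes and keeps the cluster running if the sites lose contact, while Site A alone cannot. That would be consistent with Node1 being dropped from membership and the cluster staying hosted in the secondary site.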
 
Outlook connects to your CAS server, so it won't matter which server your database is active on.


See "Understanding Database Availability Groups"

Scroll down to the "Microsoft Outlook Behavior and Logic" section.
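A tiny model of that answer: in Exchange 2010, Outlook's MAPI connection goes to the Client Access server (or CAS array) named in the mailbox database's RpcClientAccessServer property, and that CAS then talks to whichever Mailbox server holds the active copy. The sketch shows that the client-facing endpoint stays the same when the active copy moves. The host name cas-siteb.abc.com is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MailboxDatabase:
    name: str
    rpc_client_access_server: str  # CAS or CAS array that Outlook connects to
    active_server: str             # Mailbox server hosting the active copy


def outlook_endpoint(db: MailboxDatabase) -> str:
    """Outlook talks to the CAS endpoint, not to the active Mailbox server."""
    return db.rpc_client_access_server


def backend_server(db: MailboxDatabase) -> str:
    """The CAS forwards the client's requests to the active copy's server."""
    return db.active_server


db = MailboxDatabase("Mailbox Database DB1", "cas-siteb.abc.com", "Node2")
before = outlook_endpoint(db)
db.active_server = "Node1"  # fail back to the primary site
print(before == outlook_endpoint(db), backend_server(db))
```

So if Outlook still shows the Site B server after the database fails back, the place to look is which CAS the database's RpcClientAccessServer points to, not where the active copy is mounted.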

 
