I have a 3-node DAG (EX2016, current patch level), with 2 nodes in the "main" AD site and 1 in the backup site. It currently hosts about 400 users across 9 mailbox databases. For a few days now, one of the nodes in the main AD site has no longer been able to keep databases mounted.
When I do a manual switchover, for example, the db mounts properly for a few seconds. Then a store transient error occurs, which disables the "readfrompassive" feature and dismounts the db again. The db then fails over to node 1 or node 3, depending on the load.
The sequence of events when switching active copies to this server is always the same. I can reproduce it with every database on this server.
[ol 1]
[li]A manual or automatic switchover to "node2" occurs, starting with Event 2090 announcing it.[/li]
[li]Followed by 5 ESE informational events reporting the log replay and db attach.[/li]
[li]Then "MSExchangeIS" reports that the feature component "readfrompassive" (1061) has been re-enabled.[/li]
[li]Then the first error occurs "1001 - MSExchangeIS - MSEXCH info store encountered an internal logic error"[/li]
[li]Followed by an unhandled exception "1002 - processid perf counter (0) does not match actual process id.."[/li]
[li]Followed by a Watson report error (4999) for M.E.Store.Worker[/li]
[li]Only then the next error provides a bit (!) more information: "489 ESE - attempt to open edb-file for read only access failed with system error 32 (used by another process)"[/li]
[li]This is then followed by some Windows Error Reporting events (informational, 1001), some informational MSExchangeIS events (1021,40008,40036) ...[/li]
[li]Followed by another batch of ESE events reporting log replays for the db that is about to be mounted ... ??[/li]
[li]Oddly enough, only then does "MSExchangeIS - 40008" report that the db has been mounted successfully.[/li]
[li]Stopping worker process (1021)[/li]
[li]Starting MapiAddressBookAppPool (2000,2001)[/li]
[li]"MSExchange Mid-Tier Storage" informs about "PICW Core feature not enabled" (11000)[/li]
[li]Now another 4999 error is logged for a crash in "MSExchangeDelivery, M.ExchangeStoreProvider, M.E.M.S.DeliveryItem..."; the stack trace points to "MS.Mapi.CrossServerConnectionPolicy.Apply"[/li]
[li]Followed by 2 Windows Error Reporting informational events[/li]
[li]And finally an ExchangeStoreDB event (126) reports that a db copy on this server caused an error which resulted in dismounts. It says the reason should be looked up in the previous events (see above ^^)[/li]
[li]From this it looks as if the server tries to mount a db 3 times by default .... ? Every attempt results in more or less the same error events.[/li]
[/ol]
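To check the "three attempts" impression from the last list item, the events could be exported and split per mount attempt. Below is a minimal Python sketch, assuming a CSV export with the hypothetical columns time, source and event_id. The sample rows are placeholders built from the event IDs listed above, not real log data. Treating the MSExchangeIS 1001 error as the start of each failed attempt is also an assumption. Matching on source plus ID matters because Windows Error Reporting logs a 1001 event as well.

```python
import csv
import io

# Hypothetical export: column names and timestamps are placeholders,
# only the source/ID pairs are taken from the list above.
SAMPLE_CSV = """time,source,event_id
10:00:01,MSExchangeIS,1001
10:00:02,ESE,489
10:00:03,Windows Error Reporting,1001
10:00:09,MSExchangeIS,40008
10:00:20,MSExchangeIS,1001
10:00:21,ESE,489
10:00:22,Windows Error Reporting,1001
10:00:30,MSExchangeIS,40008
10:00:45,ExchangeStoreDB,126
"""

# Assumption: this source/ID pair marks the beginning of a failed attempt.
ATTEMPT_MARKER = ("MSExchangeIS", 1001)


def load_rows(text):
    """Parse the CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def split_attempts(rows, marker=ATTEMPT_MARKER):
    """Group (source, event_id) pairs into attempts, opening a new group at each marker."""
    attempts = []
    for row in rows:
        key = (row["source"], int(row["event_id"]))
        if key == marker:
            attempts.append([])
        if attempts:  # events before the first marker are ignored
            attempts[-1].append(key)
    return attempts


attempts = split_attempts(load_rows(SAMPLE_CSV))
for number, events in enumerate(attempts, start=1):
    print(number, events)
```

Run against a real export, the number of groups would show whether there really are three attempts per switchover, and comparing the groups would show whether each attempt fails in the same way.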
What I have already tried:
[ul]
[li]Removed one of the passive copies on the problematic node, deleted the related files and recreated the passive copy from scratch[/li]
[li]-----Tried to reproduce: still failing as described above.[/li]
[li]Reviewed file/folder permissions on the db and log storage locations (separate disks/volumes).[/li]
[li]-----There actually was an issue with partially incorrect permissions -> set all ownership back to the Administrators group and re-applied default permissions[/li]
[li]----------Unfortunately, this had no effect either - activating a db copy on this node still fails the same way[/li]
[li]Put the server into maintenance mode, shut it down and checked the dbs with eseutil: all in dirty shutdown state (yet they still mount, even if only for a short time??)[/li]
[li]Read tons of KB articles and forum discussions about this and similar behavior - unfortunately no further insights.[/li]
[/ul]
Any help, ideas, tips & tricks are highly appreciated.