Exchange 2010 Database Availability Group

Version 16

    The components of this template track the statistics of an Exchange 2010 Database Availability Group (DAG) using PowerShell scripts.


    Prerequisites:

    • Exchange Management Tools must be installed on the target Exchange server.
    • Windows Authentication should be enabled for the PowerShell application in IIS on the Exchange server. You can configure this in the IIS Manager console:

      Start > Administrative Tools > Internet Information Services (IIS) Manager.


    1. In the IIS console, expand your server, then Sites, then Default Web Site, and select the PowerShell application. In the center pane, open Authentication.
    2. Select Windows Authentication and click Enable in the Actions pane on the right.

     

    Credentials: The credentials must belong to an Exchange administrator account (Organization Management) with at least view-only permissions. Enter the credentials with the domain part in the login field, in the form domain\user.
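    A login without the domain part is a common cause of the "Access is denied" error described under Troubleshooting. The following Python sketch shows one way to check the domain\user format before you enter the credentials. The helper split_credential is hypothetical and is not part of the template.

```python
def split_credential(login: str) -> tuple[str, str]:
    """Split a login entered as domain\\user into (domain, user).

    Raises ValueError when the domain part or the user part is missing,
    for example when only "user" or "user@domain" is entered.
    """
    domain, sep, user = login.partition("\\")
    if not sep or not domain or not user or "\\" in user:
        raise ValueError(f"expected domain\\user, got {login!r}")
    return domain, user


print(split_credential(r"CONTOSO\exadmin"))  # ('CONTOSO', 'exadmin')
```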

    Note: If you have trouble with template functionality, refer to the troubleshooting section.


    Monitored Components

    Status of database copies

    This component monitor returns the status of DAG database copies for up to 10 DAG members.

    Possible values:
    0 – Database is Healthy;
    1 – Database is Mounted;
    2 – Database is Dismounted (Warning threshold);
    3 or higher – Any other database copy status (Critical threshold).

    Note: You must specify the correct names of your Exchange server and Exchange DAG database in the Script Arguments field of the corresponding PowerShell Monitor. Otherwise, the counter returns an error with an "Undefined" status.

    Arguments:

    Exchange_DAG_Database

    Exchange_DAG_Database - the name of the Exchange DAG database.

    Arguments example:
    Mailbox Database 123

    To see the names of your Exchange Databases, run the following command in the Exchange Management Shell: Get-MailboxDatabase
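    The value list above maps a copy status to a number and then to a threshold. The Python sketch below reproduces that mapping. It assumes status strings like those reported in the Status field of Get-MailboxDatabaseCopyStatus. The template only documents "3 or higher" for other statuses, so reporting every other status as exactly 3 is an assumption.

```python
from enum import Enum


class Threshold(Enum):
    UP = "Up"
    WARNING = "Warning"
    CRITICAL = "Critical"


# Statistic values documented for the "Status of database copies" monitor.
STATUS_CODES = {"Healthy": 0, "Mounted": 1, "Dismounted": 2}
# Assumption: every other status is reported as 3. The template only
# documents "3 or higher", so the real code per status may differ.
OTHER_STATUS_CODE = 3


def status_code(copy_status: str) -> int:
    """Map a copy status string to the monitor's statistic value."""
    return STATUS_CODES.get(copy_status, OTHER_STATUS_CODE)


def evaluate(code: int) -> Threshold:
    """Apply the documented thresholds: Warning at 2, Critical at 3 or higher."""
    if code >= 3:
        return Threshold.CRITICAL
    if code >= 2:
        return Threshold.WARNING
    return Threshold.UP


for status in ("Healthy", "Mounted", "Dismounted", "Failed"):
    code = status_code(status)
    print(status, code, evaluate(code).value)
```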

     

    DAG Health 1

    This component monitor checks all aspects of the replication and replay status to provide a complete overview of a specific mailbox server in a Database Availability Group (DAG).

    Returned values:

    -1 – The test is unavailable on the target node. Some tests might not be available, depending on the Database Availability Group configuration and test results.
    0 – The test passed successfully.
    1 – The test failed, and an error message is returned.

    This component returns the results of the following checks:

    ClusterService – This check verifies that the Cluster service is running and reachable on the specified DAG member.
    ReplayService – This check verifies that the Microsoft Exchange Replication service is running and reachable on the specified DAG member.
    ActiveManager – This check verifies that the instance of Active Manager running on the specified DAG member is in a valid role (primary, secondary, or stand-alone).
    TasksRpcListener – This check verifies that the Tasks Remote Procedure Call (RPC) server is running and reachable on the specified DAG member.
    TcpListener – This check verifies that the TCP log copy listener is running and reachable on the specified DAG member.
    DagMembersUp – This check verifies that all DAG members are available, running, and reachable.
    ClusterNetwork – This check verifies that all cluster-managed networks on the specified DAG member are available.
    QuorumGroup – This check verifies that the default cluster group (quorum group) is in a healthy and online state.
    FileShareQuorum – This check verifies that the witness server, witness directory, and share configured for the DAG are reachable.
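    These check names match those reported by the Test-ReplicationHealth cmdlet in the Exchange Management Shell. That cmdlet returns one Check and Result pair per check. The Python sketch below converts such pairs into the -1/0/1 values listed above. It assumes that a check missing from the output means "unavailable" and that any result other than "Passed" is a failure. The sample failure text is illustrative only.

```python
# The nine checks reported by the "DAG Health 1" monitor.
DAG_HEALTH_1_CHECKS = (
    "ClusterService", "ReplayService", "ActiveManager", "TasksRpcListener",
    "TcpListener", "DagMembersUp", "ClusterNetwork", "QuorumGroup",
    "FileShareQuorum",
)


def check_values(results: dict[str, str], checks=DAG_HEALTH_1_CHECKS) -> dict[str, int]:
    """Convert {check name: result text} into the monitor's -1/0/1 values.

    -1: the check was not returned for this node (test unavailable)
     0: the result is "Passed"
     1: any other result (assumed to mean the check failed)
    """
    values = {}
    for check in checks:
        result = results.get(check)
        if result is None:
            values[check] = -1
        elif result == "Passed":
            values[check] = 0
        else:
            values[check] = 1
    return values


sample = {"ClusterService": "Passed", "ReplayService": "*FAILED*"}
print(check_values(sample))
```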

     

    DAG Health 2

    This component monitor checks all aspects of the replication and replay status to provide a complete overview of a specific Mailbox server in a Database Availability Group (DAG).

    Returned values:

    -1 – The test is unavailable on the target node. Some tests might not be available, depending on the Database Availability Group configuration and test results.

    0 – The test passed successfully.

    1 – The test failed, and an error message is returned.

    This component returns the results of the following checks:

    DBCopySuspended – This component checks whether any mailbox database copies are in a suspended state on the specified DAG member.

    DBCopyFailed – This component checks whether any mailbox database copies are in a failed state on the specified DAG member.

    DBInitializing – This component checks whether any mailbox database copies are in an Initializing state on the specified DAG member.

    DBDisconnected – This component checks whether any mailbox database copies are in a disconnected state on the specified DAG member.

    DBLogCopyKeepingUp – This component verifies that log copying and inspection by the passive copies of databases on the specified DAG member are able to keep up with log generation activity on the active copy.

    DBLogReplayKeepingUp – This component verifies that replay activity for the passive copies of databases on the specified DAG member is able to keep up with log copying and inspection activity.

    Note: If the target DAG member does not have any database copies, this component will return “-1” values.
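    The four state checks above, and the -1 result for a member with no database copies, can be approximated from per-copy status strings. The Python sketch below assumes status names like those from Get-MailboxDatabaseCopyStatus, such as "FailedAndSuspended" or "DisconnectedAndHealthy". The predicates are illustrative assumptions, and the real test logic may differ. DBLogCopyKeepingUp and DBLogReplayKeepingUp depend on queue lengths and are not modeled here.

```python
# Assumption: these predicates approximate the four state checks of the
# "DAG Health 2" monitor using copy status strings. The real logic may differ.
STATE_CHECKS = {
    "DBCopySuspended": lambda status: "Suspended" in status,     # Suspended, FailedAndSuspended
    "DBCopyFailed": lambda status: status.startswith("Failed"),  # Failed, FailedAndSuspended
    "DBInitializing": lambda status: status == "Initializing",
    "DBDisconnected": lambda status: status.startswith("Disconnected"),
}


def state_check_values(copy_statuses: list[str]) -> dict[str, int]:
    """Return -1/0/1 per check for the copies hosted on one DAG member."""
    if not copy_statuses:
        # Documented behavior: a member without database copies reports -1.
        return {name: -1 for name in STATE_CHECKS}
    # 1 if any copy on the member matches the check, otherwise 0.
    return {
        name: int(any(matches(status) for status in copy_statuses))
        for name, matches in STATE_CHECKS.items()
    }


print(state_check_values([]))
print(state_check_values(["Healthy", "FailedAndSuspended"]))
```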

     

    Service: Microsoft Exchange Replication

    This monitor returns the CPU and memory usage of the Microsoft Exchange Replication service. This service provides replication functionality for mailbox databases on Mailbox servers in a Database Availability Group.

     

    Windows Cluster Network Errors

    This monitor returns the number of events that occur when:

    • A specific cluster network interface for a specific cluster node on a specific network is unreachable by at least one other cluster node attached to the network. The failover cluster was not able to determine the location of the failure (1126).
    • A specific cluster network interface for a specific cluster node on a specific network has failed (1127).
    • A specific cluster network is down. None of the available nodes can communicate using this network (1130).
    • A specific cluster network is partitioned. Some attached failover cluster nodes cannot communicate with each other over the network (1129).
    • A specific cluster node was removed from the active failover cluster membership (1135).

    Events: 1126,1127,1130,1129,1135.

    Source: Microsoft-Windows-FailoverClustering.

    Run the Validate a Configuration wizard to check your network configuration. If the condition persists, check for hardware or software errors related to the network adapter. Also check for failures in any other network components to which the node is connected, such as hubs, switches, or bridges.
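    An event-count monitor like this one counts log entries that match a source and a set of event IDs. The Python sketch below shows that counting. The Event record is a hypothetical simplification, not a Windows API type. The five-minute polling window and the sample events are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified event log record (not a Windows API type)."""
    source: str
    event_id: int
    time: datetime


CLUSTER_SOURCE = "Microsoft-Windows-FailoverClustering"
CLUSTER_NETWORK_EVENT_IDS = frozenset({1126, 1127, 1129, 1130, 1135})


def count_events(events, source, event_ids, since):
    """Count events from `source` whose ID is in `event_ids`, logged at or after `since`."""
    return sum(
        1
        for event in events
        if event.source == source and event.event_id in event_ids and event.time >= since
    )


now = datetime(2024, 1, 1, 12, 0)
log = [
    Event(CLUSTER_SOURCE, 1127, now - timedelta(minutes=2)),
    Event(CLUSTER_SOURCE, 1135, now - timedelta(minutes=1)),
    Event(CLUSTER_SOURCE, 1069, now - timedelta(minutes=1)),  # ID not monitored, ignored
    Event(CLUSTER_SOURCE, 1130, now - timedelta(hours=1)),    # outside the window
]
print(count_events(log, CLUSTER_SOURCE, CLUSTER_NETWORK_EVENT_IDS, now - timedelta(minutes=5)))
```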

     

    Windows Cluster Configuration Errors

    This monitor returns the number of events that occur when:

    • A specific cluster IP address resource failed to come online (1360).
    • An attempt to use IPv4 for a specific network adapter failed (1555).
    • A specific network that has been disabled for failover cluster use was found to be the only currently available network that the node can use to communicate with other nodes in the cluster (1569).

    Events: 1360,1555,1569.

    Source: Microsoft-Windows-FailoverClustering.

    Review all IP address configurations, and run the Validate a Configuration wizard to check your network configuration. Also confirm that at least one network is configured for use by the cluster.
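    The last recommendation, that at least one network is configured for cluster use, can be expressed as a simple check. The Python sketch below assumes the Role values that the FailoverClusters PowerShell module reports through Get-ClusterNetwork: 0 for a network disabled for cluster use, 1 for cluster communication only, and 3 for cluster and client communication. Verify these values against your Windows Server version.

```python
from enum import IntEnum


class ClusterNetworkRole(IntEnum):
    NONE = 0                # network disabled for cluster use
    CLUSTER_ONLY = 1        # cluster communication only
    CLUSTER_AND_CLIENT = 3  # cluster and client communication


def has_cluster_network(roles) -> bool:
    """True if at least one network is enabled for cluster communication."""
    return any(ClusterNetworkRole(role) != ClusterNetworkRole.NONE for role in roles)


print(has_cluster_network([0, 0]))  # False: every network is disabled for cluster use
print(has_cluster_network([0, 3]))  # True
```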

     

    Exchange Replication Service Errors

    This monitor returns the number of events that occur when:

    • The Microsoft Exchange Replication service failed to start the HTTP listener (2120).
    • The Microsoft Exchange Replication service failed to start the Active Manager RPC server (3175).
    • The Microsoft Exchange Replication service failed to create a temporary log file (2055).
    • The Microsoft Exchange Replication service failed to clean up files for a specific database (4109).
    • The Microsoft Exchange Replication service failed to start the Tasks RPC server (2135).

    Events: 2120,3175,2055,4109,2135.

    Source: MSExchangeRepl.

    For all of these events, review the Application and System logs on your Exchange 2010 servers for related events. The individual events have the following meanings:

    2120 - This Warning event occurs if the log copier listener does not start when the Microsoft Exchange Replication service is started.

    3175 - This Warning event occurs if the Microsoft Exchange Replication service cannot enable the Active Manager RPC listener when the computer is started.

    2055 - You receive this Error event if a hard I/O error occurs in NTFS that prevents the creation of a file as part of a database failover.

    4109 - This Warning event indicates that the Database Copy process failed to remove continuous replication files for a database copy.

    2135 - This Warning event occurs if the Microsoft Exchange Replication service cannot enable the Tasks RPC listener when the computer starts.

     

    Health Check Failed: Cluster Service

    This Warning event occurs when the server cannot become an active member of its database availability group (DAG).

    Event: 4038.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: Active Manager

    This Warning event occurs if the Microsoft Exchange Replication service cannot enable the Active Manager RPC listener.

    Event: 4040.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: DAG Members

    This event indicates that the server is a member of a database availability group. This event also indicates that other servers that are members of the same database availability group are currently not operational or cannot be contacted.

    Event: 4044.

    Source: MSExchangeRepl.

    Make sure that the servers within the same database availability group are operational. Also verify network connectivity on all servers.

     

    Health Check Failed: Quorum Group

    This warning occurs if problems are detected that may cause the database availability group to eventually fail because the security information is getting too old.

    Event: 4051.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: DAG Network

    This warning occurs if one or more networks that support the database availability group are not operating correctly on the server.

    Event: 4046.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: File Share Quorum

    This warning occurs if problems are detected that may cause the database availability group to eventually fail because the security information is becoming too old.

    Event: 4049.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: Tasks RPC Listener

    This Warning event occurs if the Exchange Replication service cannot enable the Tasks RPC listener.

    Event: 4053.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: Http Listener

    This Warning event occurs if the Exchange Replication service cannot enable the log copier listener.

    Event: 4055.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Health Check Failed: Database Redundancy

    This error occurs if Exchange finds that a replicated database doesn’t have sufficient healthy copies.

    Event: 4113

    Source: MSExchangeRepl.

    The Replication service continues to check the database and to log event 4113 for it every 20 minutes. This continues until enough redundant copies exist to withstand an outage and to allow a failed copy of the database to fail over to a healthy copy. You can modify the redundancy count by adjusting the AtLeastNCopies parameter in "C:\Program Files\Microsoft\Exchange Server\V14\Scripts\CheckDatabaseRedundancy.ps1".
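    The redundancy rule compares the number of healthy copies with the AtLeastNCopies threshold. The Python sketch below illustrates it and also shows the arithmetic behind the 20-minute logging interval: 24 × 60 / 20 = 72 events per day for each affected database. Two points are assumptions: counting Mounted and Healthy copies as healthy, and the default of 2 used here. Check the script itself for its actual default and logic.

```python
# Assumption: an active copy in the Mounted state and passive copies in the
# Healthy state are the copies that count toward redundancy.
HEALTHY_COPY_STATES = frozenset({"Mounted", "Healthy"})


def has_sufficient_redundancy(copy_statuses, at_least_n_copies=2):
    """True if at least `at_least_n_copies` copies of the database are healthy."""
    healthy = sum(1 for status in copy_statuses if status in HEALTHY_COPY_STATES)
    return healthy >= at_least_n_copies


def events_per_day(interval_minutes=20):
    """How many 4113 events one database can log per day at the given interval."""
    return (24 * 60) // interval_minutes


print(has_sufficient_redundancy(["Mounted", "Failed"]))   # False
print(has_sufficient_redundancy(["Mounted", "Healthy"]))  # True
print(events_per_day())                                  # 72
```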

     

    Database Mount Error due to Number of Lost Logs

    This Error event indicates that the Microsoft Exchange Replication service did not activate the best available database copy because the attempted failover did not succeed. In this case, the local copy of the database did not mount successfully.

    Event: 2092.

    Source: MSExchangeRepl.

    Review the Application log and System log on your Exchange 2010 servers for related events.

     

    Troubleshooting

    If you receive an error similar to the following:

    Message: ERROR: Please check target server argument and credentials (should be domain\user). [192.168.1.206] Connecting to remote server failed with the following error message : Access is denied.

    Resolution: This error can occur when the credentials are wrong. Verify the credentials and confirm that they are in the domain\user format. The user should be an Exchange Organization Management member.


    If you receive an error similar to the following:

    ERROR: The operation couldn't be performed because object 'Mailbox Database 10580933221\*' couldn't be found on 'xchng2010.apmteam.sw'.

    Resolution: Provide the correct database name.


    If you receive an error similar to the following:

    [192.168.1.206] Connecting to remote server failed with the following error message : The WinRM client cannot process the request. The WinRM client tried to use Negotiate authentication mechanism, but the destination computer (192.168.1.206:443) returned an 'access denied' error. Change the configuration to allow Negotiate authentication mechanism to be used or specify one of the authentication mechanisms supported by the server. To use Kerberos, specify the local computer name as the remote destination. Also verify that the client computer and the destination computer are joined to a domain. To use Basic, specify the local computer name as the remote destination, specify Basic authentication and provide user name and password.

    Resolution: This error indicates that Windows Authentication is not enabled for the PowerShell application in IIS on the Exchange server. Enable it as described in the Prerequisites section.


    If you receive an error similar to the following:

    [192.168.1.206] Connecting to remote server failed with the following error message : The WinRM client received an HTTP status code of 403 from the remote WS-Management service.

    Resolution: Check the SSL settings for the PowerShell application in IIS on the Exchange server.

    Use one of the following configurations:
    - Require SSL unchecked;
    - Require SSL checked and Client Certificates set to Accept;
    - Require SSL checked and Client Certificates set to Ignore.
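    The three supported combinations can be written as a small predicate, for example to audit settings recorded from several servers. This is a hypothetical helper. It assumes the three IIS client certificate options Ignore, Accept, and Require, and treats Require SSL unchecked as supported regardless of the certificate option, as the list above implies.

```python
# Client certificate options that work when Require SSL is checked.
SUPPORTED_WITH_SSL = frozenset({"Accept", "Ignore"})


def ssl_config_supported(require_ssl: bool, client_certificates: str) -> bool:
    """True if the IIS SSL settings match one of the supported configurations."""
    if not require_ssl:
        return True
    # With Require SSL checked, only Accept or Ignore is supported ("Require" is not).
    return client_certificates in SUPPORTED_WITH_SSL


print(ssl_config_supported(True, "Require"))  # False
print(ssl_config_supported(True, "Accept"))   # True
```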

     

    If you receive an error similar to the following:

    Message: ERROR: Please check target server argument and credentials (should be domain\user). [xchng2010] Connecting to remote server failed with the following error message : The WS-Management service cannot process the request. This user allowed a maximum number of 5 concurrent shells, which has been exceeded. Close existing shells or raise the quota for this user.

    Resolution: This error can occur when more than five remote PowerShell sessions (the default limit) are open at the same time for the same user. To resolve it, increase the number of concurrent shells allowed on the Exchange server. Open a Command Prompt window as Administrator and run the following command:
    winrm set winrm/config/winrs @{MaxShellsPerUser="30"}
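    The troubleshooting cases above can be summarized as a lookup from a distinctive fragment of the error text to its resolution. The Python sketch below is a hypothetical helper, not part of the template. The more specific fragments are checked first so that, for example, the Negotiate error (which contains "'access denied'") is not mistaken for a credentials problem.

```python
# Hypothetical lookup: each rule pairs a fragment of the error text with the
# resolution given in this section. More specific fragments come first.
RULES = (
    ("Negotiate authentication", "Enable Windows Authentication for the PowerShell application in IIS."),
    ("HTTP status code of 403", "Check the SSL settings for the PowerShell application in IIS."),
    ("concurrent shells", "Raise MaxShellsPerUser with winrm on the Exchange server."),
    ("couldn't be found", "Provide the correct database name."),
    ("Access is denied", "Check the credentials and the domain\\user format."),
)


def suggest_resolution(message: str):
    """Return the first matching resolution, or None if no rule matches."""
    for fragment, resolution in RULES:
        if fragment in message:
            return resolution
    return None


print(suggest_resolution("Connecting to remote server failed with the following error message : Access is denied."))
```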