
Does anyone know when DPA will monitor and alert on availability group health and replication status?

Hi,

I recently set up DPA in our environment and didn't see any monitors for availability groups or replication status. I wonder why this important feature hasn't been added to DPA. When can we expect it?

I see a lot of room for improvement compared to other vendors' products; more features need to be added to justify the price.

Thanks,

Sree

  • DPA offers a couple methods to monitor your AlwaysOn clustered servers.

    1. You can register your AlwaysOn instance using the availability group listener. DPA will monitor the cluster as a normal instance and can be configured to notify you of any failover from primary to secondary by setting up a custom failover alert.
    2. For complete coverage, you can register each physical node and group them using DPA's instance display group option, which lets you see the overall response-time balance of the nodes within your cluster.

    Note: The benefit of option one is that it requires only one license, whereas option two requires a license for each node in your cluster.

    There are plans to improve the registration and monitoring of AlwaysOn environments, as noted in our DPA Road Map.

  • You also asked about replication status, and I'm sure you had other notable metrics in mind as well. The comment above referenced a custom alert, which shows how DPA's open architecture can be used to extend its monitoring capabilities to let you know when a failover has occurred. The same open architecture can be used to monitor the health of your AlwaysOn Availability Groups and their replication status. For example, by using the information in the sys.dm_hadr_database_replica_states dynamic management view and the formulas found here, you can configure DPA to both collect and alert on estimated failover time and data loss.
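A minimal sketch of the kind of calculation described above, written in Python over values already read from sys.dm_hadr_database_replica_states. It assumes the common approximations "redo_queue_size / redo_rate" for failover (redo catch-up) time and "log_send_queue_size / log_send_rate" for the data-loss window; the formulas linked in the post may differ (some versions divide the send queue by the primary's log generation rate instead). The ReplicaState class and the database_name field are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReplicaState:
    """Subset of one sys.dm_hadr_database_replica_states row (secondary replica)."""
    database_name: str             # hypothetical: resolved from database_id elsewhere
    redo_queue_size_kb: float      # redo_queue_size: log received but not yet redone (KB)
    redo_rate_kb_s: float          # redo_rate: average redo rate (KB/sec)
    log_send_queue_size_kb: float  # log_send_queue_size: log not yet sent to the secondary (KB)
    log_send_rate_kb_s: float      # log_send_rate: average send rate (KB/sec)


def _drain_seconds(queue_kb: float, rate_kb_s: float) -> Optional[float]:
    """Seconds needed to drain a queue at the given rate."""
    if queue_kb <= 0:
        return 0.0   # nothing queued, nothing to wait for
    if rate_kb_s <= 0:
        return None  # backlog with no measured rate: estimate is unbounded
    return queue_kb / rate_kb_s


def estimated_failover_seconds(s: ReplicaState) -> Optional[float]:
    """Assumed approximation: time for the secondary to finish redo."""
    return _drain_seconds(s.redo_queue_size_kb, s.redo_rate_kb_s)


def estimated_data_loss_seconds(s: ReplicaState) -> Optional[float]:
    """Assumed approximation: seconds of log not yet shipped to the secondary."""
    return _drain_seconds(s.log_send_queue_size_kb, s.log_send_rate_kb_s)


if __name__ == "__main__":
    row = ReplicaState("Sales", 2048, 512, 300, 100)
    print(estimated_failover_seconds(row), estimated_data_loss_seconds(row))
```

The queue columns are in KB and the rate columns in KB/sec, so each ratio comes out in seconds; a DPA custom metric could then alert when either value crosses a threshold you choose.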

  • Option 1 works well if you use only one AlwaysOn Availability Group per server, but if you have multiple Availability Groups you would be better off licensing each server.

  • Okay, 11.0 is here. I like the shiny new interface, and the added Always-On info is good. But what about alerting? We'd like to set up alerts for when Availability Groups fail over from one replica to another: an Alert Type under Administrative, something like "Availability Group Status Change". I suspect I could write a custom alert based on a change to the [master].[sys].[dm_hadr_availability_group_states] view, but OW! Any plans for a built-in alert of this type?

  • Paul, thanks for the feedback. Status and alerting are among the planned features going forward, and you can see AG improvements listed in What We Are Working On for DPA (Updated April 7, 2017).

    I'll reach out to you to chat further, and make sure you are included in the next beta.

  • Here's the one I just made. Looks to test out OK.

    select count(*) from sys.dm_hadr_availability_group_states where synchronization_health != 2

    [Screenshot: AG monitor]
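The query above counts availability groups whose synchronization_health is anything other than 2 (HEALTHY), so any result above zero means something is off. The Python sketch below mirrors that logic over rows already fetched; the (group_name, synchronization_health) pair shape is hypothetical, since the DMV itself exposes group_id and the name lives in sys.availability_groups.

```python
from typing import Iterable, List, Optional, Tuple

# synchronization_health codes in sys.dm_hadr_availability_group_states
SYNC_HEALTH = {0: "NOT_HEALTHY", 1: "PARTIALLY_HEALTHY", 2: "HEALTHY"}
HEALTHY = 2

Row = Tuple[str, Optional[int]]  # hypothetical (group_name, synchronization_health)


def _is_unhealthy(health: Optional[int]) -> bool:
    # SQL's != never matches NULL, so None is skipped to mirror the query.
    return health is not None and health != HEALTHY


def count_unhealthy(rows: Iterable[Row]) -> int:
    """Mirror of: SELECT COUNT(*) ... WHERE synchronization_health != 2."""
    return sum(1 for _name, health in rows if _is_unhealthy(health))


def describe_unhealthy(rows: Iterable[Row]) -> List[str]:
    """Readable list of the groups that the count would include."""
    return [f"{name}: {SYNC_HEALTH.get(health, health)}"
            for name, health in rows if _is_unhealthy(health)]
```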

  • Thanks, samr33, that's similar to what I've been toying with. We have some AO instances where we intentionally put some listeners on one node and some on the other. But I can join to sys.availability_group_listeners and filter on the listener name to be more selective.
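A sketch of the listener-based join, assuming the goal is to check whether the AG behind a given listener is still on its intended node. Both views share group_id, and sys.dm_hadr_availability_group_states exposes primary_replica. The tuple shapes, function names, and the expected-node parameter are hypothetical, and the case-insensitive match assumes the default case-insensitive collation.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical shapes for rows already fetched from the two views:
#   group_states: (group_id, primary_replica, synchronization_health)
#   listeners:    (group_id, dns_name)
GroupState = Tuple[str, str, Optional[int]]
Listener = Tuple[str, str]


def primary_for_listener(group_states: Iterable[GroupState],
                         listeners: Iterable[Listener],
                         listener_name: str) -> Optional[str]:
    """Join on group_id and return primary_replica for the listener's AG.

    Roughly the T-SQL:
        SELECT s.primary_replica
        FROM sys.dm_hadr_availability_group_states AS s
        JOIN sys.availability_group_listeners AS l ON l.group_id = s.group_id
        WHERE l.dns_name = @listener_name;
    """
    wanted = listener_name.casefold()  # assumes a case-insensitive collation
    group_ids = {gid for gid, dns in listeners if dns.casefold() == wanted}
    for gid, primary, _health in group_states:
        if gid in group_ids:
            return primary
    return None  # listener unknown, or no state row visible on this replica


def is_off_expected_node(group_states: Iterable[GroupState],
                         listeners: Iterable[Listener],
                         listener_name: str,
                         expected_node: str) -> bool:
    """True when the listener's AG has a primary other than expected_node."""
    primary = primary_for_listener(group_states, listeners, listener_name)
    return primary is not None and primary.casefold() != expected_node.casefold()
```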

    The challenge I'm finding is getting it to alert only when the state changes. When an AG fails over, I don't want an email every X minutes telling me it has failed over; I want one when it happens, and ideally another when (if) it fails back. The only way I can think of to do that is to set up my own table of previous AG states and alert only when the current sys.dm_hadr_availability_group_states data doesn't match it, then, after each check, store the current state as the previous one.
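The compare-then-save approach described above can be sketched as follows. An in-memory dict stands in for the hypothetical "previous AG states" table, and each snapshot maps an AG name to (primary_replica, synchronization_health); in a real custom alert the same steps would run in T-SQL against an actual table.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot: AG name -> (primary_replica, synchronization_health)
Snapshot = Dict[str, Tuple[str, int]]


def diff_states(previous: Snapshot, current: Snapshot) -> List[str]:
    """Describe every AG whose primary or health differs from the stored state."""
    changes = []
    for group in sorted(current):
        if group not in previous:
            continue  # first sighting: becomes the baseline, no alert
        old_primary, old_health = previous[group]
        new_primary, new_health = current[group]
        if old_primary != new_primary:
            changes.append(f"{group}: primary moved {old_primary} -> {new_primary}")
        if old_health != new_health:
            changes.append(f"{group}: health {old_health} -> {new_health}")
    return changes


def check(store: Snapshot, current: Snapshot) -> List[str]:
    """One polling cycle: compare, then save current as the new 'previous'."""
    changes = diff_states(store, current)
    store.clear()
    store.update(current)
    return changes
```

Because the stored state is overwritten on every cycle, a failover yields exactly one message and a later failback yields one more, instead of a repeat every polling interval.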

  • Sure, madisonpaul! I guess we had slightly different goals: we use AGs for redundancy, not for load, so we're agnostic about which node is active as long as the group is synchronized.

    Just my .02, but I would probably hard-code the expected results of the query in the monitor rather than use a results table. I use the "Notify when level not visited since normal" notification policy (bottom of the screenshot), which sends the alert notification just once on the first failure. I haven't played with "Notify when level changes and is not normal", but that might do what you're looking for.
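To make the difference between the two notification policies concrete, here is a hypothetical model of how their names read. It is an interpretation of the option names and of the "just once on the first failure" behavior mentioned above, not DPA's documented logic; the level strings and the assumption that a monitor starts out normal are illustrative.

```python
from typing import List, Sequence

NORMAL = "normal"


def notify_not_visited_since_normal(levels: Sequence[str]) -> List[int]:
    """Indexes of checks that notify: a non-normal level not seen since the last normal."""
    visited = set()
    notified = []
    for i, level in enumerate(levels):
        if level == NORMAL:
            visited.clear()  # returning to normal re-arms every level
        elif level not in visited:
            visited.add(level)
            notified.append(i)
    return notified


def notify_level_changes_not_normal(levels: Sequence[str]) -> List[int]:
    """Indexes of checks that notify: the level changed and is not normal."""
    notified = []
    previous = NORMAL  # assumption: the monitor starts out normal
    for i, level in enumerate(levels):
        if level != NORMAL and level != previous:
            notified.append(i)
        previous = level
    return notified
```

Under this reading, the sequence normal, warning, critical, warning, warning, normal notifies on checks 1 and 2 with the first policy and on checks 1, 2 and 3 with the second. Neither fires on the return to normal, so an alert on fail-back would still need something like the stored-state comparison discussed earlier in the thread.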

  • Actually, we're using AGs for both redundancy and load. But we have a few apps that don't "play well" with AO, so a failover alert to let us know when to tend to those apps would be helpful. And thanks for the reminder about the "not visited since normal" idea; I forgot that was an option. Time to go play...