I recently set up DPA in our environment and didn't see any monitors for availability groups or replication status. I wonder why this important feature hasn't been added to DPA. When can we expect it?
I see there is a lot of room for improvement compared to other vendors' products; more features need to be added to justify the price.
DPA offers a couple of methods to monitor your AlwaysOn clustered servers.
Note: The benefit of option one is that it requires only one license, whereas option two requires a license for each node in your cluster.
There are plans to improve the registration and monitoring of an AlwaysOn environment as noted in our DPA Road Map.
Okay, 11.0 is here. I like the shiny new interface, and the added Always-On info is good. But what about alerting? We'd like to set up alerts for when Availability Groups fail over from one replica to another: an Alert Type under Administrative, something like "Availability Group Status Change". I suspect I could write a custom alert based on a change to the [master].[sys].[dm_hadr_availability_group_states] view, but OW! Any plans for a built-in alert of this type?
Thanks samr33, that's similar to what I've been toying with. We have some AO instances where we intentionally put some listeners on one node and some on the other. But I can join to sys.availability_group_listeners and filter on the listener name to be more selective.
The challenge I'm finding is getting it to alert only when the state changes. When an AG fails over, I don't want to get an email every X minutes telling me it's failed over, only when it happens and, ideally, when (if) it fails back. The only way I can think of to do that is to set up my own table of previous AG states and alert only when the actual sys.dm_hadr_availability_group_states data doesn't match the previous. Then, after each check, store the current state as the previous.
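The compare-with-previous idea above could be sketched roughly as follows. This is only an illustration in Python, not a DPA feature: the function names `detect_ag_changes` and `check_and_store` and the dictionaries mapping an AG name to its primary replica are assumptions made for the example. In practice the "previous" snapshot would live in your own table and the "current" one would come from a query against sys.dm_hadr_availability_group_states.

```python
def detect_ag_changes(previous, current):
    """Return (ag_name, old_primary, new_primary) for each AG whose primary changed.

    Both arguments are hypothetical snapshots: dicts mapping an availability
    group name to the name of its current primary replica.
    """
    changes = []
    for ag_name, new_primary in sorted(current.items()):
        old_primary = previous.get(ag_name)
        # An AG with no earlier snapshot is treated as a baseline, not a failover.
        if old_primary is not None and old_primary != new_primary:
            changes.append((ag_name, old_primary, new_primary))
    return changes


def check_and_store(state, current):
    """Compare against the stored snapshot, then overwrite it with the current one."""
    changes = detect_ag_changes(state, current)
    state.clear()
    state.update(current)
    return changes


# Three polling cycles: only the cycle where the primary moves reports anything.
stored = {}
first = check_and_store(stored, {"AG1": "NODE-A", "AG2": "NODE-B"})   # baseline
second = check_and_store(stored, {"AG1": "NODE-B", "AG2": "NODE-B"})  # AG1 failed over
third = check_and_store(stored, {"AG1": "NODE-B", "AG2": "NODE-B"})   # nothing new
```

Because the snapshot is overwritten after every check, a fail-back shows up as one more change rather than as a repeating alert.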
Sure, madisonpaul! I guess we had slightly different goals. We're using AGs for redundancy, not for load, and so are agnostic about which node is active as long as the group is synchronized.
Just my .02, but I would probably hard code the expected results of the query in the monitor rather than a results table. I use the "Notify when level not visited since normal" notification policy (bottom of the screenshot) which sends the alert notification just once on the first failure. I haven't played with "Notify when level changes and is not normal" but that might do what you're looking for.
Actually, we're doing AG for both redundancy and load. But we have a few apps that don't "play well" with AO, so a failover alert to let us know when to tend to those apps would be helpful. And thanks for the reminder about the "not visited since normal" idea; I forgot that was an option. Time to go play......
Paul, thanks for the feedback. Status and alerting are among the planned features going forward, and you can see that AG improvements are listed in What We Are Working On for DPA (Updated April 7, 2017).
I'll reach out to you to chat further, and make sure you are included in the next beta.
Option 1 would be good if you are only using one AlwaysOn Availability Group per server, but if you have multiple Availability Groups, you would be better off licensing each server.
You also asked about replication status, and I'm sure you had other notable metrics in mind as well. The comment above referenced a custom alert that shows how DPA's open architecture can be used to extend its monitoring capabilities to let you know when a failover has occurred. This same open architecture can be used to monitor the health of your AlwaysOn Availability Groups and replication statuses. For example, by utilizing the information found in the sys.dm_hadr_database_replica_states view and using the formulas found here, you can configure DPA to both collect and alert on replication failover time and data loss.
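As a rough illustration of that kind of calculation, here is a Python sketch. It assumes the linked formulas resemble the commonly cited estimates (redo time = redo queue / redo rate, and potential data loss time = log send queue / log generation rate), which may differ from what the link actually shows. The function names are hypothetical. redo_queue_size, redo_rate, and log_send_queue_size are columns of sys.dm_hadr_database_replica_states (in KB and KB/sec), while the log generation rate is assumed to come from somewhere else, such as a performance counter.

```python
def estimate_redo_seconds(redo_queue_size_kb, redo_rate_kb_per_sec):
    """Estimate how long a secondary needs to finish redo (part of failover time).

    Returns None when the rate is missing or zero, since no estimate is possible.
    """
    if not redo_rate_kb_per_sec:
        return None
    return redo_queue_size_kb / redo_rate_kb_per_sec


def estimate_data_loss_seconds(log_send_queue_size_kb, log_generation_kb_per_sec):
    """Estimate how many seconds of log the secondary has not yet received.

    log_generation_kb_per_sec is an assumed input; it is not a column of
    sys.dm_hadr_database_replica_states.
    """
    if not log_generation_kb_per_sec:
        return None
    return log_send_queue_size_kb / log_generation_kb_per_sec


# Example values, invented purely for illustration.
redo_estimate = estimate_redo_seconds(12000, 4000)
loss_estimate = estimate_data_loss_seconds(500, 250)
idle_estimate = estimate_redo_seconds(12000, 0)
```

A custom alert could then compare these estimates against whatever recovery time and data loss thresholds you have agreed on.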