Hello Everyone,
We have an issue where a SAM component went critical (verified in the logs/events), but the alert attached to the SAM monitor did not fire at all. I called support, and they told me the solution is to re-create the alert, which did solve the issue. However, there's no way to audit our entire system to catch/identify this kind of corruption. It is very dangerous for us to have monitors created and in place, only for their conditions to trigger without any alert, leaving us no way of knowing besides another painstaking rediscovery of a faulty application.
Customer Support has directed me to post on THWACK as a last resort.
The solution I have in mind would be one query that pulls the names of all application monitors that had any status other than "up" within a given timeframe, then a second query that pulls all alerts that fired in the same timeframe with the same name as an application monitor.
Any application monitor that has no matching alert would be a good starting point for an audit.
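To make the comparison step concrete, here is a rough sketch in Python of what I mean. It assumes the two query results have already been exported as plain lists of names; the function names and the sample monitor names are made up for illustration, and it says nothing about how the queries themselves would be written. Matching is done on the name only, ignoring case and extra whitespace, which is an assumption on my part.

```python
def normalize(name: str) -> str:
    """Collapse whitespace and ignore case so near-identical names match."""
    return " ".join(name.split()).casefold()


def find_unalerted_monitors(non_up_monitors, fired_alert_names):
    """Return monitors that were not "up" but have no fired alert of the same name.

    non_up_monitors: names from the first query (hypothetical export).
    fired_alert_names: names from the second query (hypothetical export).
    """
    fired = {normalize(n) for n in fired_alert_names}
    seen = set()
    unalerted = []
    for name in non_up_monitors:
        key = normalize(name)
        # Keep the first spelling of each monitor that has no matching alert.
        if key not in fired and key not in seen:
            seen.add(key)
            unalerted.append(name)
    return sorted(unalerted, key=normalize)


if __name__ == "__main__":
    # Hypothetical sample data standing in for the two query results.
    non_up = ["SQL Server - DB02", "IIS - WEB01", "Exchange - MAIL01"]
    fired = ["sql server - db02"]
    print(find_unalerted_monitors(non_up, fired))
    # prints ['Exchange - MAIL01', 'IIS - WEB01']
```

Anything this returns is only a candidate for review, since a monitor could legitimately have no alert assigned or an alert under a different name.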
Any guidance would be greatly appreciated. Thanks!