This might be a long post, and I apologize in advance. It requires some background on how we implemented our alerts and how we discovered an odd behavior: Orion resets the alert during database maintenance and immediately re-triggers it.
Our Volume alerts are fairly standard. Alert if volume percent used is greater than threshold.
We decided to use 'Orion General Thresholds' to set our standard.
URL: /Orion/NetPerfMon/Admin/NetPerfMonSettings.aspx
This gives us the ability to set a standard that all volumes get, and allows us to 'override' the threshold on a volume by volume basis.
This all worked out fairly easily.
The first problem I ran into is the alert builder WebUI. When the alert context ("I want to alert on:") is set to 'volume', the 'Orion General Thresholds' are not exposed. They are only exposed under ‘volume capacity forecasting’.
I can link from ‘volume capacity forecasting’ to 'volume', but not the other way. While SWQL has linkages between 'Volume' and ‘volume capacity forecasting’, the alert builder WebUI cannot see in both directions.
This was fairly easy to work around. I set the alert context to ‘volume capacity forecasting’ and linked to 'Volume' to get percent used.
Alert builder WebUI with alert set to ‘volume capacity forecasting’, exposing ‘Orion General Thresholds’
‘Volume Capacity forecasting’ linkage to ‘volume’
Alert builder WebUI with alert set to ‘volume’, not exposing ‘Orion General Thresholds’
‘Volume’ has no linkage to ‘volume capacity forecasting’
The second issue: I discovered that alerts created under ‘volume capacity forecasting’ were resetting every night at the same time, then opening new alerts 1 minute later. After a lot of research and trial and error, I tracked this down to database maintenance, which happens at 2:15am by default. I moved it forward an hour and the alert resets moved to match. I moved database maintenance back to the default, and the alert resets followed again. These erroneous alert resets only happen during database maintenance. I confirmed this by checking the impacted volumes: they were over threshold, not bouncing above and below it, and no cron/batch cycle was clearing things. Disk usage remained consistently over threshold throughout these times, both between and after the resets.
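For anyone wanting to run the same check on their own alert history, here is a minimal sketch of the correlation described above. It assumes you have already exported the alert reset timestamps by some means; the function name, the 10 minute window, and the sample timestamps are all made up for illustration and are not from our environment or from any Orion API.

```python
from datetime import datetime, time, timedelta

def resets_near_maintenance(reset_times, maintenance_time, window_minutes=10):
    """Return the reset timestamps that fall within `window_minutes`
    after the maintenance start time on the same calendar day."""
    window = timedelta(minutes=window_minutes)
    hits = []
    for ts in reset_times:
        start = datetime.combine(ts.date(), maintenance_time)
        if start <= ts <= start + window:
            hits.append(ts)
    return hits

# Illustrative (made-up) reset timestamps, NOT real data.
sample_resets = [
    datetime(2024, 1, 10, 2, 16),   # shortly after the 2:15am default
    datetime(2024, 1, 11, 2, 17),   # shortly after the 2:15am default
    datetime(2024, 1, 11, 14, 30),  # unrelated afternoon reset
]

hits = resets_near_maintenance(sample_resets, time(2, 15))
print(f"{len(hits)} of {len(sample_resets)} resets fall in the maintenance window")
```

If nearly all resets land in the window, and they move when the maintenance time is changed, that points at maintenance rather than at real threshold crossings.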
After a lengthy support case, we set a custom reset condition on the alerts. Originally they used the default, "Reset this alert when trigger condition is no longer true (Recommended)".
The new reset condition is a copy of the trigger condition with the operator reversed.
Trigger
Reset
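To show how the two reset styles can differ in principle, here is a small sketch. It is not Orion's implementation and not a confirmed explanation of the behavior above. It assumes, purely hypothetically, that a value such as the threshold is briefly unavailable (modelled as `None`) while the alert is evaluated; all function names are invented for illustration.

```python
from typing import Optional

def trigger(percent_used: Optional[float], threshold: Optional[float]) -> bool:
    """Trigger condition: percent used is greater than the threshold.
    ASSUMPTION: a missing value makes the comparison evaluate to False."""
    if percent_used is None or threshold is None:
        return False
    return percent_used > threshold

def default_reset(percent_used: Optional[float], threshold: Optional[float]) -> bool:
    """Default style: reset when the trigger condition is no longer true."""
    return not trigger(percent_used, threshold)

def custom_reset(percent_used: Optional[float], threshold: Optional[float]) -> bool:
    """Custom style: copy of the trigger with the operator reversed.
    ASSUMPTION: a missing value again makes the comparison False."""
    if percent_used is None or threshold is None:
        return False
    return percent_used <= threshold

# With normal data both styles agree.
print(default_reset(95.0, 90.0), custom_reset(95.0, 90.0))  # False False
print(default_reset(80.0, 90.0), custom_reset(80.0, 90.0))  # True True

# With a hypothetically missing threshold they diverge.
print(default_reset(95.0, None), custom_reset(95.0, None))  # True False
```

Under that assumption, "trigger is no longer true" fires on missing data, while the explicit reversed comparison only fires when the volume is really at or under the threshold.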
This seems to get around the issue, but it looks like a bug. Does anyone else use a similar alert with 'Orion General Thresholds'? Has anyone else seen any odd behavior with reset conditions?