I have created a Component Monitor Linux script that scans a backup log and reports various statistics found in it. It also evaluates the information in the log to determine the backup status (Successful, In Progress, Failed, etc.). I then assigned a status number to each status, depending on whether I considered it Successful, Warning, or Critical, and used those numbers in the monitor's Script Output evaluations for the Warning and Critical thresholds. The monitor works great, and the thresholds trigger events as I wanted.
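For anyone who wants to picture the script side, here is a minimal sketch in Python of the status-to-number idea. It is not my actual script: the `Status:` log line format, the numeric values, and the name `BackupStatus` are all made up for illustration, and the `Statistic.`/`Message.` output lines are my understanding of what a script monitor expects, so check that against the product documentation.

```python
import re

# Hypothetical status-to-number mapping; the real values are whatever
# you chose for your Warning and Critical thresholds.
STATUS_CODES = {
    "Successful": 0,
    "In Progress": 1,
    "Failed": 2,
}
UNKNOWN_CODE = 3  # anything the script does not recognize


def evaluate_backup_log(log_text):
    """Return (status, code) based on the last 'Status: ...' line in the log.

    The 'Status:' line format is an assumption made for this example.
    """
    matches = re.findall(r"^Status:[ \t]*(.+?)[ \t]*$", log_text, flags=re.MULTILINE)
    status = matches[-1] if matches else "Unknown"
    return status, STATUS_CODES.get(status, UNKNOWN_CODE)


def format_monitor_output(status, code):
    """Build the two output lines the monitor would read (assumed format)."""
    return (
        f"Statistic.BackupStatus: {code}\n"
        f"Message.BackupStatus: Backup status is {status}"
    )


if __name__ == "__main__":
    sample_log = "Job: nightly\nStatus: In Progress\nStatus: Successful\n"
    print(format_monitor_output(*evaluate_backup_log(sample_log)))
```

The thresholds in the monitor are then set against the numeric statistic, for example Warning at 1 and Critical at 2 with the made-up values above.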
As for the alert rules, I have a generic alert rule that is already built and will handle the notification if a threshold is crossed. However, I also want to send an email to staff to report that a backup completed successfully. Unlike the non-successful notifications, staff would get this notification once per day, mid-morning. So to build the rule, I set the alert trigger conditions to engage if the Component Monitor state is Up. I then set the time of day to the time I want the alert rule to check and send the daily "backup complete and successful" email. The part I have not figured out is how to reset the alert so that it moves off the active alert list and can trigger again the following day.
What I have tried is to:
- Set the time-of-day window to just a couple of minutes, set the reset conditions to be the same as the trigger conditions, and set the reset delay timer to one minute longer than the time-of-day end time. I expected this to work, since the chain of events should be that the alert triggers and the reset timer starts. As long as the trigger remains active, the trigger actions should fire; however, they do not. The only thing that happens is that the alert appears in the active alert list and disappears when the reset timer runs out.
- To ensure that the rest of the alert rule works properly, I changed the component status in the reset conditions to Down and had the rule fire again. This did result in the actions being executed, but the alert then stays on the list after the actions run, the reset timer runs out, and the time-of-day period expires, because the component state is still Up.
- For kicks, though I did not expect it to work, I also selected "Reset when trigger conditions are no longer true" in the reset conditions. Of course, the trigger conditions do not change, so the alert stays active and again never resets, at least not unless the backup status changes.
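To show why the last attempt never resets, here is a toy model in Python of the behavior I am seeing. It is only my simplified picture, not how the alerting engine is actually implemented: I assume one evaluation per day inside the time-of-day window, that the actions fire only when the alert goes from inactive to active, and that the reset happens only when the trigger condition (state is Up) stops being true. The function name `simulate` is made up for the example.

```python
def simulate(daily_states):
    """Return the days (1-based) on which the trigger actions would fire.

    Toy model: the alert activates when the component state is "Up" and
    resets only when the state is no longer "Up".
    """
    active = False
    fired_on = []
    for day, state in enumerate(daily_states, start=1):
        trigger_true = (state == "Up")
        if active and not trigger_true:
            # Reset: the alert leaves the active alert list.
            active = False
        elif not active and trigger_true:
            # Trigger: the alert becomes active and the actions fire once.
            active = True
            fired_on.append(day)
    return fired_on


if __name__ == "__main__":
    # Backups succeed every day: the email goes out on day 1 only.
    print(simulate(["Up", "Up", "Up", "Up"]))
    # A non-Up day in between resets the alert, so it can fire again.
    print(simulate(["Up", "Down", "Up", "Up"]))
```

In this model a run of successful backups produces exactly one email, which is the opposite of the once-per-day notification I am after.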
I can see where this would also be a helpful scenario for providing informational alerts to staff for things like successful database integrity checker sessions and so on. Has anyone done something similar to this and if so, can you guide me in the right direction?
THANKS
Mike