I'm trying to monitor for a specific condition in my application whereby it sometimes loses access to its database. When it does this an event is logged into the Windows Application Event log. When it regains connection another event is logged.
I know how to create an event log monitor for tracking a specific event but I need help logically creating a monitor that will show the system as down if the connection loss event occurs, but then returns to normal when the recovery event occurs. From what I can see this is not possible with the default windows event log component monitor for 2 reasons.
1) if the original event falls outside of the polling interval it will no longer be considered "found", so no issue is detected even if the service has been down for quite some time, and
2) I see no way to tie two different events together in said monitor
Am I going to have to build something custom for this, e.g. in PowerShell?
One potential way of doing it is with the event log monitors and custom node status. I've done similar previously with triggering and resetting an alert but not setting the node status.
If the disconnection event is found, the alert will trigger and set the node to critical or down (based on what you configure). The alert will stay active.
When the connection restored event is found, the alert will reset and change the node back to polled status. If you have enhanced node status on and application status on, potentially the node will be warning for a poll or two until the application status returns to up/green.
Interesting solution, however given that it doesn't really factor in the timestamps of the events, couldn't it get confused about the actual status of the service?
For example, if the service goes down (event 1), comes back up (event 2), and goes down again (event 3) all within a single polling cycle, both the down and up events would be found at the same time. Wouldn't this solution then see the service as UP, ignoring that the most recent event marked it down?
I was expecting that. Sigh. Now the question is how to actually go about doing this. It sounds like I'll have to search for both expected events on each poll: if the most recent "up" event is newer than the most recent "down" event, it's up; otherwise it's down. Thoughts?
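To make the comparison concrete, here is a minimal sketch of that logic in Python. It is only an illustration: the event IDs 1001 and 1002 are made-up placeholders for the real "connection lost" and "connection restored" IDs, and the events are assumed to have already been read from the Application log as (timestamp, event ID) pairs by whatever script the monitor runs.

```python
from datetime import datetime

# Hypothetical event IDs; substitute the ones your application actually logs.
DOWN_EVENT_ID = 1001  # "lost database connection"
UP_EVENT_ID = 1002    # "regained database connection"

def service_status(events, previous="up"):
    """Return "up" or "down" based on the newest relevant event.

    events   -- iterable of (timestamp, event_id) tuples
    previous -- status to keep when no relevant event is present
    """
    relevant = [e for e in events if e[1] in (DOWN_EVENT_ID, UP_EVENT_ID)]
    if not relevant:
        return previous
    # Only the most recent event decides the state, so a
    # down/up/down flap inside one polling cycle resolves to "down".
    latest = max(relevant, key=lambda e: e[0])
    return "down" if latest[1] == DOWN_EVENT_ID else "up"

flap = [
    (datetime(2024, 1, 1, 12, 0), DOWN_EVENT_ID),
    (datetime(2024, 1, 1, 12, 1), UP_EVENT_ID),
    (datetime(2024, 1, 1, 12, 2), DOWN_EVENT_ID),
]
print(service_status(flap))
```

Because only the newest event counts, the down/up/down case from the earlier reply is reported as down rather than up.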
That's the approach I had in mind as well. Something that might help it run faster is to leave a small placeholder file on each node (maybe in windows/temp) that records the timestamp of the last poll and whether the component should be up or down based on the most recent event. That way you only need to scan for the opposite event, and only across the last few minutes, rather than crawling through all 9 billion Windows events. It adds some complexity to the whole affair, but it could save you from parsing a lot of extra data on every poll.
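A rough sketch of the placeholder-file idea, again in Python and under stated assumptions: the JSON layout, the event IDs, and the `fetch_events` callback are all hypothetical, with `fetch_events(since)` standing in for whatever query returns (timestamp, event ID) pairs logged after the last poll. For simplicity it looks at both event types since the last poll rather than only the opposite one, which also covers a flap between polls.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical event IDs; substitute the ones your application actually logs.
DOWN_EVENT_ID = 1001
UP_EVENT_ID = 1002

def load_state(path):
    """Read the saved state, defaulting to "up" with no previous poll."""
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {"status": "up", "last_poll": None}

def poll(path, fetch_events, now):
    """Update the state file and return the current status.

    fetch_events(since) is an assumed callback returning
    (timestamp, event_id) tuples newer than `since`
    (a datetime, or None on the very first poll).
    """
    state = load_state(path)
    since = (datetime.fromisoformat(state["last_poll"])
             if state["last_poll"] else None)
    relevant = [e for e in fetch_events(since)
                if e[1] in (DOWN_EVENT_ID, UP_EVENT_ID)]
    if relevant:
        # The newest event since the last poll decides the new status.
        latest = max(relevant, key=lambda e: e[0])
        state["status"] = "down" if latest[1] == DOWN_EVENT_ID else "up"
    # With no new events, the previously saved status carries over.
    state["last_poll"] = now.isoformat()
    Path(path).write_text(json.dumps(state))
    return state["status"]
```

The saved status is what keeps the component down when the original event has aged out of the polling window: a poll that finds nothing new simply returns whatever the last poll recorded.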