
Alert when polling engine fails

One of our auxiliary polling engines unexpectedly stopped polling. All the services on that polling engine remained up and appeared to be working, but no data was being captured. Services on the primary and the other auxiliary polling engines, together with their other modules (APM, NetFlow, NCM, IPSLA), remained operational. It was just this one server that seemed to have taken a nap.

While the fix was a quick reboot of that server, the worrying part is that no alert was raised. Even the devices it had been polling still showed as up, because their last known status was up and no change had been recorded.

One way to monitor the polling servers would be to check the same status as the Start > Programs > Advanced Features > Monitor Polling Engine application. The obvious parameter to check would seem to be Last Sync, but the only tests that I have found are for explicit dates, not relative times.

Is there a way to alert if the polling engine’s status has been stopped for, say, 15 minutes, or if the last database sync is older than 15 minutes?
  • I've mentioned this before in another post, but we use the following SQL statement as an ADO user experience monitor within ipMonitor.

    SELECT ServerName,
           KeepAlive,
           GETDATE() AS CheckedAt,
           DATEDIFF(second, KeepAlive, GETDATE()) AS SecondsSinceKeepAlive
    FROM [dbo].[Engines]
    WHERE DATEDIFF(second, KeepAlive, GETDATE()) > 60

    If the query returns one or more rows, then at least one of your pollers has not updated its KeepAlive value in the database for more than 60 seconds, which indicates an issue with the polling process.

    I think that this could also be adapted to run from the APM module that you have.

    Dave.
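
For the 15-minute threshold asked about in the original question, the same KeepAlive comparison could also be done in a script. The following is a minimal Python sketch and not part of the reply above. It assumes the (ServerName, KeepAlive) rows have already been read from [dbo].[Engines] with whatever database driver you use. The function name stale_engines, the poller names, and the timestamps are hypothetical and only there for illustration.

```python
from datetime import datetime, timedelta


def stale_engines(rows, now, max_age=timedelta(minutes=15)):
    """Return (server_name, age) for every engine whose KeepAlive is older than max_age.

    rows    -- iterable of (server_name, keep_alive) tuples, keep_alive as datetime
    now     -- current time taken from the same clock as KeepAlive
               (the query in the thread compares against the database server's GETDATE())
    max_age -- how long an engine may go without a KeepAlive update (assumed: 15 minutes)
    """
    stale = []
    for server_name, keep_alive in rows:
        age = now - keep_alive
        if age > max_age:
            stale.append((server_name, age))
    return stale


if __name__ == "__main__":
    # Hypothetical sample data standing in for the rows of [dbo].[Engines].
    now = datetime(2024, 1, 1, 12, 0, 0)
    rows = [
        ("POLLER-01", now - timedelta(seconds=20)),
        ("POLLER-02", now - timedelta(minutes=42)),
    ]
    for name, age in stale_engines(rows, now):
        seconds = int(age.total_seconds())
        print(f"ALERT: {name} has not updated KeepAlive for {seconds} seconds")
```

An empty result means every engine has checked in recently. As with the row-count test in the SQL monitor, any entry in the result is the condition to alert on.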