
Alert triggers erroneously when a poll is missed

Had a situation where I (and some very high-ranking individuals) were paged about a critical situation that did not exist. An alert has been set up to page out if a certain OID comes back larger than 30. It is essentially always 0.

In this instance, the poll was missed, so there was no data. At that moment, the alert triggered, meaning that the alert must have been evaluated as true. Thus, the alert engine must have used something that was larger than 30 during the evaluation. Clearly it shouldn't, and I believe it is supposed to use the last valid polled information.

Has anyone seen this before? Can anyone think of a way to avoid it? I honestly just think it is a bug, but it seems like something that folks should have run into before. Support was not helpful. At all.

  • If it were my environment, I'd pull the history on that UnDP poller to see what values were in the database (this assumes you have "keep historical" checked on that UnDP). Normally there are no entries at all on a missed poll, so there is nothing for the alert engine to act on and I wouldn't expect a message. I also think it is good practice to include the value of whatever metric I'm tracking in the body of the message that goes out.

    A query like this should show you whatever is in the db:

    SELECT *
    FROM [dbo].[CustomPollerStatistics_Detail] cps
    JOIN CustomPollerAssignment cpa ON cpa.custompollerassignmentid = cps.custompollerassignmentid
    WHERE cpa.assignmentname = 'pollername on myserver'
    AND cps.datetime > 'start window'
    AND cps.datetime < 'end window'

    And the variable to include the statistic's value in the email is: Current Value: ${N=SwisEntity;M=CustomPollerStatusScalar.Status}

    Depending on the type of monitor you have set up, you might need a different variable, so make sure to test it out.

  • That's exactly my point. There was no data, so there shouldn't have been anything to act on. But I like your idea of displaying the value used in the decision making. Might explain what happened.
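
For anyone checking the same thing offline, here is a minimal Python sketch of the reasoning in this thread. It assumes the rows from the history query have been exported as (poll time, value) pairs. It is not how the alert engine works internally, and the names `find_missed_polls`, `should_alert`, and the `tolerance` parameter are hypothetical, made up for illustration. The threshold of 30 comes from the original post.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Sequence, Tuple

# (poll time, polled value); value is None when nothing usable was stored.
Sample = Tuple[datetime, Optional[float]]

THRESHOLD = 30  # "larger than 30", per the original post


def find_missed_polls(
    samples: Sequence[Sample],
    expected_interval: timedelta,
    tolerance: float = 1.5,  # hypothetical slack factor for poll jitter
) -> List[Tuple[datetime, datetime]]:
    """Return (previous, next) poll times whose gap suggests a missed poll."""
    gaps = []
    for (prev_time, _), (next_time, _) in zip(samples, samples[1:]):
        if next_time - prev_time > expected_interval * tolerance:
            gaps.append((prev_time, next_time))
    return gaps


def should_alert(value: Optional[float], threshold: float = THRESHOLD) -> bool:
    """A missing value never satisfies the condition; only a real value above it does."""
    return value is not None and value > threshold
```

If `find_missed_polls` reports a gap around the time of the page and no stored value in that window passes `should_alert`, that supports the theory that the alert was evaluated against something other than a polled value.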