Oracle Database Availability - False Positive Alerts

I've set up an Oracle Database availability monitor/alert. I created an Oracle End User experience monitor that sends a query to our Oracle database to confirm it's up.

The query is 'SELECT 1 FROM DUAL'. If this query returns 1, that means the database is up. If it doesn't, that means the database is down. 

The alert I have set up polls every 5 minutes, and I occasionally get false-positive down alerts from it. These alerts all come from the same node, and every time I am alerted, my Oracle DBA confirms that the database is indeed running.

I've already confirmed that the metrics for this node look good, so I'm thinking it has something to do with the Oracle database itself. The same monitor/alert has also been set up for a bunch of other databases on other nodes, but they don't throw these false positives.

Does anyone have any insights or suggestions on how to troubleshoot this?

  • One suggestion I have is to try polling more often and then change the alert logic so the query has to fail 2 or more times before an alert is created.

  • Here are some screenshots to explain. Run your SAM template once a minute instead of once every 5 minutes, then set your alert up like this so that it also checks the query results every minute.

    Then, in the Trigger field, set the condition so that it must exist for 5 minutes.

    The query will then have to fail 5 times in a row before the alert is triggered. You still end up alerting on a 5-minute interval, but the alert has more data points to look at, so it is less likely to send out a false alert.
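
To make the reasoning concrete, here is a minimal Python sketch of the "fail N times in a row" idea. It is not how SolarWinds evaluates alerts internally; `alert_states` and `required_failures` are hypothetical names used only for illustration, and the sketch assumes one poll per minute so that 5 consecutive failures roughly matches "condition must exist for 5 minutes".

```python
from typing import Iterable, List


def alert_states(results: Iterable[bool], required_failures: int = 5) -> List[bool]:
    """For each poll, report whether the alert would be firing.

    results: one entry per poll; True means the query returned 1,
             False means the query failed or returned something else.
    required_failures: hypothetical threshold of consecutive failures.
    """
    streak = 0  # current run of consecutive failed polls
    states = []
    for ok in results:
        # Any successful poll resets the streak; a failure extends it.
        streak = 0 if ok else streak + 1
        states.append(streak >= required_failures)
    return states


# A single failed poll surrounded by successes never raises the alert.
blip = [True, True, False, True, True, True]
print(any(alert_states(blip)))

# Five failed polls in a row do raise it, on the fifth failure.
outage = [True, False, False, False, False, False]
print(alert_states(outage)[-1])
```

With this logic, the one-off failed polls described in the question would be absorbed, while a real outage would still alert about 5 minutes after it starts.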
