NPM not sending email alerts

FormerMember

Hello,

I have two advanced alerts configured in Orion NPM - one for high CPU usage, and one for high RAM usage. These alerts are configured for servers running IBM's ClearCase. The type of work done on these servers creates high usage in both areas, and this is expected behavior, but we would like to be alerted when the high usage is steady for an hour. I have the following alert configured for CPU usage:

cpu_alert.jpg

This alert triggers an email whenever any server that has the custom property "Application" set to "ClearCase" meets the trigger conditions - this alert works properly.

I have the following similar alert set to trigger on RAM usage:

mem_alert.jpg

This alert does trigger: it shows up in "Active Alerts - All Triggered Alerts" on our Orion Summary page, and in "Alert Manager - Active Alerts" directly on the Orion NPM server. However, no email is sent when it triggers, even though its trigger and reset actions use the same SMTP server settings and recipient address as the working CPU alert. A test fire of the alert works, but when the alert is actually triggered because a server meets the trigger conditions, no email is sent. I have checked the registry settings as suggested in this thread: http://thwack.solarwinds.com/message/71332#71332

The "Disable Alert Actions" key does not exist in the registry.

Any suggestions?

  • FormerMember

    As an update to the above post, there are NO items under the alert suppression tab on the configured alert.  I have seen the following error in the AlertServiceLog:

    *** SolarWinds Alerting Engine v2011.2.0.0, .Net Runtime v2.0.50727 ***
    2012-10-19 09:51:25,187 [MainTaskThread] INFO  All - Alert Engine Starting. Running Version 2011.2.0.0.
    2012-10-19 09:52:13,764 [AlertCheckingThread] WARN  Error - Exception in ExecuteTriggerActions -
    System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
       at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection)
       at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection)
       at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj)
       at System.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
       at System.Data.SqlClient.SqlDataReader.ConsumeMetaData()
       at System.Data.SqlClient.SqlDataReader.get_MetaData()
       at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
       at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async)
       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, DbAsyncResult result)
       at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
       at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method)
       at System.Data.SqlClient.SqlCommand.ExecuteReader()
       at AlertingEngine.CheckAlert.ExecuteTriggerActions()
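
    To see how often the trigger actions fail this way, the log could be scanned for these warnings. This is only a minimal sketch: it assumes every entry follows the "timestamp [thread] LEVEL ..." layout seen in the excerpt above, and the function name and sample lines are illustrative, not part of Orion.

```python
import re

# Assumed layout, taken from the log excerpt: "timestamp [thread] LEVEL  message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\] (?P<level>[A-Z]+)\s+(?P<rest>.*)$"
)


def find_action_failures(lines):
    """Return (timestamp, exception summary) for each ExecuteTriggerActions warning."""
    failures = []
    pending_ts = None  # timestamp of a warning still waiting for its exception line
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = LINE_RE.match(line)
        if match:
            if pending_ts is not None:
                # A new log entry started before any exception text appeared.
                failures.append((pending_ts, "(no exception text found)"))
                pending_ts = None
            if match.group("level") == "WARN" and "ExecuteTriggerActions" in match.group("rest"):
                pending_ts = match.group("ts")
        elif pending_ts is not None and not line.startswith("at "):
            # First non-stack-frame line after the warning names the exception.
            failures.append((pending_ts, line))
            pending_ts = None
    if pending_ts is not None:
        failures.append((pending_ts, "(no exception text found)"))
    return failures


# Sample lines copied from the excerpt above (stack trace shortened).
sample_log = [
    "2012-10-19 09:51:25,187 [MainTaskThread] INFO  All - Alert Engine Starting. Running Version 2011.2.0.0.",
    "2012-10-19 09:52:13,764 [AlertCheckingThread] WARN  Error - Exception in ExecuteTriggerActions -",
    "System.Data.SqlClient.SqlException: Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.",
    "   at System.Data.SqlClient.SqlCommand.ExecuteReader()",
    "   at AlertingEngine.CheckAlert.ExecuteTriggerActions()",
]

failures = find_action_failures(sample_log)
for timestamp, summary in failures:
    print(timestamp, "->", summary)
```

    Against the real file, the lines would come from opening the AlertServiceLog rather than from a hard-coded list.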

    The SQL server is online and I can run manual queries against it from SQL Server Management Studio.  SQL queries in Report Writer also work.  No credential or firewall rules have changed.

    The AlertLog table in the SQL database shows that the alert was triggered, but there is no ActionType. There ARE previous ActionType entries from when the alert wrote to the NPM EventLog, but the alert is currently configured to only send emails. For the last few weeks, every instance of this alert in the AlertLog table has no ActionType, and under Message it only says "Alert Triggered", with nothing about an email being sent or any Success message...
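
    The check described above can be sketched as a query for "Alert Triggered" rows with no ActionType. This is an illustration only: it uses an in-memory SQLite table as a stand-in for the Orion database, the LogDateTime and ObjectName columns are assumed names (only AlertLog, ActionType and Message come from the description above), and the rows are made-up sample data.

```python
import sqlite3

# Stand-in for the Orion database; column names other than ActionType and
# Message are assumptions made for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AlertLog ("
    "LogDateTime TEXT, ObjectName TEXT, ActionType TEXT, Message TEXT)"
)

# Made-up sample rows: one older entry with an action, two recent ones without.
sample_rows = [
    ("2012-10-01 08:00:00", "server-a", "EventLog", "Alert Triggered"),
    ("2012-10-19 09:52:13", "server-a", None, "Alert Triggered"),
    ("2012-10-19 10:52:13", "server-b", "", "Alert Triggered"),
]
conn.executemany("INSERT INTO AlertLog VALUES (?, ?, ?, ?)", sample_rows)

# Triggered entries for which no action was ever recorded.
missing_actions = conn.execute(
    "SELECT LogDateTime, ObjectName FROM AlertLog "
    "WHERE Message = 'Alert Triggered' "
    "AND (ActionType IS NULL OR ActionType = '') "
    "ORDER BY LogDateTime"
).fetchall()

for log_time, object_name in missing_actions:
    print(log_time, object_name)
```

    The same WHERE clause could be adapted and run in SQL Server Management Studio against the real table once the actual column names are confirmed.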

    Message was edited by: Steve Duff

  • Are you using multiple properties on the trigger conditions?  Orion doesn't support this.  Also, is it just these two alerts?  Do your other alerts email you correctly?

    Thanks,

    Chrystal Taylor

    http://www.loop1systems.com

  • FormerMember in reply to ChrystalT

    To answer your questions in reverse order, it is not these "two" alerts, it is only the one involving RAM volume percent available less than or equal to 5.  The CPU load alert works just fine, and sent out an email yesterday on one of our servers - I only included the screen capture of the trigger condition to show the similarity between the alert that worked and the one that didn't.  I have also had application monitor and node down alerts trigger successfully this week and send the appropriate emails.

    As to using multiple properties on the trigger conditions, I am not sure what you mean - my trigger condition is set up exactly as shown in the screen capture:

    1) I select "Volume" from the drop down for Type of Property to Monitor.  When doing so, when I go to "Add" a simple condition, the following options are available for selection:  "Select from this list to assign to the 'has changed' condition", "Network Nodes", and "Volumes" - I choose "Volumes>Volume Details>Volume Type" in order to specify "is equal to RAM" from the drop down.

    2) I Add another simple condition, select "Volumes>Volume Status>Volume Percent Available" is less than or equal to 5.

    3) I add another simple condition and select "Network Nodes>Custom Properties>Application" is equal to "ClearCase". "Application" is a custom property/column added to our nodes to define their purpose (e.g. "Firewall", "Switch", "Printer", "SQL Server", "Domain Controller", etc.), and "ClearCase" marks the node as belonging to the collection of equipment we use for the IBM ClearCase application. This is simply to weed out nodes that have nothing to do with ClearCase; it is only a custom node property, not anything to do with Orion APM.
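
    Taken together, the three conditions above amount to a single test per volume. A minimal sketch, assuming all three conditions must hold at once (the screen capture would confirm this); the class, field names and sample readings are hypothetical and not Orion's own data model:

```python
from dataclasses import dataclass


@dataclass
class VolumeReading:
    """Hypothetical snapshot of one volume plus its node's custom property."""
    volume_type: str          # Volumes > Volume Details > Volume Type
    percent_available: float  # Volumes > Volume Status > Volume Percent Available
    application: str          # Network Nodes > Custom Properties > Application


def meets_trigger(reading: VolumeReading) -> bool:
    """True when all three simple conditions from the steps above hold."""
    return (
        reading.volume_type == "RAM"
        and reading.percent_available <= 5
        and reading.application == "ClearCase"
    )


# Made-up readings for illustration.
readings = [
    VolumeReading("RAM", 3.0, "ClearCase"),         # matches
    VolumeReading("RAM", 3.0, "SQL Server"),        # wrong application
    VolumeReading("Fixed Disk", 2.0, "ClearCase"),  # not a RAM volume
    VolumeReading("RAM", 40.0, "ClearCase"),        # plenty of RAM available
]
matching = [r for r in readings if meets_trigger(r)]
print(len(matching))
```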

    If you are asking whether I changed the "Type of Property to Monitor" drop-down box after making some selections so that I could make different selections, the answer is no.  I am not trying to mix volume monitoring with application monitoring.

    The line "2012-10-19 09:52:13,764 [AlertCheckingThread] WARN  Error - Exception in ExecuteTriggerActions -" from the AlertServiceLog (located at C:\ProgramData\Solarwinds\Logs\Orion\) would seem to indicate an error when trying to execute the trigger actions. However, the only action defined on the Trigger Actions tab is "Send E-mail/Page" to ITcriticalalerts@XXXXXXXX, which is a valid, working email address that all other alerts are sent to. The SMTP server configuration is correct, the checkbox for specific hours for alerting is NOT checked, nor are specific days, and nothing has been configured for Alert Escalation. This trigger action is identical to many others that work, except for the message body. The fact that a test fire of this alert sends an email correctly indicates that the email functionality is working.

    The numerous SQL error messages that follow that line in the log file, as noted in the above post, would seem to indicate that the problem lies somewhere in the communication between Orion and SQL. Yet the message body, which uses SQL queries to define some variables, pulls the correct information from the SQL database when I run a test fire of the alert, so I don't understand what would cause a conflict when a real alert gets triggered.

    In case I have missed something in the alert message itself, I will include it here in its entirety:

    Subject:  Alert: ${NodeName} - Extended High Memory Usage

    Message: 

    The following alert has been triggered by 95% + RAM usage for more than an hour

    Date and Time: ${Date Time}

    Node: ${NodeName}

    Volume Type:  ${SQL:Select "VolumeType" from Volumes WHERE NodeID='${NodeID}' and VolumeType='RAM'}

    Volume Percent Used:  ${SQL:Select "VolumePercentUsed" from Volumes WHERE NodeID='${NodeID}' and VolumeType='RAM'}

    Machine Type: ${SQL:Select "MachineType" from Nodes WHERE NodeID='${NodeID}'}

    IP: ${SQL:Select "IP_Address" from Nodes WHERE NodeID='${NodeID}'}

    City: ${SQL:Select "City" from Nodes WHERE NodeID='${NodeID}'}

    Building: ${SQL:Select "Building" from Nodes WHERE NodeID='${NodeID}'}

    When this alert is triggered from a test fire against a particular node, the NodeName, VolumeType, and VolumePercentUsed variables are all correct in the email, as are the other variables (MachineType, IP_Address, City and Building) and I have used similar syntax in other, working alerts, so I really don't know what is hanging this one up...
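
    Since the exception in the log is a SQL timeout, one way to narrow things down would be to pull each ${SQL:...} query out of the message body and run it by hand for a real NodeID. A minimal sketch of the extraction step only; the helper names are hypothetical, and the scanner just counts braces so that the nested ${NodeID} variable does not end the macro early.

```python
def extract_sql_macros(text):
    """Return the query text of every ${SQL:...} macro, allowing nested ${...} variables."""
    queries = []
    start_token = "${SQL:"
    pos = text.find(start_token)
    while pos != -1:
        i = pos + len(start_token)
        begin = i
        depth = 1  # we are inside the ${SQL: ... } macro
        while i < len(text) and depth > 0:
            if text.startswith("${", i):
                depth += 1  # nested variable such as ${NodeID}
                i += 2
                continue
            if text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:
            queries.append(text[begin:i - 1])  # drop the closing brace
        pos = text.find(start_token, i)
    return queries


def expand_node_id(query, node_id):
    """Substitute a concrete NodeID so the query can be run manually."""
    return query.replace("${NodeID}", str(int(node_id)))


# Two lines copied from the alert message above.
message_body = """Volume Percent Used:  ${SQL:Select "VolumePercentUsed" from Volumes WHERE NodeID='${NodeID}' and VolumeType='RAM'}
Machine Type: ${SQL:Select "MachineType" from Nodes WHERE NodeID='${NodeID}'}
"""

queries = extract_sql_macros(message_body)
for query in queries:
    # 42 is an arbitrary example NodeID.
    print(expand_node_id(query, 42))
```

    Each printed query could then be timed in SQL Server Management Studio to see whether any of them is slow enough to hit the timeout.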

  • Have you tried recreating it from scratch?  Or just recreating the trigger action?  Occasionally I come across an alert that is not working properly, perhaps because properties were switched while it was being created, and creating the same alert from scratch works.  Just a thought.

    Hope this helps

    Chrystal Taylor

    http://www.loop1systems.com

  • FormerMember in reply to ChrystalT

    Yes, thank you.  I thought of that over the weekend, and when I came in today I saw this post.  I can't actually recall if this alert ever worked correctly - I was brought in to administer Orion after someone else had already configured it, and the original administrator is no longer available to question.  I have only recently begun testing and troubleshooting all the alerts that were already configured.

    I recreated the alert from scratch and it began working correctly.  Thanks again for the tip.

    Steve Duff

  • Hey All,

    Not to drag up an old post but I am having the exact same behavior as posted above.  I have an alert that doesn't send an e-mail, and it is a volume related alert as well.  The test alert functions correctly and the alert comes through.  Here is a screen shot of my triggers:

    TriggerConditions.JPG

    The reset condition is simply that the trigger conditions are no longer true, time of day is set to default, there is no suppression, and there is only one trigger action, which is to send an e-mail.  The SMTP settings are identical to other alerts that work correctly.

    Anyone else have this happen?  It looks like it only applies to the "Type of Property to Monitor: Volume", as all my other alerts work.