
Email alerts for "Alert me when a component goes into warning or critical state" not informative

Good morning

I'm a new sysadmin at a company with a nearly out-of-the-box SolarWinds install. One of my initial tasks is to clean it up.

I've noticed that we get email alerts when a component goes into a warning or critical state, but nothing in the email tells you what the issue is. You have to click the link in the email to find out more, which obviously does not work if you are just reviewing email alerts on your phone.

I would like to edit the text so it tells me what the problem is.

How would I do that please?

I have found the alert in Manage Alerts and, under "Trigger Actions", the message text that is sent in the email, so I need to edit this text to include the description. ComponentAlert.UserDescription comes through blank in my email, so how do I know what to put here to show that it was a CPU load issue?

Component ${N=SwisEntity;M=ComponentAlert.ComponentName} of Application "${N=SwisEntity;M=Application.ApplicationAlert.ApplicationName}" on ${N=SwisEntity;M=Application.Node.Caption} is

currently in a "${N=SwisEntity;M=ComponentAlert.ComponentAvailability}" state. 

Click the link below for more information related to this component:

${N=SwisEntity;M=DetailsUrl}

Additional Information:

${N=SwisEntity;M=ComponentAlert.UserDescription}

An example email that I am receiving:

pastedImage_6.png

When I click the link it then takes me to the Application Component Details page, and in this instance I can see it was CPU load related.

pastedImage_1.png

Even then, the details page does not give me any information about what the issue is or was, just that it went into warning and then was "up".

pastedImage_2.png

Thanks for your help

  • I've added:

    ${N=SwisEntity;M=ApplicationAlert.ComponentsWithProblemsFormatted}

    From "Alert variables for SAM - SolarWinds Worldwide, LLC. Help and Support".

    Waiting for some more alerts to see what happens with this change.

  • OK this did nothing other than literally print ${N=SwisEntity;M=ApplicationAlert.ComponentsWithProblemsFormatted} at the bottom of the email.

    pastedImage_0.png

    Having looked into this a little, I'm wondering if this is even possible. I'm hoping it is, otherwise this is really annoying.

    The reason I say it may not be possible (I may be completely wrong) is that the Exchange monitors seem to give the desired results, but they are configured differently: each monitor has its own section, whereas SQL is just grouped under the instance ("SQL Server (MSSQLSERVER)"). So CPU, memory, etc. are on one page with SQL, but Exchange is broken down by counter, with a separate page per counter (the example below is Dumpster Size).

    pastedImage_2.png

    Here is the Exchange monitor, which has a more detailed explanation which looks good:

    pastedImage_1.png

  • Hi masterpark, the reason the variable isn't working for you is that you have inserted an Application Alert variable into a Component Alert. The two are different.

    Example: Application Alert - Application "! SQL Servers" on "SERVERNAME" is currently in a state of "Critical"

    Then adding the variable into that alert email would list the components that are in a problem state such as:

    Page Splits/Batch Request(Warning)

    Workfiles Created/Sec(Critical)

    Worktables Created/sec(Critical)

    Target - Total Server Memory(Warning)

    Hope this helps.
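To illustrate why the variable printed literally, here is a toy Python model of macro substitution. It is a sketch, not SolarWinds' actual implementation: the `render` function, the two scope dictionaries, and their contents are assumptions made up for this example. The only idea it shows is that a macro resolves when the alert's object type exposes that property, and is left as plain text otherwise.

```python
import re

# Toy pattern for the ${N=SwisEntity;M=...} syntax quoted in this thread.
MACRO = re.compile(r"\$\{N=SwisEntity;M=([^}]+)\}")


def render(template: str, scope: dict) -> str:
    """Substitute macros found in `scope`; leave unknown ones as literal text."""
    def substitute(match):
        return scope.get(match.group(1), match.group(0))
    return MACRO.sub(substitute, template)


# Hypothetical property sets; a real alert exposes many more properties.
component_scope = {
    "ComponentAlert.ComponentName": "SQL Server (MSSQLSERVER)",
    "ComponentAlert.ComponentAvailability": "Critical",
}
application_scope = {
    "ApplicationAlert.ComponentsWithProblemsFormatted":
        "Workfiles Created/Sec(Critical)",
}

template = (
    "Problems: "
    "${N=SwisEntity;M=ApplicationAlert.ComponentsWithProblemsFormatted}"
)

# In the component-scoped alert the macro is unknown and stays as typed.
in_component_alert = render(template, component_scope)
# In the application-scoped alert it resolves to the problem list.
in_application_alert = render(template, application_scope)

print(in_component_alert)
print(in_application_alert)
```

In this model the first line prints the raw macro text, which matches what showed up at the bottom of the email, and the second prints the component list.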

  • So this particular issue is kind of difficult, because there is no single value in the database that says "this is what is wrong with your component".

    There are 49 different types of components, and many of those components can go into warning/critical based on multiple different metrics and conditions that you define when you set up the component monitor.  I wrote a SQL report a bit ago to try to display the root causes of warn/critical messages and display whatever threshold they were breaching and it's over 100 lines long and still doesn't cover all scenarios.  In most cases I would just say you should know that in order for something to be warn/critical it has breached some kind of threshold but isn't down, and you will need to log into the GUI to get more specific info than that.

  • Hi David,

    Sorry that does not help, can you explain further please? Adding what variable? Did you miss something from your post?

    I understand that in my example:

    Component Alert = SQL Server (MSSQLSERVER)

    Application Alert = "! SQL Servers"

    But how do I go deeper to give the issue?

  • Hi

    Thanks for the reply. This monitor really needs some sort of review by SolarWinds. It does not make sense to me that they would set up a monitor that does not show you what the actual issue is, beyond a high-level 'something is wrong with SQL'.

    The Exchange monitor seems to be created in a better way.

    So for Exchange it is:

    Server > Exchange Monitor > Specific Monitor – which shows for example “Component Version buckets allocated (database)”

    Where on SQL it is:

    Server > SQL Monitor > Instance Monitor (Which has separate monitors at this level), so it just shows “Component SQL Server (MSSQLSERVER)”, rather than "SQL CPU" or something.

    I understand I can click the link, but as I stated before it does not help if you are just looking at your phone for email alerts.

    For now I'm going to get the threshold amended to reduce the noise from this alert. It currently fires after 1 poll, and I will change it to 3.
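As a rough illustration of what moving from 1 poll to 3 changes, here is a small Python sketch. The function names, the sample values, and the threshold of 90 are all made up for the example, and it assumes the simple "N consecutive polls" reading of the setting rather than anything specific to how SAM evaluates it.

```python
def breached_for_consecutive_polls(samples, threshold, required_polls):
    """True if the last `required_polls` samples all exceed `threshold`."""
    if required_polls < 1 or len(samples) < required_polls:
        return False
    return all(value > threshold for value in samples[-required_polls:])


def polls_that_alert(samples, threshold, required_polls):
    """Indexes of the polls at which the alert condition would be met."""
    return [
        i for i in range(len(samples))
        if breached_for_consecutive_polls(samples[: i + 1], threshold, required_polls)
    ]


# Hypothetical CPU readings, one per poll.
one_off_spike = [40, 95, 42, 41]
sustained_load = [40, 92, 95, 97]

spike_at_1 = polls_that_alert(one_off_spike, 90, 1)      # fires on the spike
spike_at_3 = polls_that_alert(one_off_spike, 90, 3)      # never fires
sustained_at_3 = polls_that_alert(sustained_load, 90, 3)  # fires on the third high poll

print(spike_at_1, spike_at_3, sustained_at_3)
```

The trade-off is that a real problem is reported two polls later, so the polling interval decides how much delay that adds.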

  • I would advise taking a look at the alert trigger and action. The out-of-the-box alerts should tell you what component / application was triggered and on what application / server.

    pastedImage_0.png

  • Hi

    Thanks for the reply.

    That still does not tell you what the issue is, and this is a different monitor. You are still getting a generic email in your example.

    If your monitor said the below, it would be what I want and expect:

    Service "SuperApplication.exe" - Application "Windows Server 2003-2012 Services" - On ServerName29 is currently in Critical State

    In my example, I want:

    Critical CPU Threshold Breached - Component SQL Server (MSSQLSERVER) of Application "! SQL Servers" on SERVERNAME1 is currently in a "Critical" state. 

    Instead of: (Which does not tell me what the issue is!)

    Component SQL Server (MSSQLSERVER) of Application "! SQL Servers" on SERVERNAME1 is currently in a "Critical" state. 

  • Can anyone help any further please?

  • This was the work-in-progress SQL select to try to determine what caused the critical condition. You would add this as a custom variable in the email body. If I recall correctly, there were some bugs around those SQL statements in the SAM 6.6 (maybe?) release that were supposed to be fixed in later releases.

    SELECT concat(tbc.thresholdname, ' '
        , case
            when tbc.thresholdoperator = 0 then 'greater than '
            when tbc.thresholdoperator = 1 then 'greater than or equal to '
            when tbc.thresholdoperator = 2 then 'equal to '
            when tbc.thresholdoperator = 3 then 'less than or equal to '
            when tbc.thresholdoperator = 4 then 'less than '
            when tbc.thresholdoperator = 5 then 'not equal to '
          end
        , cast(convert(float, tbc.critical) as varchar), ' for ', isnull(t.criticalpolls,1)
        , case when t.criticalpollsinterval != t.criticalpolls then concat(' of ',t.criticalpollsinterval) else '' end
        , case when t.criticalpolls > 1 then ' polls' else ' poll' end
        ) as CriticalDescription
        , *
    FROM [dbo].[APM_ThresholdsByComponent] tbc
    left join [dbo].[APM_Threshold] t on tbc.componentid=t.id and t.thresholdname=tbc.thresholdname and t.istemplate=1
    left join [dbo].[APM_Threshold] ovr on tbc.componentid=ovr.id and ovr.thresholdname=tbc.thresholdname and ovr.istemplate=0
    join [dbo].[APM_CurrentStatistics] cs on cs.componentid=tbc.componentid
    where tbc.componentid=${componentid} and
    (tbc.critical != '1.7976931348623157E+308')
    and (
        (tbc.thresholdname = 'StatisticData' and (
            (tbc.thresholdoperator = 0 and (cs.componentstatisticdata > statisticcritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentstatisticdata >= statisticcritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentstatisticdata = statisticcritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentstatisticdata <= statisticcritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentstatisticdata < statisticcritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentstatisticdata != statisticcritical))))
        or
        (tbc.thresholdname = 'Response' and (
            (tbc.thresholdoperator = 0 and (cs.componentresponcetime > responsetimecritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentresponcetime >= responsetimecritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentresponcetime = responsetimecritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentresponcetime <= responsetimecritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentresponcetime < responsetimecritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentresponcetime != responsetimecritical))))
        or
        (tbc.thresholdname = 'CPU' and (
            (tbc.thresholdoperator = 0 and (cs.componentpercentcpu > cpucritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentpercentcpu >= cpucritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentpercentcpu = cpucritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentpercentcpu <= cpucritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentpercentcpu < cpucritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentpercentcpu != cpucritical))))
        or
        (tbc.thresholdname = 'PMem' and (
            (tbc.thresholdoperator = 0 and (cs.componentpercentmemory > physicalmemorycritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentpercentmemory >= physicalmemorycritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentpercentmemory = physicalmemorycritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentpercentmemory <= physicalmemorycritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentpercentmemory < physicalmemorycritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentpercentmemory != physicalmemorycritical))))
        or
        (tbc.thresholdname = 'VMem' and (
            (tbc.thresholdoperator = 0 and (cs.componentpercentvirtualmemory > virtualmemorycritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentpercentvirtualmemory >= virtualmemorycritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentpercentvirtualmemory = virtualmemorycritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentpercentvirtualmemory <= virtualmemorycritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentpercentvirtualmemory < virtualmemorycritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentpercentvirtualmemory != virtualmemorycritical))))
        or
        (tbc.thresholdname = 'IOReadOperationsPerSec' and (
            (tbc.thresholdoperator = 0 and (cs.componentioreadoperationspersec > ioreadoperationsperseccritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentioreadoperationspersec >= ioreadoperationsperseccritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentioreadoperationspersec = ioreadoperationsperseccritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentioreadoperationspersec <= ioreadoperationsperseccritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentioreadoperationspersec < ioreadoperationsperseccritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentioreadoperationspersec != ioreadoperationsperseccritical))))
        or
        (tbc.thresholdname = 'IOWriteOperationsPerSec' and (
            (tbc.thresholdoperator = 0 and (cs.componentiowriteoperationspersec > iowriteoperationsperseccritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentiowriteoperationspersec >= iowriteoperationsperseccritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentiowriteoperationspersec = iowriteoperationsperseccritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentiowriteoperationspersec <= iowriteoperationsperseccritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentiowriteoperationspersec < iowriteoperationsperseccritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentiowriteoperationspersec != iowriteoperationsperseccritical))))
        or
        (tbc.thresholdname = 'IOTotalOperationsPerSec' and (
            (tbc.thresholdoperator = 0 and (cs.componentiototaloperationspersec > iototaloperationsperseccritical)) or
            (tbc.thresholdoperator = 1 and (cs.componentiototaloperationspersec >= iototaloperationsperseccritical)) or
            (tbc.thresholdoperator = 2 and (cs.componentiototaloperationspersec = iototaloperationsperseccritical)) or
            (tbc.thresholdoperator = 3 and (cs.componentiototaloperationspersec <= iototaloperationsperseccritical)) or
            (tbc.thresholdoperator = 4 and (cs.componentiototaloperationspersec < iototaloperationsperseccritical)) or
            (tbc.thresholdoperator = 5 and (cs.componentiototaloperationspersec != iototaloperationsperseccritical))))
    )
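For anyone trying to follow what the query does, the same logic can be sketched in a few lines of Python. This is only an illustration of the query as posted: the operator codes 0 to 5 and the wording are taken from its CASE expression, while the function names, parameters, and sample numbers are invented for the example and are not part of any SolarWinds API.

```python
import operator

# Operator codes and wording as used by the CASE expression in the query.
OPERATORS = {
    0: ("greater than", operator.gt),
    1: ("greater than or equal to", operator.ge),
    2: ("equal to", operator.eq),
    3: ("less than or equal to", operator.le),
    4: ("less than", operator.lt),
    5: ("not equal to", operator.ne),
}


def is_breached(operator_code, current, critical):
    """Apply the comparison selected by `operator_code` to the current value."""
    _, compare = OPERATORS[operator_code]
    return compare(current, critical)


def critical_description(threshold_name, operator_code, critical,
                         polls=1, interval=None):
    """Build text such as 'CPU greater than 90 for 3 polls'."""
    label, _ = OPERATORS[operator_code]
    interval = polls if interval is None else interval
    # Only mention the interval when it differs from the poll count (X of Y).
    of_part = f" of {interval}" if interval != polls else ""
    unit = " polls" if polls > 1 else " poll"
    return f"{threshold_name} {label} {critical:g} for {polls}{of_part}{unit}"


# Hypothetical component: CPU critical threshold of 90, current reading 97.
if is_breached(0, 97.0, 90.0):
    print(critical_description("CPU", 0, 90.0, polls=3))
```

The query repeats this comparison once per threshold type (StatisticData, Response, CPU, PMem, VMem and the three IO counters), which is why it gets long so quickly.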