This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Solution for how to display the reason why a component is critical or down

I have seen this request bouncing around the forums for years and I finally got some time to cook up a solution that I think works pretty well and covers the bulk of cases.

Because there are 49 types of components in sam, and each component has multiple scenarios that would give them negative statuses the only viable solution was just to build a monster of SQL case logic testing things out and then hunting down where I could put the relevant data from the DB to let us know whats happening.  There are still some components that we don't use at my work that I haven't added in yet, but if you happen to run into one of those it isn't too hard to add additional "when" lines to decide how you want it to deal with those scenarios.  Feel free to post into the thread if you run into any additional cases or have to modify it for cases I haven't run across.

The way I leverage this is just to paste all of the below into the alert trigger message as seen in this screenshot and then include the ${N=Alerting;M=AlertMessage} variable in my alert messages. 

mesverrum_5-1588969217023.png

You can see some examples of the kinds of reasons it outputs below

mesverrum_2-1588966728267.png

mesverrum_4-1588966850147.png

mesverrum_1-1588966662111.png

mesverrum_0-1588966586444.png

${N=SwisEntity;M=Application.Node.Caption} ${N=SwisEntity;M=Application.ApplicationAlert.ApplicationName} ${N=SwisEntity;M=ComponentAlert.ComponentName} is ${N=SwisEntity;M=Status;F=Status} ` 
Reason: ${SQL: select isnull((
select
cast(concat(case
when cst.ComponentType in (1,8) and cs.Availability=2 then concat(cs.InstanceCount, ' instances of ',serv.value, ' found')
when cst.ComponentType in (2,37,38,39,55,63) and cs.Availability=2 then cst.ErrorMessage
when cst.ComponentType in (9) and cs.Availability=2 then concat(serv.value, ' ',cst.ErrorMessage)
when cst.ComponentType in (6,14) and cs.Availability in (2,12) then concat(replace(replace(replace(url.value,'${IP}',n.IP_Address),'${port}',port.value),'${Node.Caption}',n.caption), ' returned: ',cst.ErrorMessage)
when cst.ComponentType in (42) and cs.Availability in (2,5,6) then concat(cs.StatisticData, ' events matched the rule, see component view for details')
when cst.ComponentType in (21) and cs.Availability=2 then concat(cav.multivaluestatistics, ' - ',replace(cav.multivaluemessages,'Statistic:','')) 
when cst.ComponentType in (1,6,8,9,14,20,21,45,32,37) and cs.Availability = 6 then concat(
case when multivaluestatistics is not null then concat(cav.multivaluestatistics, ' - ',replace(cav.multivaluemessages,'Statistic:','')) 
else '' end
, case when tbc.ThresholdName is not null then concat(tbc.thresholdname ,' is '
, case when tbc.thresholdname = 'CPU' then cst.ComponentPercentCPU
when tbc.thresholdname = 'PMem' then cst.ComponentPercentMemory
when tbc.thresholdname = 'VMem' then cst.ComponentPercentVirtualMemory
when tbc.thresholdname = 'IOReadOperationsPerSec' then cst.ComponentIOReadOperationsPerSec
when tbc.thresholdname = 'IOWriteOperationsPerSec' then cst.ComponentIOWriteOperationsPerSec
when tbc.thresholdname = 'IOTotalOperationsPerSec' then cst.ComponentIOTotalOperationsPerSec
when tbc.thresholdname = 'Response' then cst.ComponentResponceTime
when tbc.thresholdname = 'StatisticData' then cst.ComponentStatisticData
end
, ', threshold is ' , case 
when tbc.thresholdoperator = 0 then 'greater than '
when tbc.thresholdoperator = 1 then 'greater than or equal to '
when tbc.thresholdoperator = 2 then 'equal to '
when tbc.thresholdoperator = 3 then 'less than or equal to '
when tbc.thresholdoperator = 4 then 'less than '
when tbc.thresholdoperator = 5 then 'not equal to '
end
, cast(tbc.critical as varchar)
, ' for ', isnull(isnull(ovr.criticalpolls,t.criticalpolls),1)
, case when isnull(ovr.criticalpolls,t.criticalpolls) != isnull(ovr.criticalpollsinterval,t.criticalpollsinterval) then concat(' of ',isnull(ovr.criticalpollsinterval,t.criticalpollsinterval)) else '' end 
, case when isnull(ovr.criticalpolls,t.criticalpolls) > 1 then ' polls' else ' poll' end, CHAR(13)
) else '' end)
when cs.Availability=2 then '...'
else 'To be categorized...'
end, CHAR(10)) as xml) as Reason

from nodes n
join APM_Application a on a.nodeid=n.nodeid
join APM_Component c on c.applicationid=a.id
join APM_CurrentComponentStatus cs on c.id=cs.componentID
join APM_CurrentStatistics cst on cst.componentid=c.id
join APM_StatusMetadata st on st.ApmStatusValue=cs.Availability
left join APM_ThresholdsByComponent tbc on tbc.ComponentID=c.id and tbc.Critical < 1.797E+308 
and (
(tbc.thresholdname = 'StatisticData' and (
(tbc.thresholdoperator = 0 and (cst.componentstatisticdata > statisticcritical)) or
(tbc.thresholdoperator = 1 and (cst.componentstatisticdata >= statisticcritical)) or
(tbc.thresholdoperator = 2 and (cst.componentstatisticdata = statisticcritical)) or
(tbc.thresholdoperator = 3 and (cst.componentstatisticdata <= statisticcritical)) or
(tbc.thresholdoperator = 4 and (cst.componentstatisticdata < statisticcritical)) or
(tbc.thresholdoperator = 5 and (cst.componentstatisticdata != statisticcritical))))

or

(tbc.thresholdname = 'Response' and (
(tbc.thresholdoperator = 0 and (cst.componentresponcetime > responsetimecritical)) or
(tbc.thresholdoperator = 1 and (cst.componentresponcetime >= responsetimecritical)) or
(tbc.thresholdoperator = 2 and (cst.componentresponcetime = responsetimecritical)) or
(tbc.thresholdoperator = 3 and (cst.componentresponcetime <= responsetimecritical)) or
(tbc.thresholdoperator = 4 and (cst.componentresponcetime < responsetimecritical)) or
(tbc.thresholdoperator = 5 and (cst.componentresponcetime != responsetimecritical))))

or

(tbc.thresholdname = 'CPU' and (
(tbc.thresholdoperator = 0 and (cst.componentpercentcpu > cpucritical)) or
(tbc.thresholdoperator = 1 and (cst.componentpercentcpu >= cpucritical)) or
(tbc.thresholdoperator = 2 and (cst.componentpercentcpu = cpucritical)) or
(tbc.thresholdoperator = 3 and (cst.componentpercentcpu <= cpucritical)) or
(tbc.thresholdoperator = 4 and (cst.componentpercentcpu < cpucritical)) or
(tbc.thresholdoperator = 5 and (cst.componentpercentcpu != cpucritical))))

or

(tbc.thresholdname = 'PMem' and (
(tbc.thresholdoperator = 0 and (cst.componentpercentmemory > physicalmemorycritical)) or
(tbc.thresholdoperator = 1 and (cst.componentpercentmemory >= physicalmemorycritical)) or
(tbc.thresholdoperator = 2 and (cst.componentpercentmemory = physicalmemorycritical)) or
(tbc.thresholdoperator = 3 and (cst.componentpercentmemory <= physicalmemorycritical)) or
(tbc.thresholdoperator = 4 and (cst.componentpercentmemory < physicalmemorycritical)) or
(tbc.thresholdoperator = 5 and (cst.componentpercentmemory != physicalmemorycritical))))

or

(tbc.thresholdname = 'VMem' and (
(tbc.thresholdoperator = 0 and (cst.componentpercentvirtualmemory > virtualmemorycritical)) or
(tbc.thresholdoperator = 1 and (cst.componentpercentvirtualmemory >= virtualmemorycritical)) or
(tbc.thresholdoperator = 2 and (cst.componentpercentvirtualmemory = virtualmemorycritical)) or
(tbc.thresholdoperator = 3 and (cst.componentpercentvirtualmemory <= virtualmemorycritical)) or
(tbc.thresholdoperator = 4 and (cst.componentpercentvirtualmemory < virtualmemorycritical)) or
(tbc.thresholdoperator = 5 and (cst.componentpercentvirtualmemory != virtualmemorycritical))))

or

(tbc.thresholdname = 'IOReadOperationsPerSec' and (
(tbc.thresholdoperator = 0 and (cst.componentioreadoperationspersec > ioreadoperationsperseccritical)) or
(tbc.thresholdoperator = 1 and (cst.componentioreadoperationspersec >= ioreadoperationsperseccritical)) or
(tbc.thresholdoperator = 2 and (cst.componentioreadoperationspersec = ioreadoperationsperseccritical)) or
(tbc.thresholdoperator = 3 and (cst.componentioreadoperationspersec <= ioreadoperationsperseccritical)) or
(tbc.thresholdoperator = 4 and (cst.componentioreadoperationspersec < ioreadoperationsperseccritical)) or
(tbc.thresholdoperator = 5 and (cst.componentioreadoperationspersec != ioreadoperationsperseccritical))))

or

(tbc.thresholdname = 'IOWriteOperationsPerSec' and (
(tbc.thresholdoperator = 0 and (cst.componentiowriteoperationspersec > iowriteoperationsperseccritical)) or
(tbc.thresholdoperator = 1 and (cst.componentiowriteoperationspersec >= iowriteoperationsperseccritical)) or
(tbc.thresholdoperator = 2 and (cst.componentiowriteoperationspersec = iowriteoperationsperseccritical)) or
(tbc.thresholdoperator = 3 and (cst.componentiowriteoperationspersec <= iowriteoperationsperseccritical)) or
(tbc.thresholdoperator = 4 and (cst.componentiowriteoperationspersec < iowriteoperationsperseccritical)) or
(tbc.thresholdoperator = 5 and (cst.componentiowriteoperationspersec != iowriteoperationsperseccritical))))

or

(tbc.thresholdname = 'IOTotalOperationsPerSec' and (
(tbc.thresholdoperator = 0 and (cst.componentiototaloperationspersec > iototaloperationsperseccritical)) or
(tbc.thresholdoperator = 1 and (cst.componentiototaloperationspersec >= iototaloperationsperseccritical)) or
(tbc.thresholdoperator = 2 and (cst.componentiototaloperationspersec = iototaloperationsperseccritical)) or
(tbc.thresholdoperator = 3 and (cst.componentiototaloperationspersec <= iototaloperationsperseccritical)) or
(tbc.thresholdoperator = 4 and (cst.componentiototaloperationspersec < iototaloperationsperseccritical)) or
(tbc.thresholdoperator = 5 and (cst.componentiototaloperationspersec != iototaloperationsperseccritical))))

)
left join APM_Threshold t on c.id=t.id and t.thresholdname=tbc.thresholdname and t.istemplate=1
left join APM_Threshold ovr on c.id=ovr.id and ovr.thresholdname=tbc.thresholdname and ovr.istemplate=0
left join apm_componentalertvariable cav on cav.componentid=c.id
left join APM_ComponentDetails url on c.id=url.ID and url.[Key] = 'URL'
left join APM_ComponentDetails port on c.id=port.ID and port.[Key] = 'portnumber'
left join APM_ComponentDetails serv on c.id=serv.ID and serv.[Key] in ('ServiceName','ProcessName')

where c.id = ${N=SwisEntity;M=ComponentID}
FOR XML PATH('') ),'No issue identified');  }