I've been attempting to test the two trigger conditions I think would work for this, which are as follows:
But after making those trigger conditions I had questions about whether they would work, so I set up groups to test this. My fear was that it would just take 4 nodes being down, regardless of which group they are in (I thought 2 nodes down in each of 2 separate groups might trigger it). For testing I created two groups: one has 1 node down, the other has 6 nodes down.
Now when I go back to try and set up the trigger conditions from above I get the following setting:
"Aggregated data must occur within the last": a field that takes a number of seconds or minutes, with a maximum value of 60 for either.
Ignoring it causes issues, because the summary page (which you can see below) then doesn't show the prompt that tells you whether the trigger condition returns true. I will continue to test and figure out what I can, but I can't find that setting documented for 12.01.
Here are the screenshots I have:
When I get to the summary of the alert, it doesn't have the box in the lower right showing whether this alert will fire on anything or not. I searched (Ctrl+F) for the setting in the administrator guide for NPM 12.01 but found nothing, and I haven't been able to find anything on Google so far.
The box in the lower right-hand corner does not appear if I check the box for "Alert can be triggered if ..".
The questions I'm left with are: what is the "Aggregated data must occur within the last" setting, and why would the prompt in the lower right-hand corner of the alert editing summary page be missing?
More for me to learn here. Thanks for the post and any input would be greatly appreciated. Also hope any of this helps you out garystorr.
My initial thought was to use groups, but the roll-up status didn't fit the criteria I needed. I hadn't thought of using the group member status, which combined with "Alert can be triggered if more than xx objects" would work. On its own this still wouldn't work as I hoped, though, because there may be 3 sites down during the day, and if a single node then went down out of hours it would meet the 4-nodes-down criterion of the alert. However, adding the Aggregated data setting (which I have never seen before) would then mean all 4 sites would have to go down within xx minutes of each other. This sounds like exactly what I need, so thank you very much.
I think the reason your test isn't showing on the summary page is that the sites that are down were not triggered within the last 10 minutes, which is what you have set in the Aggregated data setting.
I will set this up and test, but I think I will have to wait for a genuine outage to see if it works as intended. I think the theory you describe above works perfectly, so I will let you know as and when I get to test it live.
The way to do this is a custom SWQL query.
This will generate an alert when at least 10 nodes are down in a group:
WHERE containerid IN (SELECT containerid
FROM (SELECT groups.containerid,
Count(*) AS down
FROM orion.groups AS Groups
INNER JOIN orion.containermembersnapshots AS CMS
ON groups.containerid = CMS.containerid
INNER JOIN orion.nodes N
ON n.nodeid = cms.entityid
WHERE CMS.entitytype = 'Orion.Nodes'
AND n.status = 2
GROUP BY groups.containerid
HAVING Count(*) >= 10))
(The reset condition is also a custom query, identical except for a 'NOT' before the first IN only.)
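To make the per-group counting in that query concrete, here is a minimal Python sketch of the same logic. It is only an illustration, not something Orion runs: the function name `groups_over_threshold` and the sample data are made up, and the only thing taken from the query is that status 2 means "down". It shows that the threshold is applied to each group separately, so 2 nodes down in each of 2 groups does not satisfy a threshold of 4.

```python
from collections import Counter

DOWN = 2  # status value the query above treats as "down"


def groups_over_threshold(members, node_status, threshold):
    """Return the set of groups with at least `threshold` down nodes.

    members:     iterable of (group_id, node_id) pairs (hypothetical data shape)
    node_status: dict mapping node_id -> status code
    """
    # Equivalent of WHERE status = 2 ... GROUP BY group: count down nodes per group.
    down_per_group = Counter(
        group_id
        for group_id, node_id in members
        if node_status.get(node_id) == DOWN
    )
    # Equivalent of HAVING Count(*) >= threshold.
    return {group for group, count in down_per_group.items() if count >= threshold}


# Made-up example: four nodes down in total, but only two in each group.
members = [("A", 1), ("A", 2), ("A", 3), ("B", 4), ("B", 5), ("B", 6), ("B", 7)]
status = {1: DOWN, 2: DOWN, 3: 1, 4: DOWN, 5: DOWN, 6: 1, 7: 1}

print(groups_over_threshold(members, status, 4))  # no group reaches 4
print(groups_over_threshold(members, status, 2))  # both groups reach 2
```

The reset condition described above is simply the negation: a group resets once it is no longer in the returned set.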
Thanks for the reply. It looks like this would cover what I initially mentioned, but I assume it would trigger whenever 10 nodes were down in total? I would need this to trigger only if 10 (or in my case 4) nodes went down within a specific time period. As this is a large network, there are always a handful of nodes down, so this alert would trigger and probably remain active all the time. I really only need the alert to trigger if the nodes all go down together or within a few minutes of each other.
Thanks again for the suggestion. I'm learning new things all the time.
I'm not sure putting a time bound on the count of down nodes in a group is helpful, but I don't know your environment. [My environment is many hundred buildings, >400 routers, and >20 thousand switches and wireless access points.] When I get back into the office next week I'm almost certainly going to put an alert like this into production and suppress alerts on nodes in triggered groups, which should help when we have major events.
My groups are buildings (on campus), regions (for the state), or 'sectors' for national/international groups. I also exclude unmanaged nodes from the groups' dynamic queries because the nodes may never go back into service, which is why I didn't exclude them in the node selection criteria of the query. NOTE: out-of-the-box node alerts automatically exclude unmanaged nodes, but other alerts MAY include them...
All of the alerts in Orion (today) ultimately get converted into a SQL or SWQL query, so my first step for really complex alerts that are not supported out of the box is to use the Database Manager or the SWQL query application (from the SDK) and try to write the query there.
I gave the SWQL version of the query, so install SWQL Studio on your desktop and try queries there.
hint: add a WHERE clause before the GROUP BY to select nodes that meet the time and any other criteria (e.g. unmanaged)
Reset Query: don't add the time window so the triggered alert doesn't immediately clear
(I do not know your environment, so you have to work out what your actual conditions are)
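The trigger/reset asymmetry in the hint above can be sketched in Python as follows. This is only a model of the logic under stated assumptions, not Orion code: `should_trigger`, `should_reset`, the `down_since` mapping, and the sample timestamps are all hypothetical, and how you actually obtain the time a node went down depends on your environment.

```python
from datetime import datetime, timedelta


def should_trigger(down_since, now, threshold, window):
    """Trigger only if enough nodes went down within the recent window.

    down_since: dict of node name -> time it went down (currently-down nodes only)
    """
    recent = [t for t in down_since.values() if now - t <= window]
    return len(recent) >= threshold


def should_reset(down_since, threshold):
    """Reset ignores the time window: clear only once fewer nodes are down."""
    return len(down_since) < threshold


now = datetime(2024, 1, 1, 2, 0)
window = timedelta(minutes=10)

# Three long-standing outages plus one new one: should not alert.
background = {
    "n1": now - timedelta(hours=16),
    "n2": now - timedelta(hours=15),
    "n3": now - timedelta(hours=12),
    "n4": now - timedelta(minutes=2),
}

# Four nodes dropping within a few minutes of each other: should alert.
burst = {f"s{i}": now - timedelta(minutes=i) for i in range(1, 5)}

print(should_trigger(background, now, 4, window))
print(should_trigger(burst, now, 4, window))

# An hour later the burst is outside the window, but the alert stays active
# because the reset check has no time bound.
later = now + timedelta(hours=1)
print(should_trigger(burst, later, 4, window), should_reset(burst, 4))
```

If the reset used the same window as the trigger, the alert would clear as soon as the outage became older than the window, even though the nodes were still down.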
I'd suggest making a dynamic group with at least 4 nodes that are DOWN, then changing its status rollup to 'Show the best status', and finally triggering the alert if the group is DOWN.