
How to send hourly email with all nodes that are currently down

I need to make an alert that sends one email every hour listing all nodes that are currently down, instead of one email per node each time a node goes down. It could include some additional info such as the MAC address, so maybe something like this:

Nodes currently down:

192.xxx.xxx.xxx  MAC-address

192.xxx.xxx.xxx  MAC-address

192.xxx.xxx.xxx  MAC-address

192.xxx.xxx.xxx  MAC-address

  • So this is basically a bad idea. Contractors like me end up getting brought in to gut and redo the alerts in environments that were set up with spammy email schemes like that.

    If your team needs to know which nodes are down, why aren't they just looking at a page in the web console as necessary?  If you insist on sending a status update out hourly, you would be better off doing it as a scheduled report of all down nodes, not as an alert. (But I'm still recommending against hourly emails of any type. Constantly hitting people over the head with messages from Orion makes them start writing rules to dump all your messages into a folder with 40,000 unread emails.)  Bundling everything together in the alert logic just makes life a lot harder than it has to be.  Also, moving to an hourly model is pretty strange: if a node goes down at 9:01, are you comfortable leaving the situation unknown for 59 more minutes?  Most people would not want that much delay between an event and its notification.

    If you are getting spammed by things like a gateway router going down and taking a whole site with it, you should be building dependencies to suppress the extra alert messages.

    What is the real root issue here you are trying to address?
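To illustrate the dependency idea above, here is a minimal Python sketch of the suppression logic. It is not Orion's implementation or API; the `Node` class, the `parent` field, and the function names are all hypothetical, and it assumes each device has at most one upstream parent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    responding: bool               # False when polling gets no reply
    parent: Optional[str] = None   # hypothetical: name of the upstream device


def classify(nodes: Dict[str, Node]) -> Dict[str, str]:
    """Label each node 'up', 'down', or 'unreachable'."""

    def upstream_failed(node: Node) -> bool:
        # Walk up the parent chain. Any non-responding ancestor means
        # this node's outage is explained by the upstream failure.
        seen = set()
        current = node.parent
        while current is not None and current not in seen:
            seen.add(current)
            ancestor = nodes[current]
            if not ancestor.responding:
                return True
            current = ancestor.parent
        return False

    status = {}
    for name, node in nodes.items():
        if node.responding:
            status[name] = "up"
        elif upstream_failed(node):
            status[name] = "unreachable"   # suppressed, no email
        else:
            status[name] = "down"          # root cause, worth an email
    return status


def nodes_to_alert(nodes: Dict[str, Node]) -> List[str]:
    """Only nodes that are 'down' (not merely 'unreachable') trigger alerts."""
    return sorted(n for n, s in classify(nodes).items() if s == "down")


# One edge router with 150 units behind it, all failing to respond.
nodes = {"edge-router": Node("edge-router", responding=False)}
for i in range(1, 151):
    name = f"unit-{i:03d}"
    nodes[name] = Node(name, responding=False, parent="edge-router")

alerts = nodes_to_alert(nodes)
print(alerts)  # ['edge-router']
```

In this toy model the outage of the router produces a single alert, and the 150 units behind it are classified as unreachable rather than down.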

  • I wrote my question wrong. Right now we have an alert that only fires when a node has been down for 45 minutes, because a lot of them go down and then regain connection after 10-15 minutes, and some only after 30 minutes. So 45 minutes is the sweet spot. Most of these nodes are units that control heating, water, and electricity in estates. They can be down for hours, because the people changing the settings on these units don't do it that often. The most important thing is that we know a unit is down so we can fix it. The dropouts themselves are not something we can really fix, so we just have to find a good solution. So what I need is: IF a node goes down, the alert sends all nodes that went down during the last 45 minutes in 1 email. Mostly to prevent spam. Not long ago our internet provider made a change that messed up one of their switches, and all 150 nodes that I had in Orion went down, so I got 150 alert emails. A daily email report of all nodes that are down would be useful as well.

  • Building out your dependencies would fix that issue with 150 nodes emailing you at once. You would get notified that your edge router (or whatever it is) is down, and everything you have indicated as being downstream of it gets marked unreachable and doesn't trigger emails. See: Define and use dependencies in Orion NPM - SolarWinds Worldwide, LLC. Help and Support

    So one of the reasons I don't like combining alerts for more than one object is: how do you handle the reset conditions?  Does the alert reset when ANY of the nodes originally listed comes back up, or when they are all back up, or do you just reset the alert automatically every hour and, if there are still nodes down, send another message?

    The last option is the "best" of the options I can think of. If you will be trimming the alert to apply only to a subset of your environment, while still having a more "normal" node down message for the rest of the environment, I can see it being a more reasonable scenario.

    So you could structure something like this:

    [Screenshots: pastedImage_0.png through pastedImage_3.png]

    So the trick here is that you enable complex conditions and allow the alert to fire on 1 or more nodes. It resets on the hour, and if any of these environmental devices have still been down for at least 45 min, they all get bundled together.  Wrapping a variable in <<< >>> makes that variable get processed once for each of the objects that triggered the alert.

    Ultimately I still wouldn't handle it this way in my own environment but this kind of scheme would probably be tolerable for a relatively small/simple environment where these types of devices aren't down all that often. 
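The bundling scheme described in the reply above (one message listing every node that has been down for at least 45 minutes, re-evaluated on the hour) can be sketched in plain Python. This only illustrates the logic and is not how Orion's alert engine is implemented; `DownNode`, `build_digest`, and the sample addresses (taken from documentation-only ranges) are assumptions made for the example.

```python
from datetime import datetime, timedelta
from typing import List, NamedTuple, Optional


class DownNode(NamedTuple):
    ip: str
    mac: str
    down_since: datetime


THRESHOLD = timedelta(minutes=45)  # the "sweet spot" from the thread


def build_digest(down_nodes: List[DownNode], now: datetime) -> Optional[str]:
    """Return one email body for all nodes down >= 45 min, or None if there are none."""
    overdue = [n for n in down_nodes if now - n.down_since >= THRESHOLD]
    if not overdue:
        return None  # nothing qualifies, so no email this hour
    lines = ["Nodes currently down:", ""]
    # Longest outage first, one line per node.
    for n in sorted(overdue, key=lambda n: n.down_since):
        lines.append(f"{n.ip}  {n.mac}")
    return "\n".join(lines)


# Hypothetical snapshot taken at 10:00.
now = datetime(2024, 1, 1, 10, 0)
snapshot = [
    DownNode("192.0.2.11", "00:00:5E:00:53:02", datetime(2024, 1, 1, 9, 15)),  # 45 min
    DownNode("192.0.2.12", "00:00:5E:00:53:03", datetime(2024, 1, 1, 9, 40)),  # 20 min
    DownNode("192.0.2.10", "00:00:5E:00:53:01", datetime(2024, 1, 1, 9, 0)),   # 60 min
]

digest = build_digest(snapshot, now)
print(digest)
```

Calling `build_digest` once per hour corresponds to the "reset every hour and send again if nodes are still down" option: a node that went down 20 minutes ago is left out of this message and shows up in the next one only if it is still down.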

  • Out of curiosity, why not build a report that shows the down nodes - then you can use it as a screen AND you can have that report run and email hourly. No alerts to fuss and muss with.

  • Thanks for the alert, mesverrum. I see what you mean by the reset condition problem. I think I'll just take your advice: alert on single nodes and use dependencies to prevent spam during hardware errors/malfunctions.

  • Hi Leon,

    I completely forgot about the report feature. I only recently got SolarWinds IPAM. I'll make a daily report of useful info. Thanks!