Up Alerts

Hello Guys,

I'm trying to set up alerts based on the previous group status, such as "down". The problem is that some groups generate "Up" alerts without actually having been down, flooding us with false alerts. This happens with sites located far away, where communication between sites is not stable. We have tried different trigger algorithms to overcome this without any success: I played around with root conditions, double root conditions, trigger time, etc., and none of them helped. Either the alert does not work at all or it floods; there is no middle ground.

Has anyone in the community faced the same problem? If so, how did you resolve it?

Thank you!

  • You have several things going on: groups with mixed status, unreliable/high-latency WAN connections, and alerting chaos. You have not mentioned using dependencies with the groups. A dependency will suppress alerts from all group members. This stops a flood, but if the members are technically up, it is not ideal. BUT, if the connectivity is problematic, do those up nodes still function properly when they cannot communicate across your network? In other words, are there things like distributed applications or databases that need reliable communication? To deal with those WAN links, you may be able to consider a separate alert, or even an alert condition (complex alert), based on latency or dropped packets. Last, the alerting chaos: you can look at complex alerts, such as multiple trigger conditions or extended down time conditions. The latter method works fairly well with website uptime checks.

    One method I have used is to ask whether the remote site can be considered online if the WAN connection is down/unreachable. This is where a group dependency can illustrate that the site may be online but not connected. The status of anything at that remote site cannot be verified reliably during this time. You can also look at using agents or a remote poller to resolve some of the issues at these sites, but the WAN link itself will still be your biggest hurdle. I would look at reports on these connections over time to see whether management is able to help improve the connections, if that is even possible. If you have the data, use it to illustrate the problem.
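    For the separate latency/dropped-packet alert suggested above, the core check is a rolling average over the last few polls, so a single bad sample does not decide the outcome on its own. This is a minimal Python sketch under assumed inputs; the `WanQualityCheck` class, the window size and both thresholds are hypothetical and would need tuning against real WAN data. It is not how SolarWinds itself computes these values.

```python
from collections import deque


class WanQualityCheck:
    """Hypothetical rolling-window check on a WAN link's latency and packet loss."""

    def __init__(self, window, max_avg_latency_ms, max_avg_loss_pct):
        # deque with maxlen: the oldest sample drops off automatically
        self.samples = deque(maxlen=window)
        self.max_avg_latency_ms = max_avg_latency_ms
        self.max_avg_loss_pct = max_avg_loss_pct

    def add(self, latency_ms, loss_pct):
        """Record one poll and return True if the link currently looks degraded."""
        self.samples.append((latency_ms, loss_pct))
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet to judge
        avg_latency = sum(s[0] for s in self.samples) / len(self.samples)
        avg_loss = sum(s[1] for s in self.samples) / len(self.samples)
        return avg_latency > self.max_avg_latency_ms or avg_loss > self.max_avg_loss_pct


check = WanQualityCheck(window=3, max_avg_latency_ms=250, max_avg_loss_pct=5)
polls = [(120, 0), (900, 20), (130, 0), (140, 0), (150, 0)]
print([check.add(latency, loss) for latency, loss in polls])
# [False, False, True, True, False]
```

    In this example the single 900 ms spike still flags the link while it sits inside the window; a larger window or higher thresholds would smooth it out further.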

  • Thank you for your detailed reply! I need to clarify a couple of points here:
    1. The group never really goes down. It does, but only for a couple of packets. That is not enough to generate a "down" alert, but it is enough to report that the group is currently up, ignore the previous conditions, and generate an alert.
    2. The group in question is relevant only if all group members are in down status.
    3. The condition illustrated previously suppressed the flood, but at the same time it ignores the previous group status and simply does not generate any "Up" alert.
    4. I already played with "extended time". It does not really matter: it simply ignores the previous state and either still floods or does not work at all.
    Please take a look at the group alert log: as you can see, there are a lot of "Up" alerts without a single "Down".

  • Here you can see the initial setup of the "down" alert for the same group.

  • Thank you for the reply. Can you please expand?

    Did you ever set up an "up" alert in SolarWinds?

  • If I understand correctly, you have two separate alerts, one for down and one for up. Is there a reason for doing that instead of creating one down alert and then using its reset action as the "up alert"?
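    The suggestion of one down alert whose reset action acts as the "up alert" can be modelled as a small state machine: an "up" notification can only come from resetting an alert that actually triggered, so a blip of a couple of dropped polls never produces a stray "Up". The sketch below is a hypothetical Python model, not SolarWinds code; the `PairedAlert` class, the `required_polls` threshold (standing in for an extended trigger time) and the "DOWN"/"UP" event names are assumptions for illustration.

```python
class PairedAlert:
    """Hypothetical model of one alert with a trigger action and a reset action."""

    def __init__(self, required_polls):
        self.required_polls = required_polls  # consecutive down polls needed to trigger
        self.down_streak = 0
        self.active = False  # True only after the trigger action has fired

    def poll(self, status):
        """Feed one polled status; return "DOWN", "UP" or None."""
        if status == "Down":
            self.down_streak += 1
            if not self.active and self.down_streak >= self.required_polls:
                self.active = True
                return "DOWN"  # trigger action
            return None
        # Simplification: any status other than Down counts as recovered.
        self.down_streak = 0
        if self.active:
            self.active = False
            return "UP"  # reset action: only reachable after a trigger
        return None


def run(statuses, required_polls=3):
    """Return only the notifications produced for a sequence of polls."""
    alert = PairedAlert(required_polls)
    return [event for event in (alert.poll(s) for s in statuses) if event]


print(run(["Up", "Down", "Up", "Down", "Down", "Up"]))  # []
print(run(["Up", "Down", "Down", "Down", "Up", "Up"]))  # ['DOWN', 'UP']
```

    The first sequence is the flapping case from this thread: short drops never reach the trigger threshold, so no "Up" is ever sent.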

  • Thank you! 

    It is a brilliant point! These alerts trigger another automation, and it didn't work out the last time we tested it. I will try another automation flow through mail notification this time.

    Again, thank you for pointing this out!

  • Ditto. I just caught up on this post and agree with  .   

  • Hmm, it sounds like an alert on group mixed status might be where you are heading. If you want suppression, the simplest option is to look for a node that always triggers during these conditions, then set it up as the parent for the group dependency. I have a feeling that such a node might not exist, as it may be random nodes each time.

    Too Long/Didn't Read section: there are a couple of things that could be done with custom properties too. If you set up a property like "Alert_Me" with a yes/no value, add that value to the scope of your normal down alerts. The group alert would need a trigger action to set that custom property to No to suppress alerts. Two things are assumed: the relevant nodes have the property, AND the group alert triggers before the flood. You would then need a reset action to set things back to normal. To make sure such a property is always set ahead of time, maybe use an alert that triggers on network issues (latency or drops), runs every minute, and has zero wait on the condition. This is ugly stuff, but it should give you some additional ideas.
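    The custom-property approach can be sketched as follows, assuming a hypothetical `Alert_Me` property on each member node, a group alert whose trigger action sets it to No and whose reset action sets it back to Yes, and node-down alerts scoped to `Alert_Me = Yes`. The Python classes and node names are illustrative only; in SolarWinds these would be alert actions and scope conditions configured in the alert definitions, not code.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    alert_me: str = "Yes"  # hypothetical custom property; "Yes" = normal down alerts apply
    status: str = "Up"


@dataclass
class Group:
    members: list = field(default_factory=list)

    def on_group_alert_trigger(self):
        # Trigger action: suppress member alerts while the WAN issue lasts.
        for node in self.members:
            node.alert_me = "No"

    def on_group_alert_reset(self):
        # Reset action: put things back to normal.
        for node in self.members:
            node.alert_me = "Yes"


def node_down_alerts(nodes):
    """Scope of the normal node-down alert: only nodes with Alert_Me = Yes."""
    return [n.name for n in nodes if n.alert_me == "Yes" and n.status == "Down"]


site = Group(members=[Node("router-a"), Node("server-b"), Node("server-c")])
site.on_group_alert_trigger()  # must fire before the member nodes go down
for node in site.members:
    node.status = "Down"
print(node_down_alerts(site.members))  # []
site.on_group_alert_reset()
print(node_down_alerts(site.members))  # ['router-a', 'server-b', 'server-c']
```

    The last line shows why the ordering assumptions matter: if the reset runs while nodes are still down, the suppressed alerts fire at that point.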

  • The trigger and the object conditions conflict and are redundant. Why would you have the trigger as status equal to Up while the object condition requires it to be Unknown, Down, etc., essentially not equal to Up? (Which is another thing: instead of listing all of those statuses, you can simply use "is not equal to Up".)



    If the group status is equal to Up, why would you expect the group status to be different? That does not make sense to me. Why would you expect it to be Unknown, Down, etc.?

    I think it would be best to take the advice of the other people here: use a custom property, mixed group status, or the reset action of your original alert. You just need to be creative in how you implement what you need.

    Honestly, you can learn a great deal by subscribing to SolarWinds Academy.
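    The conflict in the trigger and object conditions can be shown with two predicates, assuming (as the reply implies) that both conditions are evaluated against the group's current status at the same time. The status list is an illustrative subset, and the function names are hypothetical.

```python
# Illustrative subset of statuses; not an exhaustive SolarWinds list.
STATUSES = ["Up", "Down", "Warning", "Unknown", "Unreachable"]


def scope_matches(status):
    # Object/scope condition as described: status is anything except Up.
    return status != "Up"


def trigger_matches(status):
    # Trigger condition as described: status is equal to Up.
    return status == "Up"


def alert_fires(status):
    # Assumption: both conditions must hold for the same object at the same moment.
    return scope_matches(status) and trigger_matches(status)


print([s for s in STATUSES if alert_fires(s)])  # []
```

    No status satisfies both predicates, which is why a single "is not equal to Up" scope combined with an "equal to Up" trigger cannot express "was down, now up"; that history has to come from something like the reset action of the down alert.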