Failover Interfaces

Hope all is well,

Here is a brief description of what we have set up and what I am trying to do. We have a number of interfaces that show up in all active alerts as "interface is down". However, these are failover interfaces and will stay in a down status as long as the main interface stays up. I would like to clear them from the active alerts page without interrupting the monitoring of the main interfaces. Sadly, I am having a difficult time figuring out the process.

The out-of-the-box "interface is down" alert is what I currently have enabled. We still need to know when an interface that is normally up goes down, but I need to separate the failover interfaces into their own alert, if that makes sense.

I have grouped the failover interfaces but haven't quite figured out the next step. 

Any advice or direction would be greatly appreciated.

*edit* My initial idea was to set up a group, add all of the failover interfaces to it, and then set up an alert for that group specifically, to fire if any of those interfaces comes up. In theory, the alert page would then show the main interface down and the failover group up. This may not be best practice, but it was my theory at this point.

  • A very common mistake with Orion is to get focused on groups early on. What you need to do is tag the interfaces with a custom property first, then use that property as the key for all your groups/alerts/dashboards. For example, you could very easily add a T/F property called "Failover" to interfaces and set it to true for all the failover interfaces. Then you edit the OOTB interface down alert so that it only includes interfaces where "Failover = False".

    Starting by building groups is going to have you spinning your tires and working a lot harder than focusing on a custom property based approach, and that applies to almost every use case in Orion.  Groups are a pretty niche use case compared to custom properties.

  • Thank you!!

    I believe that did the trick. After going over your response and looking into the alerts and custom properties it all seems to make a little more sense. 

  • Can't agree more with what was already said.

    First step is to create those custom properties so you have a clear way to identify which interfaces are part of a Failover Group, and then you could also consider marking it as Active/Standby so you know which interfaces are the ones that should always be Up, and which should always be Down.

    The second part is to Duplicate & Edit the OOTB Interface Down Alert. Those alerts are not designed to run in a real-world system; they are just examples to get you running. Create your own version, include a filter for the Failover group, and make sure it alerts you using the method that best suits your environment and support teams.

    Consider whether you want a separate alert for "Failover Interface is Up". This would potentially be a duplicate alert if you have already been notified that the Primary is Down, but ask yourself whether there is any chance that the failover interface could come up without an actual failover, which you would need to be aware of.

  • I am starting to see that more and more now. 

    My employer is in the middle of migrating 5 other monitoring systems into SolarWinds and has been using auto discovery and OOTB alerts. I was recently hired and volunteered to learn and optimize our NPM.

    I greatly appreciate being pointed in the direction I need to head.
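
For readers following along, the custom-property approach from the replies above can be sketched outside of Orion as plain Python. This is only a model of the filter logic, not the Orion alert engine or its API: the `Interface` class, the `failover` field (standing in for the "Failover" T/F custom property), and the interface names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str
    status: str      # "Up" or "Down"
    failover: bool   # hypothetical stand-in for the "Failover" custom property


def primary_down_alerts(interfaces):
    """Model of the edited interface down alert: Status = Down AND Failover = False."""
    return [i.name for i in interfaces if i.status == "Down" and not i.failover]


def failover_up_alerts(interfaces):
    """Model of an optional second alert: Status = Up AND Failover = True."""
    return [i.name for i in interfaces if i.status == "Up" and i.failover]


# Made-up sample data: one healthy pair and one pair that has failed over.
interfaces = [
    Interface("core1 Gi0/1 (primary)", "Up", False),
    Interface("core1 Gi0/2 (failover)", "Down", True),
    Interface("edge3 Gi0/1 (primary)", "Down", False),
    Interface("edge3 Gi0/2 (failover)", "Up", True),
]

print(primary_down_alerts(interfaces))  # the idle failover on core1 is not listed
print(failover_up_alerts(interfaces))   # only the failover that actually came up
```

The idle failover interface on core1 never reaches the alert list, while the primary that really went down still does, which is the behaviour asked for in the original question.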

  • Good luck. It might be a bit late for you, as it sounds like the train has already left the station, but if you can think about what other custom properties might help you later on down the line, set them up now, and start populating them as you build up the platform, it will save you a lot of headaches later.

    Try to keep Alerts agnostic so that you don't create multiple copies of the same alert profile unless you really have to. And don't hesitate to come and ask here if you get stuck ;)

  • Yes sir. The train is well on its way. They did not have anyone beforehand to get this on the right track, and my experience is from a fresh install at my previous employer, where I was able to set up equipment in a lab environment for testing before deployment. This is a full production ISP network that we are working on. However, I am confident that we will get it sorted out over time.

    Again, thank you for the assistance and I am excited to continue to work with SolarWinds.
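
As a rough illustration of the "keep alerts agnostic" advice above, here is a small Python model of one generic rule that picks its recipient from a property value instead of needing one copy of the alert per team. Everything in it is an assumption for the sake of the example: the `support_team` property, the `noc` default, and the interface names are hypothetical, and it does not use any Orion API.

```python
def build_notifications(down_interfaces, default_team="noc"):
    """One generic rule: the recipient comes from a property on the interface,
    so no per-team copy of the alert is needed."""
    notifications = []
    for iface in down_interfaces:
        # "support_team" is a hypothetical custom property; fall back to a
        # default when it has not been populated yet.
        team = iface.get("support_team") or default_team
        notifications.append((team, f"{iface['name']} is down"))
    return notifications


# Made-up interfaces that the down alert has already matched.
down = [
    {"name": "edge3 Gi0/1", "support_team": "transport"},
    {"name": "cpe17 Fa0/0", "support_team": "access"},
    {"name": "lab-sw1 Gi0/5"},  # property not populated yet
]

for team, message in build_notifications(down):
    print(team, "->", message)
```

Adding a new team then means populating a property value rather than duplicating and maintaining another alert definition.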