I have been reading up a lot about dependencies and alerts in order to customize the alerting for our environment. We have around 90 sites globally and would like some level of intelligent alerting. That means that when a site goes down (all nodes within the group), only the site should alert and not each node within the site. If I understand this correctly, this should work if I create a group for the site and a group for all the nodes within the site. I have also tried the automatic dependency creation to flesh out routes, but I am not convinced this is the right way.
I would then also like a second-level dependency. Each site has 1 or 2 internet routers; if they go down, none of the child nodes should raise an alert. However, this does not mean that the site is down (first level), especially when there are 2 internet routers, as one router may be a failover or simply supply internet to another rack on the site.
It sounds incredibly complex and I am not sure if this is even possible in SolarWinds; I might have to do this with the alerts as a second level instead of dependencies.
As I read it, you have 2 scenarios to solve for:
Based on that, I would look at this:
This will suppress all node alerts for the site if a router goes into a 'down' status.
Site Offline: There are some challenges with this one.
There are multiple other ways, some more complex, some simpler but more manual to implement. If you have more questions, let's get a discussion going here.
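To make the two scenarios concrete, here is a rough Python sketch of the alert logic being asked for. It is only a model of the desired behaviour, not SolarWinds configuration or any SolarWinds API; the `Site` class, the `alerts_for` function, and the status strings are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    # Hypothetical model of one site; none of these names come from SolarWinds.
    name: str
    routers: dict  # router name -> "up" or "down"
    nodes: dict    # node name -> "up" or "down"
    parents: dict = field(default_factory=dict)  # node name -> router it sits behind


def alerts_for(site):
    """Return the alerts that should fire for one site."""
    statuses = list(site.routers.values()) + list(site.nodes.values())

    # Level 1: every router and node is down, so raise a single site alert.
    if statuses and all(status == "down" for status in statuses):
        return [f"SITE DOWN: {site.name}"]

    alerts = []
    for router, status in site.routers.items():
        if status == "down":
            alerts.append(f"ROUTER DOWN: {router}")

    for node, status in site.nodes.items():
        if status != "down":
            continue
        parent = site.parents.get(node)
        # Level 2: a node behind a down router does not alert on its own.
        if parent is not None and site.routers.get(parent) == "down":
            continue
        alerts.append(f"NODE DOWN: {node}")
    return alerts
```

With two routers where only one is down, the nodes behind the failed router stay quiet, while a down node behind the healthy router still alerts. The site alert fires only when everything at the site is down.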
My question today is kind of twofold.
I have set up dependencies for the different sites across the globe so that we only receive one alert when all the nodes in a group go down. I created a root group for the country, then a subgroup for each site within the country (some countries have more than 5 sites), and then another subgroup containing all the nodes. Thus, if the nodes for site 1 are down, an alert is raised only for that site (not the whole country). Before adding the subgroups into the main groups, I created the dependencies, and then added the subgroups into their parents for better display. We had a site down last night, but no alert was triggered: the alert checks for the site being DOWN, and this site remained in a Warning state for more than 5 hours. My questions are as follows:
Sorry, I can't display the node names.
Please help, this is making me brain dead.
They weren't showing as down; they were all showing as Warning, which caused the site to show only Warning as well. In fact they were all down: there was a major power failure that left them all off for more than 5 hours, yet SolarWinds only showed them as Warning.
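A small Python sketch of why a "site is Down" alert stays silent in this situation. It is a simplified model of group status rollup, not SolarWinds code; the mode names `"best"`, `"worst"` and `"mixed"`, the three-status scale, and both function names are assumptions made for the example.

```python
# Simplified severity scale; a real product has more statuses than these.
SEVERITY = {"up": 0, "warning": 1, "down": 2}


def rollup(member_statuses, mode="mixed"):
    """Combine member statuses into one group status (hypothetical model)."""
    if not member_statuses:
        return "unknown"
    if mode == "best":
        return min(member_statuses, key=SEVERITY.get)
    if mode == "worst":
        return max(member_statuses, key=SEVERITY.get)
    # "mixed": a uniform group keeps its status, any mix reports warning.
    unique = set(member_statuses)
    return unique.pop() if len(unique) == 1 else "warning"


def site_down_alert_fires(member_statuses, mode="mixed"):
    """The alert condition described above: trigger only on group status Down."""
    return rollup(member_statuses, mode) == "down"
```

In this model, a group whose members all sit in Warning rolls up to Warning under every mode, so a condition that only checks for Down never triggers, however long the outage lasts.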
SolarWinds only shows nodes in Warning when they are responding intermittently to the ping requests from the poller. If all settings are at their defaults, they should have gone down after 120 seconds of continuous missed pings.
Do any nodes ever show down in your environment? I've never run into a situation where nodes failed to go down when the Orion server can't ping them, so you will likely need to do some testing within your environment to pin down whether that change you made is a factor.
Yes, I have a couple at the moment, which is why this is strange; it seems to be happening only at this one site. I have scheduled some testing for next week to see if I can replicate the issue.
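The expected behaviour can be sketched as a tiny state function. This is an illustration of the rule as described here, not the poller's actual implementation; the function name `node_status`, its parameters, and treating exactly 120 seconds as already Down are assumptions.

```python
def node_status(missed_since, now, down_after=120):
    """Status of a node under the rule described above (hypothetical model).

    missed_since: time in seconds of the first ping in the current run of
                  missed pings, or None if the most recent ping was answered.
    now:          current time in seconds.
    down_after:   seconds of continuous missed pings before the node is Down.
    """
    if missed_since is None:
        return "up"
    if now - missed_since >= down_after:
        return "down"
    # Pings are being missed, but not yet for long enough to call it Down.
    return "warning"
```

Under this rule, a node that stops answering entirely should sit in Warning for at most `down_after` seconds. A node that stays in Warning for hours suggests that some pings are still being answered, or that the threshold has been changed from its default.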