Hey all,
I've got quite a workload in front of me, and I'd like some advice on where to start before... well, getting started.
I monitor 2500+ nodes in SolarWinds now, and the number keeps growing. It's mostly servers, switches, and firewalls, plus a few other objects here and there like a UPS, iDRAC, etc.
Most, if not all, of my sites follow this layout:
- Firewall (typically a Meraki MX - I don't monitor the circuit/demarcs YET)
- Switches (sit behind firewall) (Meraki as well)
- Servers, UPS, iDRAC, etc (sit behind switches)
I need to take the time to build out dependencies, which can take a very long time. I have dynamic groups that put everything in the right place on discovery, based on either node name or VLAN, depending on the site. So far I have just dealt with the alert storms that pop up here and there when a site goes offline, but these alerts are going into a PROD environment where they will create tickets. I don't need 15 tickets when a site goes down; 1 or 2 would be more than enough.
A typical group structure looks as follows:
- Group Name (Parent Group)
  - Firewall
  - Access Points
  - Servers (I throw the iDRACs/IPMI in here)
  - Switches
  - Other (UPS, etc.)
As I set up the dependencies, what is the best practice? Should the firewall (monitored through ICMP only) be the parent and everything under it the children? I believe I can only set one child per parent, which would mean I would need to create one dependency per group. For example:
Dependency1: Firewall (Parent) > AccessPoints (Child)
Dependency2: Firewall (Parent) > Servers (Child)
That would take forever... or should I just set them up as follows:
Dependency: Switches (Parent) > Servers (Child)
That way I don't get an alert storm for every object behind a switch when that switch goes offline. Every site we have has some kind of stacked switch.
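To get a feel for how many dependencies each option means per site, here is a rough Python sketch. It only enumerates parent > child pairs and does not talk to SolarWinds at all. The site name "SiteA", the `plan_dependencies` function, and the `layered` option (firewall over switches, switches over everything else) are made up for illustration and are assumptions on my part.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    name: str
    parent: str
    child: str


# Child groups under each site's parent group, as listed above.
CHILD_GROUPS = ["Access Points", "Servers", "Switches", "Other"]


def plan_dependencies(site: str, layered: bool = False) -> list[Dependency]:
    """Return the parent > child pairs for one site (hypothetical helper).

    layered=False: the firewall is the parent of every child group.
    layered=True:  the firewall is the parent of the switches, and the
                   switches are the parent of everything behind them.
    """
    firewall = f"{site} Firewall"
    switches = f"{site} Switches"
    deps = []
    for group in CHILD_GROUPS:
        child = f"{site} {group}"
        # In the layered variant only the switch group hangs off the firewall.
        if layered and group != "Switches":
            parent = switches
        else:
            parent = firewall
        deps.append(Dependency(f"{parent} > {child}", parent, child))
    return deps


if __name__ == "__main__":
    for dep in plan_dependencies("SiteA", layered=True):
        print(dep.name)
```

Either way it works out to one dependency per child group per site, so the count is the same; the difference is only which object ends up as the parent.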
IF SWITCH A goes DOWN
SERVER A is connected to SWITCH A
BOTH SWITCH A AND SERVER A DOWN
IF SWITCH B goes DOWN
SERVER A is connected to SWITCH A
ONLY SWITCH B is DOWN and SERVER A is UP
In that second case I still want alerts on SERVER A to fire, since it does not depend on SWITCH B.
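The behavior I'm after in the scenario above can be sketched in a few lines of Python. This is only the logic I want, not a description of how SolarWinds actually evaluates dependencies; `PARENT_OF` and `alerts_to_fire` are names I made up for the example.

```python
# Hypothetical topology: which switch each server hangs off.
PARENT_OF = {"SERVER A": "SWITCH A"}


def alerts_to_fire(down: set[str]) -> set[str]:
    """Return the down nodes that should still raise an alert.

    A node is suppressed only when its own parent is also down.
    """
    fire = set()
    for node in down:
        parent = PARENT_OF.get(node)
        if parent is not None and parent in down:
            continue  # its parent is down, so skip the child's alert
        fire.add(node)
    return fire


if __name__ == "__main__":
    # SWITCH A down takes SERVER A with it: only the switch should alert.
    print(alerts_to_fire({"SWITCH A", "SERVER A"}))
    # SWITCH B down while SERVER A is also down: both should alert.
    print(sorted(alerts_to_fire({"SWITCH B", "SERVER A"})))
```

The point is that suppression should follow the actual parent of each node, so a different switch going down never hides a problem on SERVER A.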
Best practices and advice would be great before I start on this long endeavor.
Thank you!