We have set up dependencies for over 1500 of our stores with the following network devices: Host, VM, SD-WAN appliance, AP, FW, and Switch.
The dependency typically looks like this:
Parent | Child |
SD-WAN appliance | Switch |
Switch | Switch Children GROUP (members include the host and AP(s)) |
Host | VM |
We have a ServiceNow integration that creates tickets for our NOC when these devices fail. When a store goes completely down (loses internet connectivity) and a dependency doesn't work correctly, the NOC gets flooded with tickets, which is very problematic. Specifically, when the SD-WAN appliance (the parent of everything) goes down, the child nodes don't go into an unreachable state. Sometimes moving all the related store nodes to the same polling engine will kick everything into an unreachable state, but by then it's too late: the tickets have already been created.
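To make the expected behavior concrete, here is a minimal Python sketch of how I would expect statuses to resolve. It is only a model of the hierarchy in the table above, not anything from the monitoring product itself. The node names, `DEPENDENCIES`, `expected_states`, and `nodes_to_ticket` are all made up for illustration, and the FW is left out because it isn't in the dependency chain listed.

```python
# Hypothetical model of the parent/child chain from the table above.
# Names are illustrative only; nothing here is a real product API.
DEPENDENCIES = {
    "sdwan": ["switch"],
    "switch": ["host", "ap"],  # stands in for the "Switch Children" group
    "host": ["vm"],
}


def expected_states(polled_down, root="sdwan"):
    """Return the status each node should end up with.

    polled_down: set of node names that failed their poll.
    A node below a failed ancestor should be 'unreachable' (no ticket);
    a failed node whose ancestors are all up should be 'down' (ticket).
    """
    states = {}

    def walk(node, ancestor_failed):
        if ancestor_failed:
            states[node] = "unreachable"
        elif node in polled_down:
            states[node] = "down"
        else:
            states[node] = "up"
        failed_here = ancestor_failed or node in polled_down
        for child in DEPENDENCIES.get(node, []):
            walk(child, failed_here)

    walk(root, False)
    return states


def nodes_to_ticket(polled_down):
    """Only nodes that are truly 'down' should generate a ticket."""
    states = expected_states(polled_down)
    return sorted(name for name, state in states.items() if state == "down")


# Whole store offline: every device fails its poll.
print(nodes_to_ticket({"sdwan", "switch", "host", "ap", "vm"}))
```

With every device at a store failing its poll, only the SD-WAN appliance should raise a ticket; everything beneath it should be unreachable. What we actually see is closer to one ticket per device.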
Is there anything I can tweak to fix this?