Good Morning All,
We recently installed and configured SolarWinds NPM with NTA on our transport network. The network is a Cisco Carrier Ethernet design based on ASR 920 core routers. It is set up to run five Layer 2 VLANs (bridge domains in Carrier Ethernet terms). The first bridge domain is for management, and the other four transport customer traffic between work sites. The network seems to work very well, but we have one large problem.
Issue: If a node IP fails to respond to a poll, NPM sends a ping request to the lost node to try to find it. For some reason this causes a broadcast storm on the network: Wireshark shows tens of thousands of ICMP packets circulating on the management VLAN. The storm is bad enough to cause a DoS condition for about 10-20 seconds each time the NPM server sends the ping.
We have been trying to work out why the ping would storm the network, but cannot pin it down.
Testing Conducted:
1. Tried to localize the problem by shutting a port on polled nodes at various places in the network topology. SAME RESULT
2. Tried to determine whether the storm is at Layer 2 or Layer 3 by implementing MSTP on all nodes and inspecting all of the spanning tree properties. NO ISSUE FOUND.
3. Tried to replicate the problem by manually pinging the shut-down node's address from the Windows command line on the SolarWinds server. The ping behaves normally: ARP cannot resolve the address and the ping fails. NO STORM OCCURRED.
Has anyone seen this issue before, or have any pointers on how we can narrow this down?
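For the Layer 2 versus Layer 3 question in test 2, one check that might help is to compare the IP TTL across the duplicate copies in the Wireshark capture. A bridged loop repeats a frame with the TTL unchanged, while a routed loop decrements the TTL on every pass. Below is a minimal Python sketch of that comparison. It assumes the capture has been exported to a list of per-packet records; the field names (src, dst, ip_id, icmp_seq, ttl), the function name and the min_copies threshold are all hypothetical, and it assumes a single capture point.

```python
from collections import defaultdict


def classify_duplicates(packets, min_copies=10):
    """Group captured ICMP packets by datagram identity and label each group.

    packets: iterable of dicts with the (assumed) keys
             src, dst, ip_id, icmp_seq, ttl.
    Returns a dict mapping (src, dst, ip_id, icmp_seq) to one of
    "ok", "L2" (many copies, identical TTL) or "L3" (many copies, varying TTL).
    """
    ttls_by_datagram = defaultdict(list)
    for p in packets:
        key = (p["src"], p["dst"], p["ip_id"], p["icmp_seq"])
        ttls_by_datagram[key].append(p["ttl"])

    verdicts = {}
    for key, ttls in ttls_by_datagram.items():
        if len(ttls) < min_copies:
            verdicts[key] = "ok"   # too few copies to call it a storm
        elif len(set(ttls)) == 1:
            verdicts[key] = "L2"   # TTL never changed: bridged loop or flooding
        else:
            verdicts[key] = "L3"   # TTL differs between copies: packet was routed
    return verdicts


if __name__ == "__main__":
    # Made-up records purely to show the call; not taken from our capture.
    sample = [
        {"src": "10.0.0.5", "dst": "10.0.0.9", "ip_id": 1, "icmp_seq": 1, "ttl": 128}
    ] * 20
    for datagram, verdict in classify_duplicates(sample).items():
        print(datagram, verdict)
```

An "L2" verdict would point back at the bridge domains and spanning tree, while "L3" would suggest the packet is being routed in a loop. Either result is only a hint, not proof.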
At this point we are considering moving the entire monitoring system off SolarWinds and over to Cisco Ethernet OAM.
An example router config is attached.
Thanks!