I have not found an effective way to handle event aggregation for alerts, and I am wondering if anyone else has. Below are a few examples I am having trouble with. In most or all of these cases, HP OpenView, which we are about to retire, does aggregate the events into one alert or ticket.
Related interface events:
In some cases we monitor both the T1 and Serial interfaces, and in others similar pairs of related interfaces. We have been asked to monitor all of them, but the network group really only wants one alert/ticket when an outage affects both interfaces.
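The desired behavior for related interfaces can be written down as a small Python sketch. This is only an illustration of the logic, not anything SolarWinds provides: the `CIRCUITS` mapping, the interface names, and the 120-second window are all made-up assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InterfaceEvent:
    node: str
    interface: str
    timestamp: float  # seconds since epoch


# Hypothetical mapping: related interfaces share one logical circuit name.
CIRCUITS = {
    ("rtr1", "T1 0/0/0"): "rtr1-wan-circuit",
    ("rtr1", "Serial0/0/0:0"): "rtr1-wan-circuit",
}

WINDOW_SECONDS = 120  # assumed correlation window


def aggregate_interface_events(events):
    """Return a list of tickets; each ticket is a list of related events."""
    tickets = []
    latest = {}  # grouping key -> most recent ticket opened for that key
    for ev in sorted(events, key=lambda e: e.timestamp):
        # Interfaces without a circuit entry are grouped only with themselves.
        key = CIRCUITS.get((ev.node, ev.interface), (ev.node, ev.interface))
        current = latest.get(key)
        if current and ev.timestamp - current[0].timestamp <= WINDOW_SECONDS:
            current.append(ev)  # same circuit, same outage: reuse the ticket
        else:
            current = [ev]      # first event for this key, or window expired
            latest[key] = current
            tickets.append(current)
    return tickets
```

With this logic a T1 and a Serial event a few seconds apart end up in one ticket, while an unrelated interface on the same node still gets its own.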
Node down/routing neighbor events:
When a node goes down (often a telecommuter or small office site), SolarWinds triggers alerts on the node and on every route going to the site. The telecommuter sites usually run Cisco EIGRP; the multi-user sites usually run OSPF or BGP. The management IP of the remote site is one IP, and the routes to it are all different IPs. If the remote site is in SolarWinds, I do see the node name associated with the routes in the upstream devices' neighbor tables. Most of these sites have 2-3 routes going to them from 2 different routers at the head office, which makes association/aggregation harder. I can always find these manually after the fact to verify they are related, but our teams still only want to see one ticket for this. Keep in mind, I believe we still want alerts if any one of the redundant routes goes down by itself.
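The node-down case can be sketched the same way. The following Python is only a model of the logic, assuming each neighbor event can be resolved to the remote node name (as the upstream neighbor tables seem to allow); the `Event` and `Ticket` types, the titles, and the 180-second window are hypothetical and not part of any SolarWinds API.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str         # "node_down" or "neighbor_down"
    source: str       # device that reported the event
    subject: str      # remote node the event is about
    timestamp: float  # seconds since epoch


@dataclass
class Ticket:
    title: str
    events: list = field(default_factory=list)


def correlate(events, window=180.0):
    """Fold neighbor-down events into a matching node-down ticket."""
    site_tickets = {}
    for e in events:
        if e.kind == "node_down":
            site_tickets[e.subject] = Ticket(f"Site down: {e.subject}", [e])

    standalone = []
    for e in events:
        if e.kind != "neighbor_down":
            continue
        parent = site_tickets.get(e.subject)
        # Attach only if the remote node itself went down around the same time.
        if parent and abs(e.timestamp - parent.events[0].timestamp) <= window:
            parent.events.append(e)
        else:
            # A redundant route failing by itself still raises its own ticket.
            standalone.append(
                Ticket(f"Neighbor lost: {e.source} -> {e.subject}", [e])
            )
    return list(site_tickets.values()) + standalone
```

Neighbor events from both head office routers land in the single site ticket, and a lone route failure with no matching node-down event is kept as a separate ticket.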
General routing neighbor events:
I had a couple of BGP outages recently where 5 IP routes on a router went down at the same time, triggering 5 alerts. In this case the downstream devices were all over the place, including downstream monitored devices and ISP connections. No nodes actually went down. The service desk and network group would rather have these in 1 ticket.
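For this case no node is down, so the only common factor is the reporting router and the time. A minimal sketch of that grouping, assuming events arrive as `(router, neighbor_ip, timestamp)` tuples and that a 60-second window is acceptable (both are assumptions for illustration):

```python
from collections import defaultdict


def group_by_router(events, window=60.0):
    """Group neighbor-down events into bursts per reporting router.

    events: iterable of (router, neighbor_ip, timestamp) tuples.
    Returns a list of bursts; each burst is a list of events for one ticket.
    """
    by_router = defaultdict(list)  # router -> list of bursts
    for event in sorted(events, key=lambda e: e[2]):
        router, _neighbor, ts = event
        bursts = by_router[router]
        # Compare against the first event of the router's latest burst.
        if bursts and ts - bursts[-1][0][2] <= window:
            bursts[-1].append(event)
        else:
            bursts.append([event])
    return [burst for bursts in by_router.values() for burst in bursts]
```

Five neighbors dropping on one router within the window would produce one burst, and therefore one ticket, regardless of where the downstream devices are.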
Any ideas or thoughts would be appreciated. I can draw flowcharts of the logic for most of these, but I have no idea how to implement them with the existing alert structures.