Level 8

Issues with too many alerts

Got an issue. When there is a brownout or power outage, we get a truckload of alerts, which is compounded by the extra monitoring. Here is a sample list of what we get, depending on equipment type:

Node down

Power supply down (if only one power supply is plugged into the UPS)

Node up

Node Rebooted

Stack ring down (when not all the switches in the stack boot back up at the same time; you know how long it takes for Cisco devices to warm back up)

Stack ring up

Power supply up

Is there a way to summarize these alerts in a single report? I have been reading up on alerts, but everything talks about systems and server alerts rather than edge network equipment plugged into UPS-less closets.

Level 8

Sorry guys for not getting back to you sooner. It's been a crazy month, but now I can loop back to the alerts and more.

You do have a point about a lot of the items above.

Is there a way to group the alerts into a summary? Say a bunch of devices went offline: wait 2 minutes, then group them into a single e-mail.
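To make the idea concrete, here is a minimal sketch of that kind of 2-minute grouping, written as a standalone Python script rather than anything built into a particular monitoring product. The `Alert` class, `group_alerts`, and `summarize` are hypothetical names made up for illustration, and the timestamps are assumed to be plain seconds.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    timestamp: float  # seconds; hypothetical field, any monotonic clock works
    node: str
    event: str


def group_alerts(alerts, window=120.0):
    """Batch alerts so each batch covers at most `window` seconds,
    measured from the first alert in that batch."""
    batches = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if batches and alert.timestamp - batches[-1][0].timestamp <= window:
            batches[-1].append(alert)  # still inside the open window
        else:
            batches.append([alert])    # start a new window
    return batches


def summarize(batch):
    """Render one batch as the body of a single summary e-mail."""
    header = f"{len(batch)} alert(s) starting at t={batch[0].timestamp:.0f}s:"
    lines = [f"  {a.node}: {a.event}" for a in batch]
    return "\n".join([header] + lines)


alerts = [
    Alert(0, "closet-sw1", "Node down"),
    Alert(30, "closet-sw2", "Node down"),
    Alert(119, "closet-sw1", "Power supply down"),
    Alert(121, "closet-sw3", "Node down"),
]
batches = group_alerts(alerts)
for batch in batches:
    print(summarize(batch))
```

The window is anchored to the first alert of each batch, so a long outage produces one e-mail per 2-minute window instead of one per event. A sliding window (reset on every new alert) would give fewer e-mails but could delay the first notification indefinitely.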

Look into groups and dependencies.
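As a rough illustration of what dependencies buy you, the sketch below suppresses alerts for nodes that sit behind another node that is already down, so only the upstream device gets reported. This is not any product's API: `root_causes` and the `parent_of` map are hypothetical, and the node names are invented.

```python
def root_causes(down_nodes, parent_of):
    """Return only the down nodes that have no down ancestor.

    `parent_of` maps each node to its upstream parent (hypothetical
    topology data; a real tool keeps this in its dependency settings).
    """
    down = set(down_nodes)
    roots = []
    for node in down_nodes:
        ancestor = parent_of.get(node)
        visited = {node}  # guards against accidental loops in the map
        suppressed = False
        while ancestor is not None and ancestor not in visited:
            if ancestor in down:
                suppressed = True  # an upstream device already explains this
                break
            visited.add(ancestor)
            ancestor = parent_of.get(ancestor)
        if not suppressed:
            roots.append(node)
    return roots


parent_of = {
    "closet-sw1": "core-sw",
    "closet-sw2": "core-sw",
    "ap-1": "closet-sw1",
}
outage = ["core-sw", "closet-sw1", "ap-1", "closet-sw2"]
print(root_causes(outage, parent_of))
```

With the whole closet dark, four "node down" events collapse to the single device that actually needs attention.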

I think the first point is that a few of those don't generally warrant an email, in my view. Most of those events I just display in a dashboard or periodic report.

Node down = potentially urgent, so I take those as emails. Power supply down, I'm assuming, is part of a hardware health alert. I just throw all hardware issues into a report that I can spot-check daily, since the hardware itself has built-in redundancy and that's the kind of thing I usually schedule to deal with later.

Node up: sure, it's good to know that the problem is gone.

Node rebooted: I already got told the node was down and I got told it came back, so I don't need another email reminding me. The reboot alert never fires until the node is back online anyway, so it's really too little, too late in many cases. I can review a report on reboot events at the end of the week if it matters.

Stack ring health is another one I put on a dashboard and just keep an eye out when I get around to it.

Since I don't email on the hardware or stack rings I also don't get the reset messages.

2 emails, the rest are line items on a set of reports I can review at my leisure.
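The split described above could be sketched as a simple routing table. This is only an illustration of the policy, not a feature of any specific tool; `EMAIL_EVENTS` and `route` are hypothetical names, and the event strings are the ones from the list at the top of the thread.

```python
# Events urgent enough to e-mail; everything else goes to a report or dashboard.
EMAIL_EVENTS = {"Node down", "Node up"}


def route(event):
    """Decide where an event goes: 'email' or 'report'."""
    return "email" if event in EMAIL_EVENTS else "report"


events = [
    "Node down",
    "Power supply down",
    "Node up",
    "Node Rebooted",
    "Stack ring down",
    "Stack ring up",
    "Power supply up",
]
emailed = [e for e in events if route(e) == "email"]
reported = [e for e in events if route(e) == "report"]
print(f"{len(emailed)} emails, {len(reported)} report line items")
```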

- Marc Netterfield, Github
I would agree with this. You have to decide what's really important. You could still receive all of them as emails but filter out the "not so important" ones into a separate folder.

