Hey everyone!
The march of getting production set up continues. We had to do some patching of our database last night, so I shut down the Orion app server and left it that way until this morning.
Upon restarting the database and then the app server / polling engine, most of our agent-based Windows and Linux machines reported high packet loss, and displayed as warning/critical. I know this is because the Orion poller wasn't able to do ICMP pings, although it seems like this was a higher failure rate than I was expecting. The agents are all set with default polling intervals of 2 minutes.
I was wondering: How do people avoid getting spammed with alerts when they have an extended maintenance like this?
We don't have any alerts set up right now, but I can see people wanting to alert on high packet loss, and the long polling interval means it will likely fire. I was thinking we could flip the switch on our alerts to turn them off right before the maintenance, but that seems... dangerous, especially if someone forgets to flip them back on.
There's also the mental image of seeing everything in warning/critical state if they view the web console, but I think I can educate people about that.