I'm looking for ideas on how most people using APM have their component alerts set up. Currently, I have one alert for when a component is "down" and another for when a component is either "warning" or "critical". For testing purposes, both alerts are set to "Do not trigger this action until the condition exists for more than 10 minutes". The problem with the "component down" alert is that when we reboot a server for scheduled maintenance, it seems to take 20 to 30 minutes before all components report back up properly. If someone on my team forgets to "unmanage" the server in question, we get false alerts stating that components are down when they clearly aren't. The alerts don't fire while the server itself is down, because I have the "Node Status is not equal to down" statement in the trigger condition.
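To make the setup concrete, here is a rough Python model of the trigger logic as I understand it. This is only a sketch: the names `Sample`, `first_alert_minute`, and `reboot_timeline` are my own inventions and have nothing to do with APM's actual internals, and the 5-minute node outage and 25-minute component recovery are assumed figures chosen to match what we typically see.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    minute: int            # minutes since the reboot started
    node_status: str       # "up" or "down"
    component_status: str  # "up" or "down"


def first_alert_minute(samples, delay_minutes=10):
    """Return the minute the 'component down' alert would fire, or None.

    Models: component is down AND node status is not down, with the
    condition required to exist for MORE than delay_minutes.
    """
    condition_since = None
    for s in samples:
        condition = s.component_status == "down" and s.node_status != "down"
        if not condition:
            condition_since = None  # condition cleared, so the timer resets
            continue
        if condition_since is None:
            condition_since = s.minute
        if s.minute - condition_since > delay_minutes:
            return s.minute
    return None


def reboot_timeline(node_down_for=5, components_down_for=25, total=40):
    """Hypothetical reboot: node returns first, components lag behind."""
    return [
        Sample(
            minute=m,
            node_status="down" if m < node_down_for else "up",
            component_status="down" if m < components_down_for else "up",
        )
        for m in range(total)
    ]
```

With these assumed numbers, `first_alert_minute(reboot_timeline())` returns 16: the node comes back at minute 5, the components are still reported down, and the 10-minute delay runs out well before they recover at minute 25. Raising `delay_minutes` to 30 returns `None` for this timeline, but it would also delay every genuine "component down" alert by the same amount.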
Is this something we just have to live with? Do we need to leave servers "unmanaged" for 30 minutes after a scheduled reboot so we don't receive false alerts about components going down? How is everyone else handling this? Thank you in advance for your suggestions.