Open for Voting

Auto-Unmanage Based on 'No Response' Thresholds

I wasn't sure whether to put this under SAM or NPM, but since my biggest pain points are SAM application monitors right now, SAM wins the prize.

This is connected to my request.

When an application component becomes unresponsive (and there are LOTS of reasons why a SAM component monitor would stop working: SSH authentication failures, WMI credential issues, a stopped SNMP service, etc.), there is no way to stop component polling short of unmanaging the application monitor or disabling the component. See my feature request for the details on why this can be problematic, but we're limited in what we can do today to mitigate it. There is no alert action to disable SAM monitoring or to unmanage the node (unless you have some SDK chops and add a custom action).

I propose that we be given a mechanism to auto-unmanage SAM application monitors (heck, even the nodes themselves, which would solve the SAM application monitor issue) when a node has been down for X period of time. Sure, there would be some risk if this were the default behaviour, but since it would be an option that has to be enabled, people like me can decide to accept that risk in order to minimize the impact on the largest portion of our environment.
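To make the idea concrete, here is a rough Python sketch of the threshold logic I have in mind. It is only an illustration: the function name `auto_unmanage`, the `down_since` mapping, and the `unmanage` callback are all hypothetical and not anything the product offers today. The callback is where a real action would go (for example, a call made through the Orion SDK), and the feature does nothing unless `enabled` is explicitly set.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List, Optional


def auto_unmanage(
    down_since: Dict[str, Optional[datetime]],
    threshold: timedelta,
    unmanage: Callable[[str], None],
    now: datetime,
    enabled: bool = False,
) -> List[str]:
    """Unmanage every entity that has been down for at least `threshold`.

    down_since maps a (hypothetical) entity ID to the time it went down,
    or None if it is currently responding. Returns the IDs that were
    unmanaged on this pass.
    """
    # Opt-in only: with the option off, nothing is ever unmanaged.
    if not enabled:
        return []

    unmanaged = []
    for entity_id in sorted(down_since):
        since = down_since[entity_id]
        if since is None:
            continue  # entity is up, leave it alone
        if now - since >= threshold:
            unmanage(entity_id)  # assumed hook, e.g. an SDK call
            unmanaged.append(entity_id)
    return unmanaged
```

With a 4-hour threshold, a node that has been down for 5 hours would be unmanaged, while one that has been down for 10 minutes would keep being polled. Passing the action in as a callback keeps the "how long is too long" decision separate from whatever actually performs the unmanage.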

  • I would like to see an 'auto unmanage' action within AppInsight for SQL for when mysterious databases (lacking monitoring credentials, of course) appear. Many times it's our support group just making a copy of the production customer DB so they can try something, but it still costs polling resources to create table entries for it and to start making failed attempts to collect data.