Effective Monitoring and Alerting on Running Copies of the Processes

Version 2

    Hi guys,

     

    This is a workaround which will address inability to natively set a threshold for the Process Monitor Component to effectively Alert on Running Copies. Well, you "kind of" can do it now already, as described here Re: Running Copies - Trigger alert when number of running copies is greater than 1 but this will not work for different components with different threshold requirements. You will most likely end up with multiple alerts for every case... with mess other words

     

     

    Some ideas were logged here - give a thumb up:

    https://thwack.solarwinds.com/ideas/4359

    https://thwack.solarwinds.com/ideas/1471

     

    SO, HERE IS WORKAROUND:

     

    LET'S SETUP ALERT FIRST

     

    (1)

    Create New Advance Alert > Trigger Condition > Property to Monitor: Custom SQL Alert > Trigger Query: APM: Component

     

    (2)

    Trigger condition - (remember to change database name):

     

    LEFT JOIN (SELECT APM_ComponentStatus_Detail.ComponentID, MAX(APM_ComponentStatus_Detail.[TimeStamp]) as LastCheck
      FROM SolarWinds.dbo.APM_ComponentStatus_Detail
      GROUP BY APM_ComponentStatus_Detail.ComponentID) c1 on APM_AlertsAndReportsData.ComponentID = c1.ComponentID
    INNER JOIN Solarwinds.dbo.APM_ComponentStatus_Detail csd ON csd.ComponentID = APM_AlertsAndReportsData.ComponentID
    INNER JOIN Solarwinds.dbo.APM_ProcessEvidence_Detail ped ON ped.ComponentStatusID = csd.ID
    INNER JOIN Solarwinds.dbo.APM_ComponentSetting cs ON (cs.ComponentID = APM_AlertsAndReportsData.ComponentID AND cs.[Key] = '__UserNotes')
    WHERE
      cs.Value LIKE 'RunningCopiesThreshold:%' AND --value mast contain 3 numbers, with leading/trailing zeros if needed (ex. RunningCopiesThreshold:010)
      c1.LastCheck = csd.[TimeStamp]
      GROUP BY APM_AlertsAndReportsData.ComponentID,APM_AlertsAndReportsData.ComponentName
        HAVING
          COUNT(*) > CAST(SUBSTRING(MAX(cs.Value),24,3) AS INT)
    
    

     

    (3)

    Trigger reset (remember to change database name):

     

    WHERE APM_AlertsAndReportsData.ComponentID NOT IN 
      (
        SELECT DISTINCT APM_AlertsAndReportsData.ComponentID
        FROM SolarWinds.dbo.APM_AlertsAndReportsData
    ---------------------------------------------
    LEFT JOIN (SELECT APM_ComponentStatus_Detail.ComponentID, MAX(APM_ComponentStatus_Detail.[TimeStamp]) as LastCheck
      FROM SolarWinds.dbo.APM_ComponentStatus_Detail
      GROUP BY APM_ComponentStatus_Detail.ComponentID) c1 on APM_AlertsAndReportsData.ComponentID = c1.ComponentID
    INNER JOIN Solarwinds.dbo.APM_ComponentStatus_Detail csd ON csd.ComponentID = APM_AlertsAndReportsData.ComponentID
    INNER JOIN Solarwinds.dbo.APM_ProcessEvidence_Detail ped ON ped.ComponentStatusID = csd.ID
    INNER JOIN Solarwinds.dbo.APM_ComponentSetting cs ON (cs.ComponentID = APM_AlertsAndReportsData.ComponentID AND cs.[Key] = '__UserNotes')
    WHERE
      cs.Value LIKE 'RunningCopiesThreshold:%' AND --value mast contain 3 numbers, with leading/trailing zeros if needed (ex. RunningCopiesThreshold:010)
      c1.LastCheck = csd.[TimeStamp]
      GROUP BY APM_AlertsAndReportsData.ComponentID,APM_AlertsAndReportsData.ComponentName
        HAVING
          COUNT(*) > CAST(SUBSTRING(MAX(cs.Value),24,3) AS INT)
    )
    
    

     

     

    SETTING UP THRESHOLDS

     

    From now on, all you need to do is the following:

     

    In order to set a threshold you can do this per component. Simply add the following string to the beginning of the User Notes

     

     

     

    Have fun!

     

    Alex