Version 3

    I have added 2 addition columns for Alert Prioritising Dashboard which is introduced in my previous document “Alert Prioritising Dashboard (SWQL) for Problematic Nodes (Servers)” (https://thwack.solarwinds.com/docs/DOC-176521)

     

    One column is AlertTime, another one is Acknowledge (Ack). The Ack column is click-able. Right click it and open a new windows to View or Acknowledge an alert.

     

    alert-000.jpg

     

     

    Here are the steps to add the 2 columns:

     

    • Create an alert using "Advanced Alert Manager"

     

     

    Alert Name: “SAM Alert - Nodes with Issues (SQL)”

     

    Alert Description:

    Generate an alert for a node if any condition below is met:

    Node is Down

    OR

    Node is UP but AvgResponseTime > 500

    OR

    Node is UP but CPU Usage > 95%

    OR

    Node is UP but Memory Usage > 95%

    OR

    Node is UP but at least one Volume Usage > 95% (if apply)

    OR

    Node is UP but at least one Hardware component Status is in Warning or worse condition (if apply)

    OR

    Node is UP but at least one application monitor Status is Warning or worse condition (if apply)

     

    alert-001.jpg

     

    Trigger Condition: (SQL)

     

     

    LEFT JOIN

    (

    SELECT NodeID, MAX(VolumePercentUsed) AS vPercentage

    FROM Volumes

    WHERE VolumeType = 'Fixed Disk'

    GROUP BY NodeID

    ) AS h ON h.NodeID = Nodes.NodeID

    LEFT JOIN

    (SELECT c.NodeID AS NodeID,

      MAX(CASE

       WHEN c.Status=1 THEN 0  -- UP

       WHEN c.Status=17 THEN 1  -- - Undefined

       WHEN c.Status=0 THEN 2  -- Unknown

       WHEN c.Status=3 THEN 4  -- Warning

       WHEN c.Status=14 THEN 5  -- Critical

       ELSE NULL

       END

       ) AS HWStatusID

    FROM APM_HardwareCategoryStatus AS c

    JOIN APM_HardwareItem AS e ON e.HardwareCategoryStatusID = c.ID

    Group BY c.NodeID

    ) AS g ON g.NodeID = Nodes.NodeID

    LEFT JOIN

    (SELECT NodeID,

      MAX(CASE

        WHEN ApplicationAvailability='Up' THEN 0  -- UP

        WHEN ApplicationAvailability='Unmanaged' THEN 1  -- Unmanaged

        WHEN ApplicationAvailability='Unknown' THEN 2  -- Unknown

        WHEN ApplicationAvailability='Unreachable' THEN 3  -- Unreachable

        WHEN ApplicationAvailability='Warning' THEN 4  -- Warning

        WHEN ApplicationAvailability='Critical' THEN 5  -- Critical

        WHEN ApplicationAvailability='Down' THEN 6   -- Down

        ELSE NULL

        END

       ) AS aStatusID

    FROM APM_CurrentStatusOfApplication

    GROUP BY NodeID

    ) AS i ON i.NodeID = Nodes.NodeID

    WHERE

    (

        Nodes.Status = 2

        OR

        (Nodes.Status=1 AND Nodes.AvgResponseTime > 500)

        OR

        (Nodes.Status=1 AND Nodes.CPULoad > 95)

        OR

        (Nodes.Status=1 AND Nodes.PercentMemoryUsed > 95)

        OR

        (Nodes.Status=1 AND h.vPercentage > 95.0 AND h.vPercentage < 100.0)

        OR

       (Nodes.Status=1 AND g.HWStatusID > 3)

        OR

        (Nodes.Status=1 AND i.aStatusID > 3)

    )

     

    alert-002.jpg

     

     

    Reset Condition: (SQL)


    LEFT JOIN

    (

    SELECT NodeID, MAX(VolumePercentUsed) AS vPercentage

    FROM Volumes

    WHERE VolumeType = 'Fixed Disk'

    GROUP BY NodeID

    ) AS h ON h.NodeID = Nodes.NodeID

    LEFT JOIN

    (SELECT c.NodeID AS NodeID,

      MAX(CASE

       WHEN c.Status=1 THEN 0  -- UP

       WHEN c.Status=17 THEN 1  -- - Undefined

       WHEN c.Status=0 THEN 2  -- Unknown

       WHEN c.Status=3 THEN 4  -- Warning

       WHEN c.Status=14 THEN 5  -- Critical

       ELSE NULL

       END

       ) AS HWStatusID

    FROM APM_HardwareCategoryStatus AS c

    JOIN APM_HardwareItem AS e ON e.HardwareCategoryStatusID = c.ID

    Group BY c.NodeID

    ) AS g ON g.NodeID = Nodes.NodeID

    LEFT JOIN

    (SELECT NodeID,

      MAX(CASE

        WHEN ApplicationAvailability='Up' THEN 0  -- UP

        WHEN ApplicationAvailability='Unmanaged' THEN 1  -- Unmanaged

        WHEN ApplicationAvailability='Unknown' THEN 2  -- Unknown

        WHEN ApplicationAvailability='Unreachable' THEN 3  -- Unreachable

        WHEN ApplicationAvailability='Warning' THEN 4  -- Warning

        WHEN ApplicationAvailability='Critical' THEN 5  -- Critical

        WHEN ApplicationAvailability='Down' THEN 6   -- Down

        ELSE NULL

        END

       ) AS aStatusID

    FROM APM_CurrentStatusOfApplication

    GROUP BY NodeID

    ) AS i ON i.NodeID = Nodes.NodeID

    WHERE

    (

      Nodes.Status=1

      AND Nodes.AvgResponseTime < 500

      AND Nodes.CPULoad < 95

      AND (Nodes.PercentMemoryUsed < 95 OR Nodes.PercentMemoryUsed IS NULL)

      AND (h.vPercentage < 95.0 OR h.vPercentage IS NULL)

      AND (g.HWStatusID < 3 OR g.HWStatusID IS NULL)

      AND (i.aStatusID < 4 OR i.aStatusID is NULL)

    )


    alert-003.jpg


     

    • Find Alert Definition ID:

     

    If you import the attached alert definition file (SAM_Alert-NodesWithIssuesSQL.AlertDefinition), the AlertDefID is “14d4ba87-7ed3-4b0d-9119-b592b8cfacd8”.

    If you created your own alert, using the method below to find out AlertDefID:

    ·         Create a temporary summary view in Solarwinds Web Console, add a “Custom Query” Resource.

    ·         Edit the “Custom Query”, add the SWQL query below:

    SELECT AlertDefID, Name

    FROM Orion.AlertDefinitions

    WHERE Name = 'SAM Alert - Nodes with Issues (SQL)'

     

    ·         Back to the view, and find out the AlertDefID

    alert-005.jpg

    • Create Alert Dashboard:

     

    Follow the steps in my previous document (https://thwack.solarwinds.com/docs/DOC-176521) to create the alert page

     

    The new SWQL queries are attached.

     

    If you import my Alert, the queries should work out-of-box. If you creates your own alert, please replace AlertDefID
    “14d4ba87-7ed3-4b0d-9119-b592b8cfacd82 with your own one. (2 occurrences in the query)

     

    Thanks