This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Alerting when X number of nodes go down (within X number of minutes).

Hi all,

There are some devices which we don't alert on. These are devices on the end of DSL services which go up and down as expected. However, what I want to do is alert us to a potential major service failure. If we see X number of nodes go down within X number of minutes of each other, there's probably a major issue going on and we should investigate.

I've written the following but I'm not sure if it'll work as I want it to within an Advanced SQL Alert? Using an INNER JOIN because the SELECT part of the statement can't be changed when using an Advanced SQL Alert. This means I can't use COUNT within the SELECT statement, so I'm using HAVING COUNT (at the end) but I'm not sure if that will do what I think it'll do, because it's going to be returning rows of data and not a count. I hope that makes sense!

Here's the SQL:

SELECT Nodes.NodeID AS NetObjectID, Nodes.Caption AS Name

FROM Nodes

INNER JOIN Events

ON Nodes.NodeID = Events.NetObjectID

WHERE

(

  Events.EventType = '1' AND

  (

    Events.EventTime BETWEEN DATEADD(minute, -10, GETUTCDATE()) AND GETUTCDATE()

  )

)

GROUP BY Nodes.NodeID, Nodes.Caption

HAVING COUNT(*) >= 10

Can anyone offer any help with this? Or is this a bit too complicated for the Alert Manager?

Thanks!

  • I should have mentioned what that SQL should do emoticons_happy.png

    Check the Events table and filter based on nodes which have gone in to a Down state (Events.EventType = '1') AND when this happened between now and 10 minutes ago (DATEADD(minute, -10, GETUTCDATE())) but only the total number of rows returned is >= 10 (HAVING COUNT(*) >= 10).

  • Can you use Common Table Expressions??

    I think this will work if you can -

    WITH CTE_AlertCount

    as

    (SELECT COUNT(*) as numAlerts

    FROM Nodes

    INNER JOIN Events

    ON Nodes.NodeID = Events.NetObjectID

    WHERE

    (

      Events.EventType = '1' AND Events.EventTime BETWEEN DATEADD(minute, -10000, GETUTCDATE()) AND GETUTCDATE()

    )

    GROUP BY Nodes.NodeID, Nodes.Caption)

    SELECT numalerts

    from CTE_AlertCount

    WHERE numalerts >= 10

  • I get an error

    (System.Data.SqlClient.SqlException (0x80131904): The multi-part identifier "Nodes.Support" could not be bound.

    The multi-part identifier...)

    when I try to use a limited user, works great as ADMIN

    ##sql code start

    WITH CTE_AlertCount

    as

    (SELECT COUNT(*) as numAlerts, caption, nodeid

    FROM Nodes

    INNER JOIN Events

    ON Nodes.NodeID = Events.NetObjectID

    WHERE

    (

      Events.EventType = '1' AND Events.EventTime BETWEEN DATEADD(day, -30, GETUTCDATE()) AND GETUTCDATE()

    )
    AND
    support IN ('l', 'w')

    GROUP BY Nodes.NodeID, Nodes.Caption)

    SELECT numalerts, caption, nodeid

    from CTE_AlertCount

    WHERE numalerts >= 3
    order by 1 desc