SWQL query for EIGRP neighbor flap

Hi trying to create custom Alert based on SWQL query for EIGRP neighbor flap.
I have found below post but that is using Orion.syslog which we dont have or has been replaced, not sure.

thwack.solarwinds.com/.../detect-flapping-interfaces

I have asked this question also there, but that wasn't answered so trying to create new thread in case that will be more succesful.

Main issue for me is that when I'm using specific time range in query like "DATETIME BETWEEN '2023-11-24 13:00:00.000' AND '2023-11-24 13:20:00.000'" all works fine,
but when I try to used this "DateTime > adddate('minute',-30,getdate())" for past 30min I get timeout error as specified in the thread on that link.

I will appriaciate any suggestions.
Thank you

Parents
  • I am not a coder by any stretch of the imagination, but when I've used adddate (or similar) then the first > has always been the other way around i.e. <. So your line, if I was typing it would read:

    "DateTime < adddate('minute',-30,getdate())"

  • Thanks   for your answer, recently we installed testing version of Solarwinds and we have just few devices there, meaning syslog almost empty and there this query works just fine:

    SELECT TOP 10 Nodes.Uri, Nodes.DisplayName
    FROM Orion.Nodes AS Nodes
    JOIN Orion.OLM.LogEntry Log ON Log.NodeID=Nodes.NodeID
    WHERE Log.Message LIKE '%EIGRP%Tunnel%down%' AND Log.LogType.Type = 'Syslog' AND ToLocal(DateTime) > adddate('minute',-30,getdate())
    GROUP BY Nodes.DisplayName
    HAVING COUNT(*)>=3

    So I believe that reason that it doesnt work on PROD is that there is so much data that server simply cant process that query in time. Hence if someone would come-up with some optimization it would be great.

    When i tested your suggestion turns out that mine > "higher then" is ok and your version with < "lower then" would show all syslog messages EXCEPT those which came in last 30mins Slight smile

  • The problem is that your prod database likely has relatively large tables of logs, and you are asking it to do a string match with several wildcards, which makes it impossible to use the indexes. So it has to scan every single log entry each time it runs. I bet if you could look at your event messages and get rid of the first wildcard in the message it would make a huge difference. 

  • Hey  

    Thanks for sharing your first steps - helped me make some progress on this and I've figured it out from there. I ran into the same issue where the query kept timing out due to the size of the data and the wildcards being used in the query. 

    To workaround having to use wildcards - I leveraged the built in action to tag traps. I created a rule which tagged the trap messages with a custom tag "BGP Peer Down" or whatever you need and then I created a custom SWQL Alert against the Nodes table and join the OLM Log Entry Table and OLM Tag Assignment tables. 

    Now because I'm tagging the traps specifically I no longer need to use the wild cards to parse messages column and instead my query returns in a couple of seconds rather than timing out constantly. 

    **You just need to ensure your rule which tags the trap is accurately tagging the traps with your desired message** 

    My query is like: 

    SELECT TOP 10 Nodes.Uri, Nodes.DisplayName
    FROM Orion.Nodes AS Nodes
    JOIN Orion.OLM.LogEntry Log ON Log.NodeID=Nodes.NodeID
    JOIN Orion.OLM.LogEntryTagAssignment TA on log.LogEntryID = TA.LogEntryID
    WHERE TA.LogEntryTagID = 31
    and DateTime > adddate('minute',-30,getdate())
    GROUP BY Nodes.DisplayName
    HAVING COUNT(*)>=3

    So in theory - if the more than 3 trap with BGP Peer Down occurs on the same node in a 30minute interval I get an Alert 

    Hopefully that helps you and others

Reply
  • Hey  

    Thanks for sharing your first steps - helped me make some progress on this and I've figured it out from there. I ran into the same issue where the query kept timing out due to the size of the data and the wildcards being used in the query. 

    To workaround having to use wildcards - I leveraged the built in action to tag traps. I created a rule which tagged the trap messages with a custom tag "BGP Peer Down" or whatever you need and then I created a custom SWQL Alert against the Nodes table and join the OLM Log Entry Table and OLM Tag Assignment tables. 

    Now because I'm tagging the traps specifically I no longer need to use the wild cards to parse messages column and instead my query returns in a couple of seconds rather than timing out constantly. 

    **You just need to ensure your rule which tags the trap is accurately tagging the traps with your desired message** 

    My query is like: 

    SELECT TOP 10 Nodes.Uri, Nodes.DisplayName
    FROM Orion.Nodes AS Nodes
    JOIN Orion.OLM.LogEntry Log ON Log.NodeID=Nodes.NodeID
    JOIN Orion.OLM.LogEntryTagAssignment TA on log.LogEntryID = TA.LogEntryID
    WHERE TA.LogEntryTagID = 31
    and DateTime > adddate('minute',-30,getdate())
    GROUP BY Nodes.DisplayName
    HAVING COUNT(*)>=3

    So in theory - if the more than 3 trap with BGP Peer Down occurs on the same node in a 30minute interval I get an Alert 

    Hopefully that helps you and others

Children
No Data