This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

report node's downtime?

Has anyone figured out how to report a node's DOWN TIME?

 

I have a requirement that we need to include the length of time a node has been "down" with the email alerts we send out - or, probably better, when we send out the "resolved" notice, include downtime then. I have not been able to get anywhere on this one..

  • The events for nodes are coded in the events table.  A node down is an event of type 5.  If you know which particular node it is, you could query the events table for the node you want. You can create SQL variables to insert into alert messages but I'd have to play with this one to figure it out. The variable would look something like:

    ${SQL:select top 1 datediff(e.eventtime,getdate()) from events e where e.nodeid = '${Node.NodeID}'}

  • Thanks, but your talking to a  non-sql guy.  I have a basic understanding of sql, but looks somewhat greek to me.

    if I throw this into a test alert, It comes back with "The datediff function requires 3 arguments". 

  • Try this: update mrxinu's query so that you specify the number of minutes it was down.

    ${SQL:select top 1 datediff(minute,e.eventtime,getdate()) from events e where e.nodeid = '${Node.NodeID}'}

    You see the change I inserted in bold? You can change that to anything from a year down to a nanosecond. Hopefully, that works for you.

  • Thanks, but I got a different error this time - "MACRO SQL ERROR - Invalid column name 'NodeID'"

    I started poking around the studio, and I think I have it figured out, for the most part.. Here is the queury I ran:

    ${SQL:select top 1 datediff(minute,e.eventtime,getdate()) from events e where e.networknode = '${Node.NodeID}'}

    The result was  "Down time :21718" - so, now I have to determine what that 21718 really means.. is that truley minutes? seconds? not sure.  I forced a node down by changing the ip in Orion, and its been "down" for about 20 minutes, so I think we are close.

    Thanks for the tips and examples!

    EDIT: I recruitied one of our SQL guru's in the office. We changd the "top 1" to "distinct" and it works PERFECTLY!  Here is the query I used.

    ${NodeName} has been down for ${SQL:select distinct datediff(minute,e.eventtime,getdate()) from events e where e.networknode = '${Node.NodeID}'} minutes.

  • FormerMember
    0 FormerMember in reply to dclick

    what SQL code do I need to use to put this into an advanced SQL report? I have a requirement of giving a report to my CIO every monday morning showing the total downtime for each node over the past week...

  • I gave up on this. I couldnt make it work, and since im not a DBA, I dont have time to learn how to make something like this work, when in my opinion, it should  have been part of the package from the get go.

    maybe the next Service pack will have it, but this is really a thorn in my side.   If anyone has taken this on, and has a RELIABLE way to show downtime IN the email alerts, specificly on the "Alert resolved" emails, that would be awsome.

  • Not sure if this is still valid issue, but maybe this will help some (for NPM Report Writer):

    SELECT
        StartTime.EventTime,
        DATEDIFF(d, StartTime.EventTime,
        (SELECT TOP 1
            EventTime
            FROM Events AS Endtime
            WHERE EndTime.EventTime > StartTime.EventTime AND EndTime.EventType = 5
                AND EndTime.NetObjectType = 'N'
                AND EndTime.NetworkNode = StartTime.NetworkNode
            ORDER BY EndTime.EventTime)) AS DaysDuration,
        Nodes.Caption, 
        StartTime.Message


    FROM Events StartTime INNER JOIN Nodes ON StartTime.NetworkNode = Nodes.NodeID
    WHERE (StartTime.EventType = 1) AND (StartTime.NetObjectType = 'N') AND
    (Nodes.Status = '2' OR Nodes.Status = '0')
    ORDER BY DaysDuration DESC, Nodes.Caption ASC

  • So I tried implementing the verified Answer and would only get my email alert to show "0" minutes down.

     

    Here is the code that I am using:

    ${NodeName} has been down for ${SQL:select distinct datediff(minute,e.EventTime,getdate()) from events e where e.NetworkNode = '${Nodes.NodeID}'} minutes.

     

    I had to change the last part of the SQL query to use "Nodes" as there is no "Node" table in NPM.

     

    My problem still remains though that I keep getting my email alert showing that my device has been down for "0" minutes.

     

    Can someone please help me figure this out so that I can get the time down for my device to show up properly?

  • That SQL query looks to be correct. The only thing that might possibly be wrong is the single quotes around the ${Nodes.NodeID}. Get rid of them so that it should look like the following:

    ${SQL:select distinct datediff(minute,e.EventTime,getdate()) from events e where e.NetworkNode = ${Nodes.NodeID}}

    Let us know if this works for you.

  • qle, I am still having the same issue.

    I continue to receive my email alerts showing that the switch/device has been down for 0 minutes.

    My test lasted over 10 minutes and it still doesn't show me how long the device was down for.

     

    What could be stopping me from getting how long the switch/device has been down for?

    Your help is greatly appreciated.

    Thanks