This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Manage Node's Response Time Thresholds Better By Accounting For WAN Latencies



How do you deal with nodes response times across multiple worldwide sites?


OK, I know, I know... NPM platform does give you a way of overriding global thresholds per node through "Edit Node" properties.


However, think about this:

- what if you have a lot of devices worldwide? 100s and 1000s of them?

- how do you keep track of 100s and 1000s of overrides that you have set in the past?

- what sort of overhead it creates to manage these overrides on a global scale?

- what if you need to reduce/increase latency threshold for all devices at the particulate location only?

- how do you sleep at night knowing that there might be a device (or two...) which was incorrectly overridden (not by you of course) with much higher latency and you potentially have issues without even knowing about it?

- if for any reason you had a massive latency spike and upon the next poll it has settled - your average is still going to be skewed anyway. By default most of the alerts and resources are based on this average. So, you might be getting some false-positives.


We will create a table in auxiliary database which will hold information about Latency Offsets for each site. Then we will use this table to calculate Latency for each device based on location that this device was assigned to. Keep in mind that this is exactly the same custom property which we have used in the Automatic Nodes Placement on the Worldmap solution previously. So, with just a few clicks (setting SiteID custom property) you can eventually control not only your nodes on map but also your response time worldwide - two rabbits with one shot. What is really beautiful about all this is that all your latency offset values and thresholds are stored centrally, in one table.

End result:

Here is the only thing which is needed to be done by Engineering Team - set Site ID custom property. That's it:


Here is what you will get:

Main resource for your Summary NOC page:

If everything is OK - it will show nothing


However, as long as there are some issue - they will show up here:



Few additional informational resources for the nodes's summary page so that you always have these location details on hand when you land on any node:




Create few tables in auxiliary database



* Please consult DBA for advise on column types and any other constraints. I am not DBA and I have done this to the best of my knowledge. This will need to be improved later. For now - it works as it is


Populate tables with your site details





* Notice that LocationID will be joining link between these two tables.


Here is a golden SQL for your reporting - remember to change database name



,'Warning.gif' AS 'ICON-STAT'

,Nodes.Caption AS 'NODE' --/Orion/View.aspx?View=NodeDetails&NetObject=N:${NodeID}

,Nodes.IP_Address AS 'IP' --/Orion/View.aspx?View=NodeDetails&NetObject=N:${NodeID}

,CONVERT(NVARCHAR(50), r1.AvgResponseTime) + ' ms' AS 'PING'

,CONVERT(NVARCHAR(50), r1.PercentLoss) + ' %' AS 'LOSS'

FROM Solarwinds.dbo.Nodes


--join global thresholds

INNER JOIN Solarwinds.dbo.NodesThresholds nt_resp WITH (NOLOCK) ON (Nodes.NodeID = nt_resp.InstanceId AND nt_resp.Name = 'Nodes.Stats.ResponseTime')

INNER JOIN Solarwinds.dbo.NodesThresholds nt_loss WITH (NOLOCK) ON (Nodes.NodeID = nt_loss.InstanceId AND nt_loss.Name = 'Nodes.Stats.PercentLoss')

--join last recorded response/loss per node


  SELECT NodeID, MAX([DateTime]) AS max_datetime

  FROM [Solarwinds].[dbo].[ResponseTime_Detail]

  GROUP BY NodeID) MaxDates ON MaxDates.NodeID = Nodes.NodeID

LEFT JOIN [Solarwinds].[dbo].[ResponseTime_Detail] r1 ON (r1.[DateTime] = MaxDates.max_datetime AND r1.NodeID = MaxDates.NodeID)

--join site and latency offset

LEFT JOIN SolarWindsAux.dbo.Sites s WITH (NOLOCK) ON s.SiteID = Nodes.n_site_id

LEFT JOIN SolarWindsAux.dbo.LatencyOffset lo WITH (NOLOCK) ON (Nodes.EngineID = lo.EngineID AND s.LocationID = lo.LocationID)



  Nodes.UnManaged = 0 AND --node is not unmanaged

  Nodes.Status <> '2' AND --node is not down



      nt_loss.CurrentValue > nt_loss.Level1Value AND --node has high AVG packet loss value

      ISNULL(r1.PercentLoss,0) > nt_loss.Level1Value --node has high LAST packet loss value

    ) OR


      nt_resp.CurrentValue - ISNULL(lo.LatencyOffset,0) > nt_resp.Level1Value AND --node has high AVG responce time (minus Latency offset, based on location, if specified)

      ISNULL(r1.AvgResponseTime,0) - ISNULL(lo.LatencyOffset,0) > nt_resp.Level1Value --node has high LAST responce time (minus Latency offset, based on location, if specified)



* Notice that we not only check the average here but also check the last recorded value and only trigger an alert if last value exceed threshold as well


Download the above SQL as a ready-made report and import into Orion Report-Writer.

Add this report as an on page resource on your NOC page



Bonus. Now, let's create few more reports for the node summary page:

SQL source for Location


  Sites.SiteID AS 'SITE'

  ,Sites.Address AS 'ADDRESS'

  ,Sites.Description AS 'DESCRIPTION'

FROM SolarwindsAux.dbo.Sites

INNER JOIN Solarwinds.dbo.Nodes n ON n.n_site_id = Sites.SiteID

WHERE n.Caption = '${Caption}'

SQL source for Latency Offset


  Nodes.EngineID AS 'Poller ID'

  ,e.ServerName AS 'Poller Name'

  ,s.LocationID AS 'Node Location'

  ,lo.LatencyOffset AS 'Latency Offset'

FROM Solarwinds.dbo.Nodes

LEFT JOIN SolarWindsAux.dbo.Sites s ON Nodes.n_site_id = s.SiteID

LEFT JOIN SolarWindsAux.dbo.LatencyOffset lo ON lo.LocationID = s.LocationID

INNER JOIN Solarwinds.dbo.Engines e ON Nodes.EngineID = e.EngineID

WHERE Nodes.Caption = '${Caption}'

* you will find both reports attached as well.

Same as above - just add them as a resources on your node summary page


That's it! Well done! emoticons_happy.png

Further Tips & Advice:

* This solution fits very well with Automatic Nodes Placement on the Worldmap as both will use same custom property to define location - SiteID

* You can have your locations based on countries, or you can go more tight and identify locations based on areas, towns, states, etc

To Your Monitoring Success,

Alex Soul