This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Manage Node's Response Time Thresholds Better By Accounting For WAN Latencies

< BACK TO TO THE MAGIC OF SQL SCRIPTS SERIES LIST

Challenge:

How do you deal with nodes response times across multiple worldwide sites?

011.jpg

OK, I know, I know... NPM platform does give you a way of overriding global thresholds per node through "Edit Node" properties.

012.JPG

However, think about this:

- what if you have a lot of devices worldwide? 100s and 1000s of them?

- how do you keep track of 100s and 1000s of overrides that you have set in the past?

- what sort of overhead it creates to manage these overrides on a global scale?

- what if you need to reduce/increase latency threshold for all devices at the particulate location only?

- how do you sleep at night knowing that there might be a device (or two...) which was incorrectly overridden (not by you of course) with much higher latency and you potentially have issues without even knowing about it?

- if for any reason you had a massive latency spike and upon the next poll it has settled - your average is still going to be skewed anyway. By default most of the alerts and resources are based on this average. So, you might be getting some false-positives.

Solution:

We will create a table in auxiliary database which will hold information about Latency Offsets for each site. Then we will use this table to calculate Latency for each device based on location that this device was assigned to. Keep in mind that this is exactly the same custom property which we have used in the Automatic Nodes Placement on the Worldmap solution previously. So, with just a few clicks (setting SiteID custom property) you can eventually control not only your nodes on map but also your response time worldwide - two rabbits with one shot. What is really beautiful about all this is that all your latency offset values and thresholds are stored centrally, in one table.

End result:

Here is the only thing which is needed to be done by Engineering Team - set Site ID custom property. That's it:

SnipImage.JPG

Here is what you will get:


Main resource for your Summary NOC page:

If everything is OK - it will show nothing

017.JPG

However, as long as there are some issue - they will show up here:

014.JPG

Bonus


Few additional informational resources for the nodes's summary page so that you always have these location details on hand when you land on any node:

016.JPG

Step-By-Step:


(1)

Create few tables in auxiliary database

0124.PNG

0126.PNG

* Please consult DBA for advise on column types and any other constraints. I am not DBA and I have done this to the best of my knowledge. This will need to be improved later. For now - it works as it is


(2)

Populate tables with your site details

SITES TABLE

0123.PNG

LATENCY OFFSET TABLE

0125.PNG


* Notice that LocationID will be joining link between these two tables.


(3)

Here is a golden SQL for your reporting - remember to change database name

SELECT

  Nodes.NodeID

,'Warning.gif' AS 'ICON-STAT'

,Nodes.Caption AS 'NODE' --/Orion/View.aspx?View=NodeDetails&NetObject=N:${NodeID}

,Nodes.IP_Address AS 'IP' --/Orion/View.aspx?View=NodeDetails&NetObject=N:${NodeID}

,CONVERT(NVARCHAR(50), r1.AvgResponseTime) + ' ms' AS 'PING'

,CONVERT(NVARCHAR(50), r1.PercentLoss) + ' %' AS 'LOSS'

FROM Solarwinds.dbo.Nodes

---------------------------------------------

--join global thresholds

INNER JOIN Solarwinds.dbo.NodesThresholds nt_resp WITH (NOLOCK) ON (Nodes.NodeID = nt_resp.InstanceId AND nt_resp.Name = 'Nodes.Stats.ResponseTime')

INNER JOIN Solarwinds.dbo.NodesThresholds nt_loss WITH (NOLOCK) ON (Nodes.NodeID = nt_loss.InstanceId AND nt_loss.Name = 'Nodes.Stats.PercentLoss')

--join last recorded response/loss per node

LEFT JOIN (

  SELECT NodeID, MAX([DateTime]) AS max_datetime

  FROM [Solarwinds].[dbo].[ResponseTime_Detail]

  GROUP BY NodeID) MaxDates ON MaxDates.NodeID = Nodes.NodeID

LEFT JOIN [Solarwinds].[dbo].[ResponseTime_Detail] r1 ON (r1.[DateTime] = MaxDates.max_datetime AND r1.NodeID = MaxDates.NodeID)

--join site and latency offset

LEFT JOIN SolarWindsAux.dbo.Sites s WITH (NOLOCK) ON s.SiteID = Nodes.n_site_id

LEFT JOIN SolarWindsAux.dbo.LatencyOffset lo WITH (NOLOCK) ON (Nodes.EngineID = lo.EngineID AND s.LocationID = lo.LocationID)

---------------------------------------------

WHERE

  Nodes.UnManaged = 0 AND --node is not unmanaged

  Nodes.Status <> '2' AND --node is not down

  (

    (

      nt_loss.CurrentValue > nt_loss.Level1Value AND --node has high AVG packet loss value

      ISNULL(r1.PercentLoss,0) > nt_loss.Level1Value --node has high LAST packet loss value

    ) OR

    (

      nt_resp.CurrentValue - ISNULL(lo.LatencyOffset,0) > nt_resp.Level1Value AND --node has high AVG responce time (minus Latency offset, based on location, if specified)

      ISNULL(r1.AvgResponseTime,0) - ISNULL(lo.LatencyOffset,0) > nt_resp.Level1Value --node has high LAST responce time (minus Latency offset, based on location, if specified)

    )

  )

* Notice that we not only check the average here but also check the last recorded value and only trigger an alert if last value exceed threshold as well

(4)

Download the above SQL as a ready-made report and import into Orion Report-Writer.


Add this report as an on page resource on your NOC page

report.JPG


(5)

Bonus. Now, let's create few more reports for the node summary page:


SQL source for Location

SELECT

  Sites.SiteID AS 'SITE'

  ,Sites.Address AS 'ADDRESS'

  ,Sites.Description AS 'DESCRIPTION'

FROM SolarwindsAux.dbo.Sites

INNER JOIN Solarwinds.dbo.Nodes n ON n.n_site_id = Sites.SiteID

WHERE n.Caption = '${Caption}'

SQL source for Latency Offset


SELECT

  Nodes.EngineID AS 'Poller ID'

  ,e.ServerName AS 'Poller Name'

  ,s.LocationID AS 'Node Location'

  ,lo.LatencyOffset AS 'Latency Offset'

FROM Solarwinds.dbo.Nodes

LEFT JOIN SolarWindsAux.dbo.Sites s ON Nodes.n_site_id = s.SiteID

LEFT JOIN SolarWindsAux.dbo.LatencyOffset lo ON lo.LocationID = s.LocationID

INNER JOIN Solarwinds.dbo.Engines e ON Nodes.EngineID = e.EngineID

WHERE Nodes.Caption = '${Caption}'



* you will find both reports attached as well.


Same as above - just add them as a resources on your node summary page


(*)

That's it! Well done! emoticons_happy.png

Further Tips & Advice:


* This solution fits very well with Automatic Nodes Placement on the Worldmap as both will use same custom property to define location - SiteID

* You can have your locations based on countries, or you can go more tight and identify locations based on areas, towns, states, etc


To Your Monitoring Success,

Alex Soul

www.pixace.com


--

attachments.zip