< BACK TO TO THE MAGIC OF SQL SCRIPTS SERIES LIST
Challenge:
How do you deal with nodes response times across multiple worldwide sites?
OK, I know, I know... NPM platform does give you a way of overriding global thresholds per node through "Edit Node" properties.
However, think about this:
- what if you have a lot of devices worldwide? 100s and 1000s of them?
- how do you keep track of 100s and 1000s of overrides that you have set in the past?
- what sort of overhead it creates to manage these overrides on a global scale?
- what if you need to reduce/increase latency threshold for all devices at the particulate location only?
- how do you sleep at night knowing that there might be a device (or two...) which was incorrectly overridden (not by you of course) with much higher latency and you potentially have issues without even knowing about it?
- if for any reason you had a massive latency spike and upon the next poll it has settled - your average is still going to be skewed anyway. By default most of the alerts and resources are based on this average. So, you might be getting some false-positives.
Solution:
We will create a table in auxiliary database which will hold information about Latency Offsets for each site. Then we will use this table to calculate Latency for each device based on location that this device was assigned to. Keep in mind that this is exactly the same custom property which we have used in the Automatic Nodes Placement on the Worldmap solution previously. So, with just a few clicks (setting SiteID custom property) you can eventually control not only your nodes on map but also your response time worldwide - two rabbits with one shot. What is really beautiful about all this is that all your latency offset values and thresholds are stored centrally, in one table.
End result:
Here is the only thing which is needed to be done by Engineering Team - set Site ID custom property. That's it:
Here is what you will get:
Main resource for your Summary NOC page:
If everything is OK - it will show nothing
However, as long as there are some issue - they will show up here:
Bonus
Few additional informational resources for the nodes's summary page so that you always have these location details on hand when you land on any node:
Step-By-Step:
(1)
Create few tables in auxiliary database
* Please consult DBA for advise on column types and any other constraints. I am not DBA and I have done this to the best of my knowledge. This will need to be improved later. For now - it works as it is
(2)
Populate tables with your site details
SITES TABLE
LATENCY OFFSET TABLE
* Notice that LocationID will be joining link between these two tables.
(3)
Here is a golden SQL for your reporting - remember to change database name
SELECT
Nodes.NodeID
,'Warning.gif' AS 'ICON-STAT'
,Nodes.Caption AS 'NODE' --/Orion/View.aspx?View=NodeDetails&NetObject=N:${NodeID}
,Nodes.IP_Address AS 'IP' --/Orion/View.aspx?View=NodeDetails&NetObject=N:${NodeID}
,CONVERT(NVARCHAR(50), r1.AvgResponseTime) + ' ms' AS 'PING'
,CONVERT(NVARCHAR(50), r1.PercentLoss) + ' %' AS 'LOSS'
FROM Solarwinds.dbo.Nodes
---------------------------------------------
--join global thresholds
INNER JOIN Solarwinds.dbo.NodesThresholds nt_resp WITH (NOLOCK) ON (Nodes.NodeID = nt_resp.InstanceId AND nt_resp.Name = 'Nodes.Stats.ResponseTime')
INNER JOIN Solarwinds.dbo.NodesThresholds nt_loss WITH (NOLOCK) ON (Nodes.NodeID = nt_loss.InstanceId AND nt_loss.Name = 'Nodes.Stats.PercentLoss')
--join last recorded response/loss per node
LEFT JOIN (
SELECT NodeID, MAX([DateTime]) AS max_datetime
FROM [Solarwinds].[dbo].[ResponseTime_Detail]
GROUP BY NodeID) MaxDates ON MaxDates.NodeID = Nodes.NodeID
LEFT JOIN [Solarwinds].[dbo].[ResponseTime_Detail] r1 ON (r1.[DateTime] = MaxDates.max_datetime AND r1.NodeID = MaxDates.NodeID)
--join site and latency offset
LEFT JOIN SolarWindsAux.dbo.Sites s WITH (NOLOCK) ON s.SiteID = Nodes.n_site_id
LEFT JOIN SolarWindsAux.dbo.LatencyOffset lo WITH (NOLOCK) ON (Nodes.EngineID = lo.EngineID AND s.LocationID = lo.LocationID)
---------------------------------------------
WHERE
Nodes.UnManaged = 0 AND --node is not unmanaged
Nodes.Status <> '2' AND --node is not down
(
(
nt_loss.CurrentValue > nt_loss.Level1Value AND --node has high AVG packet loss value
ISNULL(r1.PercentLoss,0) > nt_loss.Level1Value --node has high LAST packet loss value
) OR
(
nt_resp.CurrentValue - ISNULL(lo.LatencyOffset,0) > nt_resp.Level1Value AND --node has high AVG responce time (minus Latency offset, based on location, if specified)
ISNULL(r1.AvgResponseTime,0) - ISNULL(lo.LatencyOffset,0) > nt_resp.Level1Value --node has high LAST responce time (minus Latency offset, based on location, if specified)
)
)
* Notice that we not only check the average here but also check the last recorded value and only trigger an alert if last value exceed threshold as well
(4)
Download the above SQL as a ready-made report and import into Orion Report-Writer.
Add this report as an on page resource on your NOC page
(5)
Bonus. Now, let's create few more reports for the node summary page:
SQL source for Location
SELECT
Sites.SiteID AS 'SITE'
,Sites.Address AS 'ADDRESS'
,Sites.Description AS 'DESCRIPTION'
FROM SolarwindsAux.dbo.Sites
INNER JOIN Solarwinds.dbo.Nodes n ON n.n_site_id = Sites.SiteID
WHERE n.Caption = '${Caption}'
SQL source for Latency Offset
SELECT
Nodes.EngineID AS 'Poller ID'
,e.ServerName AS 'Poller Name'
,s.LocationID AS 'Node Location'
,lo.LatencyOffset AS 'Latency Offset'
FROM Solarwinds.dbo.Nodes
LEFT JOIN SolarWindsAux.dbo.Sites s ON Nodes.n_site_id = s.SiteID
LEFT JOIN SolarWindsAux.dbo.LatencyOffset lo ON lo.LocationID = s.LocationID
INNER JOIN Solarwinds.dbo.Engines e ON Nodes.EngineID = e.EngineID
WHERE Nodes.Caption = '${Caption}'
* you will find both reports attached as well.
Same as above - just add them as a resources on your node summary page
(*)
That's it! Well done!
Further Tips & Advice:
* This solution fits very well with Automatic Nodes Placement on the Worldmap as both will use same custom property to define location - SiteID
* You can have your locations based on countries, or you can go more tight and identify locations based on areas, towns, states, etc
To Your Monitoring Success,
Alex Soul
--