Frequently THWACKsters are asked for assistance in building reports which include the 90/95/99th percentiles (or similar) over a date span. We discuss Date spans in more detail in [mention:b4896b5b7bf14a1ba6eb90371a8f13aa:6b577b8c04704e209d29b6772bf67243] .
Leveraging the TOP Clause
For this work, we'll heavily rely on the TOP ## (PERCENT) clause in SWQL. This TOP selector has two options:
SELECT TOP ## <Fields> FROM <Entity>
or
SELECT TOP ## PERCENT <Fields> FROM <Entity>
Practical Effects
Suppose we have an entity with 20 records...
If we used SELECT TOP 10, we would only get the first 10 records in an undefined order.
| NodeID | TimestampUtc | CPUIndex | MinLoad | MaxLoad | AvgLoad |
|---|
| 13 | 2024-03-26 00:02:30 | 0 | 40 | 40 | 40 |
| 13 | 2024-03-26 00:02:30 | 1 | 21 | 21 | 21 |
| 13 | 2024-03-26 00:02:30 | 2 | 25 | 25 | 25 |
| 13 | 2024-03-26 00:02:30 | 3 | 32 | 32 | 32 |
| 21 | 2024-03-26 00:02:45.4692882 | 0 | 0 | 0 | 0 |
| 21 | 2024-03-26 00:02:45.4692882 | 1 | 0 | 0 | 0 |
| 12 | 2024-03-26 00:03:00 | 0 | 0 | 0 | 0 |
| 12 | 2024-03-26 00:03:00 | 1 | 0 | 0 | 0 |
| 12 | 2024-03-26 00:03:00 | 2 | 0 | 0 | 0 |
| 12 | 2024-03-26 00:03:00 | 3 | 0 | 0 | 0 |
If we used SELECT TOP 10 PERCENT, we would only get the first 2 records in an undefined order.
| NodeID | TimestampUtc | CPUIndex | MinLoad | MaxLoad | AvgLoad |
|---|
| 13 | 2024-03-26 00:02:30 | 0 | 40 | 40 | 40 |
| 13 | 2024-03-26 00:02:30 | 1 | 21 | 21 | 21 |
If we used SELECT TOP 10 with an ORDER BY clause, we'd get the first 10 records sorted by said field (MaxLoad).
| NodeID | TimestampUtc | CPUIndex | MinLoad | MaxLoad | AvgLoad |
|---|
| 13 | 2024-03-26 00:02:30 | 0 | 40 | 40 | 40 |
| 8 | 2024-03-26 00:03:40.9132182 | 3 | 34 | 34 | 34 |
| 8 | 2024-03-26 00:03:40.9132182 | 2 | 33 | 33 | 33 |
| 13 | 2024-03-26 00:02:30 | 3 | 32 | 32 | 32 |
| 6 | 2024-03-26 00:03:40.532767 | 0 | 29 | 29 | 29 |
| 8 | 2024-03-26 00:03:40.9132182 | 1 | 26 | 26 | 26 |
| 13 | 2024-03-26 00:02:30 | 2 | 25 | 25 | 25 |
| 8 | 2024-03-26 00:03:40.9132182 | 0 | 24 | 24 | 24 |
| 13 | 2024-03-26 00:02:30 | 1 | 21 | 21 | 21 |
| 5 | 2024-03-26 00:03:40.532767 | 0 | 19 | 19 | 19 |
If we used SELECT TOP 10 PERCENT with an ORDER BY clause, we'd get the first 2 records sorted by said field (MaxLoad).
| NodeID | TimestampUtc | CPUIndex | MinLoad | MaxLoad | AvgLoad |
|---|
| 13 | 2024-03-26 00:02:30 | 0 | 40 | 40 | 40 |
| 8 | 2024-03-26 00:03:40.9132182 | 3 | 34 | 34 | 34 |
You can see the sort order is very important when working with the TOP clause. Running TOP without a sort order is useful when you want a sampling of the data, but for calculations, remember to include sort order.
Scenario
You've been asked to build a 90th Percentile on Average Inbound Traffic (bits/sec) report for all interfaces in your environment. You know you have the data but need a good way to get at it. Here's a starting query that'll return just the top 90% of the metric we want.
-- **********************************************************
-- Show the TOP 90% of maximum inbound traffic last month
-- **********************************************************
SELECT TOP 90 PERCENT [In90th].InAveragebps
FROM Orion.NPM.InterfaceTraffic AS [In90th]
-- Filter for a specific timespan (last month)
WHERE [In90th].ObservationTimestamp >= AddMonth(- 1, DateTrunc('month', GETUTCDATE()))
AND [In90th].ObservationTimestamp < DateTrunc('month', GETUTCDATE())
-- Above WHERE clause is identical to:
-- WHERE [In90th].ObservationTimestamp BETWEEN AddMonth(- 1, DateTrunc('month', GETUTCDATE())) AND DateTrunc('month', GETUTCDATE())
-- Sort by the numerical value we want
ORDER BY [In90th].InAveragebps DESC
-- Note we are not filtering or grouping to a specific interface yet
This is great, but it doesn't have any context. We don't know the node, the interface, or anything else. In fact, we might be getting duplicate entries in the above dataset, but we won't know until we bring in more information.
Link to Entities to Provide Context
Link to Interfaces --> Nodes
Logic Explained
Graphic
Adding more Metrics
How to extend.