I was talking through the data collection (default) intervals with one of our VMware architects earlier this week. This is what I told him:
VMAN polls every 10 minutes via the vSphere API
Data is collected in 10 second blocks for the entire period (60 blocks by default)
Data presented via the console is the peak value, average value, etc. for the 10 minute period.
"Wait a second," said the architect, "if the data is an average can you show me the standard deviation for that 10 minute period along with the average and peak?"
Does anyone know if this is possible? Is the standard deviation stored anywhere for us to display in a graph or report?
We can provide the average and peak but we do not provide the standard deviation. Was standard deviation a must have for the architect or only for certain scenarios?
It was only in certain scenarios, although I do find that more and more folks are asking for detailed data analytics and statistical information to help them understand that data that Solarwinds products are giving them. Standard deviation might be an interesting metric to stick into the development roadmap of a whole bunch of products.
We should configure Thwack to send out ALERT: OLD THREAD RESURRECTION e-mail alerts, with a Sev 1 priority, LOL.
Case in point, I am looking to show standard deviations for interface throughput, errors, Multicast and Broadcast traffic, etc. I searched Thwack for other people's use of Stdev, and found this unanswered thread. I'm resurrecting the thread to share what I find, and maybe then jbiggley can return to his VMWare Architect some 3-1/2 years later, and say, I've got that information you wanted...
The idea I am working on is to develop data to understand the nature of normal traffic levels over interfaces. We (Network Admins) can then use this data to determine what levels of traffic equals abnormal traffic, and what equals abnormal. For example, we could use Standard Deviations of Multicast and Broadcast traffic levels for planning QOS policing policies, to throttle Multicast data over a certain level. Also, alerts could be configured for Multicast/Broadcast/Errors/BPS/PPS traffic exceeding normal levels, for example 3 standard deviations above Mean would infer something statistically significant is happening, and that could be worthy of making a note of.
There is a SQL function called Stdev. It works very much like average (AVG) and Min (MIN) and Max (MAX) functions. So, what I did was just grabbed a Average/Max per Month report from the old Report Writer, and used that as a basis. This is still in progress, but I thought I'd share this, see where it leads. The below has unnecessary lines which I bleeped out with double dashes. I'll post the end result when I am through.
CONVERT(DateTime, LTRIM(MONTH(DateTime)) + '/01/' + LTRIM(YEAR(DateTime)), 101) AS SummaryMonth
, n.NodeID AS NodeID
, i.InterfaceID AS InterfaceID
, n.Caption AS NodeName
-- , n.VendorIcon AS Vendor_Icon
, i.Caption AS Interface_Caption
-- , i.InterfaceIcon AS Interface_Icon
, AVG(it.In_AvgMultiCastPkts) AS AVG_of_MC_Pkts
, STDEV(it.In_AvgMultiCastPkts) AS STDev_of_MC_Pkts
, MAX(it.In_MaxMultiCastPkts) AS MAX_of_MC_Pkts
-- , AVG(it.Out_Averagebps) AS AVERAGE_of_Average_Transmit_bps
-- , MAX(it.Out_Maxbps) AS MAX_of_Peak_Transmit_bps
(Nodes n INNER JOIN Interfaces i ON (n.NodeID = i.NodeID)) INNER JOIN InterfaceTraffic it ON (i.InterfaceID=it.InterfaceID)
( DateTime BETWEEN 41000 AND GETDATE() )
AND n.Caption LIKE 'Host%'
AND i.InterfaceID = 5476
CONVERT(DateTime, LTRIM(MONTH(DateTime)) + '/01/' + LTRIM(YEAR(DateTime)), 101)
-- , n.VendorIcon
-- , i.InterfaceIcon
ORDER BY SummaryMonth ASC
-- , 4 ASC
-- , 6 ASC
Okay, so there are some problems with the above SQL script. As I said it is a work in progress. But, yes, the STDEV function does work. I'm learning a bit about setting time ranges, and intervals with SQL, and about which statistics are available in Orion's databases. Multicast traffic is what is important to me, but I don't see raw multicast traffic in the DB. I see MIN MAX and AVG for MC traffic in the DB, but then the STDEV wouldn't be STDEV on base MC rates, it would be STDEV on Averages or Minumums or Maximums. This is not quite what I want. I'll post more as I come up with more. But maybe we'll be able to answer the STDEV question for other metrics.
So, here's where I am now. I have found out that I can't access the raw Multicast data with the NPM database as it is. I may be able to create a Custom Poller to Poll the data directly from the interfaces. As it is, we can access Min, Max and Averages of Multicast data, in and out of an interface. If was take a Standard Deviation of those numbers, we would have the StDev of the Averages (or Mins or Maxes), but not the StDev of the actual metric I want to access.
No, I realize that this message was originally posted regarding VM data polls. From what jbiggley posted, "Data presented via the console is the peak value, average value, etc", the data he is looking at is in a similar situation -- Min Max and Average is available, but now the raw data. So, this is an issue with presentation -- I'm sure the data must be somewhere, but I can't find it. I'll follow-up on using a custom poller, as I have time.
I did want to share with you, how to do a StDev anyway -- whether or not it is on the raw data, so maybe that will be of use to you (or someone) at some point. Select the tables you want, and then create a temporary table for values over time by nesting Select operations.
,[Avg_MCPktIn] = AVG([Sub1].[InMcastPps])
,[StDev_MCPktIn] = STDEV([Sub1].[InMcastPps])
,[CI95_MCPktIn] = 1.96 * (STDEV([Sub1].[InMcastPps])/SQRT(COUNT(*)))
,[Avg_MCPktOut] = AVG([Sub1].[OutMcastPps])
,[StDev_MCPktOut] = STDEV([Sub1].[OutMcastPps])
,[CI95_MCPktOut] = 1.96 * (STDEV([Sub1].[OutMcastPps])/SQRT(COUNT(*)))
,[N] = COUNT (*)
SELECT TOP 2000
,[Day] = convert(int,CONVERT(char(8),CONVERT(date,[it].[DateTime]),112))
,[Hour] = DATEPART(hh,[it].[DateTime])
,[Minute] = DATEPART(mi,[it].[DateTime])
FROM [Nodes] [n] WITH (NOLOCK)
JOIN [Interfaces] [i] WITH (NOLOCK) ON ([n].[NodeID] = [i].[NodeID])
JOIN [InterfaceTraffic] [it] WITH (NOLOCK) ON ([i].[InterfaceID]=[it].[InterfaceID])
( [it].[DateTime] BETWEEN (DATEADD(DAY,-7,GETDATE())) AND GETDATE() )
GROUP BY [NodeID],[Day],[Hour],[Sub1].[InMcastPps]
ORDER BY [NodeID],[Day] ASC,[Hour] ASC
So, the above shows a Select operation on Nodes, Interfaces, and Interface Traffic, and all that is known as [Sub1]. This is lines 14 -28.
That creates a table of dates and times for some Multicast packet stats found in the Orion DB. then lines 1-13 and 29-30 take that data and workout Average, StDev, and 95% Confidence Interval for Multicast Packets/sec In and Out. This would normally seem like pretty good data -- but there's something wonky about the pps data in the database. I feel like it should change more often than it does. [Or, maybe it is just being collected at the wrong intervals -- LOL I just thought of that while typing -- isn't it strange when you are explaining something to someone else that suddenly your brain makes a connection that it wasn't all day? Funny how the brain works.] Anyway. These stats could tell an analytics Beavis the numbers they want to know. And, if the PPS are correct, then you should be able to use that information to determine which thresholds to set your Multicast Policing/Shaping levels in such a way that most MC traffic is unhindered, and only extreme peaks are throttled.
I haven't figured out how to alert using this, since when I looked at alerts, I didn't see a way to use custom SQL query as an alert trigger. Any ideas?
Thought I'd put my 2 cents in for figuring out an alert . Found this thread shortly after creating this: Transmit and Receive Utilization Dynamic Thresholds Minimum and Maximum? I think your query may be able to help me create custom properties for the minimum threshold.
Thinking it could be possible to run the a query similar as the above one a stored procedure every X amount of days, take the results from that query to populate a custom property named minimum_threshold to use as the alerting threshold of an interface being below the custom property for X amount of time.
SolarWinds solutions are rooted in our deep connection to our user base in the THWACK® online community. More than 150,000 members are here to solve problems, share technology and best practices, and directly contribute to our product development process. Learn more today by joining now.