OK, let's get started 🙂
Series 1 is all about KPIs and ... there is going to be plenty of them.
All these KPIs are state of the art charts with custom-made SQL driven queries. They are #1 go-to stats for any Technology Manager to see things through. They track the whole escalation flow, break-fix timing, SLAs, alerts rates, compliance, aging alerts, noise alerts and many other metrics that help you to understand how "busy" your support teams might be and how stable your infrastructure really is, over time.
Here is sneak-peak preview of how it might look like ... (resolution is deliberately low - my intention through this series is for you to build that dashboard for your own infrastructure):
All these KPIs are designed to be used as a widgets/resources on summary views. I personally have a dedicated view summary page and about 20 or 30 different widgets what is making it all up. I have further split this page into tabs to group them together by focus area and ensure page loads faster, with about 8-10 widgets per tab.
So, to get the most out of this series I recommend you to create a new summary view and start assembling your KPIs right there. I will share all my KPIs widgets with you - one per post. So, be sure to subscribe to updates on my Home Page to not miss them
Without further a do ... here is the first one 🙂
Take a look at the widget screenshot below. It sits on top of the first tab of my KPI view page and gives me an idea what's happening at a glance with volume of alerts compared to volume of events on one graph, pretty much live, right now, as we speak
Notice that this window is 60 minutes. I deliberately have chosen such a small window to be able to see everything minute-by-minute, without summarizing any data into averages. When you hover over any bar - you can actually see exact numbers for that particular point in time (minute). Stay tuned and we will have more KPI charts coming that will have larger window for more broader overview.
How this is done
Add Custom Chart widget onto your summary view
Give it Title and Subtitle. I tend to maintain the following format for my subtitles to ensure consistency by specifying period and sample interval so that it is immediately obvious how wide is actual window (period) and how aggressively we summarize data (sample)
> Click on SELECT DATASOURCE
> select Advance method as Database Query
> select SQL radio button
> paste the following SQL script (source code attached)
> click on preview results to make sure data comes back as expected (should be aggregated into minute-by minute intervals)
> give a name for your selection (in my case I call it "Rates")
> click UPDATE DATA SOURCE
So far we have the following ...
Now, this is what took me quite a while to figure out and adjust SQL accordingly:
> Group data by: Legend
> Legend shows: Legend
> Set time period to 1 minute
Now, let's setup left Y-AXIS:
> Add data series to left Y-Axis
> Select Alerts > Add data series
> Click on more link to see additional settings
> Click on time column and select SAMPLE column
> Click on color drop-down and select color of your choice for alerts columns
Let's now repeat this for the right Y-AXIS column, but in this case you need to select Events as a data series and pick different color. All the rest steps should be the same as per section (5) above
And finally, very important last step:
> go to time period drop down and select Custom
> give it a name "Last 60 Minutes"
> select last 60 minutes
So, we now have the following setup (as per full screenshot below). Now, when you click SUBMIT you should hopefully get your first KPI chart working 🙂
You may have an issue with timing, whereby you graph's last 60 minutes comes as blank (something like below). If that's the case - please post a comment below and I will explain how to deal with this issue. This is because of the very strange behavior with the way SolarWinds processes time which I still don't understand 100%, but I have a workaround for such issue
Remember to give it a kudos and subscribe for updates on my home page
Thanks Alex for the inspiration. Will be interesting to see the rest.
Two things; I tested this on two installations. On one I got the "timing issue", on the other it just worked. Would be interested to hear your workaround.
The other thing, why use SQL and not SWQL? (Except that SQL might be easier to write in 🙂 )
Nice one, glad it has worked for you.
So, timing issue ...
Let me illustrate what I believe is happening...
First - let's run this exact same query in SQL Management Studio:
Now, let's do the same via PREVIEW in actual widget
And finally, just note down which exact hour is being displayed in your widget (in my case, now, it shows between 4:30 - 5:30am)
Notice there is 6 hour difference in actual time being recording in table and the time in preview window. Time in the first screenshot comes back as SQL system time (which happens to be CT in my case), but time in the actual widget preview comes back 6 hours ahead, which coincidentally happens to be my local time (which is GMT, I am in London). The actual widget UI element (screenshot #03) shows CT time, which is correct and expected.
So, what I have done is simply subtracted those 6 hours in SQL script. To do this you need to modify 2 lines (line #3 and line #7) as per below example. Your goal with this adjustment is to make sure that time which is returned in PREVIEW matches time recorded in database
SELECT TOP(60) 'Count' AS 'Legend', DATEADD(hour,-6,DATEADD(minute, DATEDIFF(minute, 0, e.[EventTime]), 0)) AS 'SAMPLE', SUM(CASE WHEN e.EventType = 5000 THEN 1 ELSE 0 END) AS 'Alerts', COUNT(*) AS 'Events' FROM Events e GROUP BY DATEADD(hour,-6,DATEADD(minute, DATEDIFF(minute, 0, e.[EventTime]), 0)) ORDER BY 'SAMPLE' desc
This is very strange behavior and I hope someone from SolarWinds product team whoever happens to read this can explain this to me - I could never figure it out why it is doing it in such way. It may well be a bug, but I never reported it and just used this workaround instead
One obvious disadvantage is that twice per year this needs to be adjusted.
And, another peculiar fact - even though I have subtracted 6 hours in script, people who access this widget via different time zones with different offset hours (for example India, with almost 12 hour difference) they see data correctly.
So, God knows what's going on here ... but it works 🙂 I hope one day someone will shed light on this behavior
As for the second question - why SQL - I have 2 reason:
There is one drawback however - table structure may change and may invalidate scripts with future releases. But, out of hundreds of scripts that I have running, only few ever got broken. SolarWinds is pretty good in ensuring backwards compatibility, but, obviously this is not something they will support. In light of this - SWQL seems to be better option to ensure your scripts will not break after upgrade.
Hmmm, so if we get the time in UTC time in the data source. The widget will get the browsers timezone and adjust the graph accordingly!?!!? In that case we should always have UTC time in graph data.
Agree - time is strange. I get correct time in SSMS, correct time in "test datasource" but the graph shows 2 hours wrong.
Local time is 2 hours diff against UTC.
Subtracted 2 hours in query and graph is correct. Feels wrong that Orion mess with my datasource like this...
SQL vs SWQL - another drawback: SQL does not always work for non-admins and SQL as datasource does not take care of account limitations etc. SWQL might be more "future safe". Otherwise I agree with you. I try to use SWQL first, if I fail I use SQL.
SolarWinds solutions are rooted in our deep connection to our user base in the THWACK® online community. More than 150,000 members are here to solve problems, share technology and best practices, and directly contribute to our product development process. Learn more today by joining now.