cancel
Showing results for 
Search instead for 
Did you mean: 
Create Post

Alerts and Events Trigger Rate (minute-by-minute)

Overview Home Page - Event-6103 Series 1Small-up Episode 1


OK, let's get started 🙂 

Series 1 is all about KPIs and ... there is going to be plenty of them.

All these KPIs are state of the art charts with custom-made SQL driven queries. They are #1 go-to stats for any Technology Manager to see things through. They track the whole escalation flow, break-fix timing, SLAs, alerts rates, compliance, aging alerts, noise alerts and many other metrics that help you to understand how "busy" your support teams might be and how stable your infrastructure really is, over time.

Here is sneak-peak preview of how it might look like ... (resolution is deliberately low - my intention through this series is for you to build that dashboard for your own infrastructure):

03.PNG

All these KPIs are designed to be used as a widgets/resources on summary views. I personally have a dedicated view summary page and about 20 or 30 different widgets what is making it all up. I have further split this page into tabs to group them together by focus area and ensure page loads faster, with about 8-10 widgets per tab. 

So, to get the most out of this series I recommend you to create a new summary view and start assembling your KPIs right there. I will share all my KPIs widgets with you - one per post. So, be sure to subscribe to updates on my Home Page to not miss them

Without further a do ... here is the first one 🙂


Take a look at the widget screenshot below. It sits on top of the first tab of my KPI view page and gives me an idea what's happening at a glance with volume of alerts compared to volume of events on one graph, pretty much live, right now, as we speak

  • Left Y-AXIS - number of alert in a given minute (blue color)
  • Right Y-AXIS - number of events in a given minute (purple color)
  • Bottom X-AXIS - timeline 

alexsoul_0-1592045932313.png

Notice that this window is 60 minutes. I deliberately have chosen such a small window to be able to see everything minute-by-minute, without summarizing any data into averages. When you hover over any bar - you can actually see exact numbers for that particular point in time (minute). Stay tuned and we will have more KPI charts coming that will have larger window for more broader overview.


How this is done

(1)
Add Custom Chart widget onto your summary view

(2)
Give it Title and Subtitle. I tend to maintain the following format for my subtitles to ensure consistency by specifying period and sample interval so that it is immediately obvious how wide is actual window (period) and how aggressively we summarize data (sample)

  • Title: Alerts and Events Trigger Rate (minute-by-minute)
  • Subtitle: Period: Last 60 Minutes | Sample: Once per minute

(3)
> Click on SELECT DATASOURCE
> select Advance method as Database Query
> select SQL radio button
> paste the following SQL script (source code attached)
> click on preview results to make sure data comes back as expected (should be aggregated into minute-by minute intervals)
> give a name for your selection (in my case I call it "Rates")
> click UPDATE DATA SOURCE

So far we have the following ...

04.PNG

(4)
Now, this is what took me quite a while to figure out and adjust SQL accordingly:

> Group data by: Legend
> Legend shows: Legend
> Set time period to 1 minute

06.PNG

(5)
Now, let's setup left Y-AXIS:

> Add data series to left Y-Axis
> Select Alerts > Add data series
> Click on more link to see additional settings
> Click on time column and select SAMPLE column
> Click on color drop-down and select color of your choice for alerts columns

10.PNG

 

(6)
Let's now repeat this for the right Y-AXIS column, but in this case you need to select Events as a data series and pick different color. All the rest steps should be the same as per section (5) above

(7)
And finally, very important last step:

> go to time period drop down and select Custom
> give it a name "Last 60 Minutes"
> select last 60 minutes
> ADD

07.PNG

(8)

So, we now have the following setup (as per full screenshot below). Now, when you click SUBMIT you should hopefully get your first KPI chart working 🙂

08.PNG

 

NOTE:

You may have an issue with timing, whereby you graph's last 60 minutes comes as blank (something like below). If that's the case - please post a comment below and I will explain how to deal with this issue. This is because of the very strange behavior with the way SolarWinds processes time which I still don't understand 100%, but I have a workaround for such issue 

09.PNG

 

Enjoy 😉

Remember to give it a kudos and subscribe for updates on my home page

 

With Gratitude, 
Alex Soul

 

Labels (2)
4 Replies
Level 12

Thanks Alex for the inspiration. Will be interesting to see the rest. 

Two things; I tested this on two installations. On one I got the "timing issue", on the other it just worked. Would be interested to hear your workaround.

The other thing, why use SQL and not SWQL? (Except that SQL might be easier to write in 🙂 )

Nice one, glad it has worked for you. 

So, timing issue ... 

Let me illustrate what I believe is happening... 

First - let's run this exact same query in SQL Management Studio:

Screenshot #01
01.PNG

Now, let's do the same via PREVIEW in actual widget

Screenshot #02
02.PNG

And finally, just note down which exact hour is being displayed in your widget (in my case, now, it shows between 4:30 - 5:30am)

Screenshot #03
03.PNG

Notice there is 6 hour difference in actual time being recording in table and the time in preview window. Time in the first screenshot comes back as SQL system time (which happens to be CT in my case), but time in the actual widget preview comes back 6 hours ahead, which coincidentally happens to be my local time (which is GMT, I am in London). The actual widget UI element (screenshot #03) shows CT time, which is correct and expected.

So, what I have done is simply subtracted those 6 hours in SQL script. To do this you need to modify 2 lines (line #3 and line #7) as per below example. Your goal with this adjustment is to make sure that time which is returned in PREVIEW matches time recorded in database

 

SELECT TOP(60)
  'Count' AS 'Legend',
  DATEADD(hour,-6,DATEADD(minute, DATEDIFF(minute, 0, e.[EventTime]), 0)) AS 'SAMPLE',
  SUM(CASE WHEN e.EventType = 5000 THEN 1 ELSE 0 END) AS 'Alerts',
  COUNT(*) AS 'Events'
FROM Events e
GROUP BY DATEADD(hour,-6,DATEADD(minute, DATEDIFF(minute, 0, e.[EventTime]), 0))
ORDER BY 'SAMPLE' desc

 

This is very strange behavior and I hope someone from SolarWinds product team whoever happens to read this can explain this to me - I could never figure it out why it is doing it in such way. It may well be a bug, but I never reported it and just used this workaround instead

One obvious disadvantage is that twice per year this needs to be adjusted.

And, another peculiar fact - even though I have subtracted 6 hours in script, people who access this widget via different time zones with different offset hours (for example India, with almost 12 hour difference) they see data correctly. 

So, God knows what's going on here ... but it works 🙂 I hope one day someone will shed light on this behavior

---------------

As for the second question - why SQL - I have 2 reason:

  1. SQL knowledge is transferable, SWQL is not. I tend to learn a lot whilst tinkling with SolarWinds and I can the re-use this knowledge elsewhere when compiling scripts and working on queries in other systems
  2. SQL is way more powerful than SWQL and has full spectrum of functions, access to system databases if needed and even access to other linked SQL servers, which is pretty awesome, as this has virtually no limit to the data I want to capture on screen

There is one drawback however - table structure may change and may invalidate scripts with future releases. But, out of hundreds of scripts that I have running, only few ever got broken. SolarWinds is pretty good in ensuring backwards compatibility, but, obviously this is not something they will support. In light of this - SWQL seems to be better option to ensure your scripts will not break after upgrade.

0 Kudos

Hmmm, so if we get the time in UTC time in the data source. The widget will get the browsers timezone and adjust the graph accordingly!?!!? In that case we should always have UTC time in graph data.

0 Kudos

Thanks,

Agree - time is strange. I get correct time in SSMS, correct time in "test datasource" but the graph shows 2 hours wrong. 

Local time is 2 hours diff against UTC.

Subtracted 2 hours in  query and graph is correct. Feels wrong that Orion mess with my datasource like this... 

SQL vs SWQL - another drawback: SQL does not always work for non-admins and SQL as datasource does not take care of account limitations etc. SWQL might be more "future safe". Otherwise I agree with you. I try to use SWQL first, if I fail I use SQL.