This discussion has been locked.
You can no longer post new replies to this discussion. If you have a question you can start a new discussion

Alerts and Events Trigger Rate (minute-by-minute)

Overview Home Page - Event-6103 Series 1 - Small-up Episode 1

OK, let's get started  

Series 1 is all about KPIs and ... there is going to be plenty of them.

All these KPIs are state of the art charts with custom-made SQL driven queries. They are #1 go-to stats for any Technology Manager to see things through. They track the whole escalation flow, break-fix timing, SLAs, alerts rates, compliance, aging alerts, noise alerts and many other metrics that help you to understand how "busy" your support teams might be and how stable your infrastructure really is, over time.

Here is sneak-peak preview of how it might look like ... (resolution is deliberately low - my intention through this series is for you to build that dashboard for your own infrastructure):


All these KPIs are designed to be used as a widgets/resources on summary views. I personally have a dedicated view summary page and about 20 or 30 different widgets what is making it all up. I have further split this page into tabs to group them together by focus area and ensure page loads faster, with about 8-10 widgets per tab. 

So, to get the most out of this series I recommend you to create a new summary view and start assembling your KPIs right there. I will share all my KPIs widgets with you - one per post. So, be sure to subscribe to updates on my Home Page to not miss them

Without further a do ... here is the first one

Take a look at the widget screenshot below. It sits on top of the first tab of my KPI view page and gives me an idea what's happening at a glance with volume of alerts compared to volume of events on one graph, pretty much live, right now, as we speak

  • Left Y-AXIS - number of alert in a given minute (blue color)
  • Right Y-AXIS - number of events in a given minute (purple color)
  • Bottom X-AXIS - timeline 


Notice that this window is 60 minutes. I deliberately have chosen such a small window to be able to see everything minute-by-minute, without summarizing any data into averages. When you hover over any bar - you can actually see exact numbers for that particular point in time (minute). Stay tuned and we will have more KPI charts coming that will have larger window for more broader overview.

How this is done

Add Custom Chart widget onto your summary view

Give it Title and Subtitle. I tend to maintain the following format for my subtitles to ensure consistency by specifying period and sample interval so that it is immediately obvious how wide is actual window (period) and how aggressively we summarize data (sample)

  • Title: Alerts and Events Trigger Rate (minute-by-minute)
  • Subtitle: Period: Last 60 Minutes | Sample: Once per minute

> select Advance method as Database Query
> select SQL radio button
> paste the following SQL script (source code attached)
> click on preview results to make sure data comes back as expected (should be aggregated into minute-by minute intervals)
> give a name for your selection (in my case I call it "Rates")

So far we have the following ...


Now, this is what took me quite a while to figure out and adjust SQL accordingly:

> Group data by: Legend
> Legend shows: Legend
> Set time period to 1 minute


Now, let's setup left Y-AXIS:

> Add data series to left Y-Axis
> Select Alerts > Add data series
> Click on more link to see additional settings
> Click on time column and select SAMPLE column
> Click on color drop-down and select color of your choice for alerts columns



Let's now repeat this for the right Y-AXIS column, but in this case you need to select Events as a data series and pick different color. All the rest steps should be the same as per section (5) above

And finally, very important last step:

> go to time period drop down and select Custom
> give it a name "Last 60 Minutes"
> select last 60 minutes



So, we now have the following setup (as per full screenshot below). Now, when you click SUBMIT you should hopefully get your first KPI chart working



You may have an issue with timing, whereby you graph's last 60 minutes comes as blank (something like below). If that's the case - please post a comment below and I will explain how to deal with this issue. This is because of the very strange behavior with the way SolarWinds processes time which I still don't understand 100%, but I have a workaround for such issue 



Remember to give it a kudos and subscribe for updates on my home page

With Gratitude, 
Alex Soul

  • Thanks Alex for the inspiration. Will be interesting to see the rest. 

    Two things; I tested this on two installations. On one I got the "timing issue", on the other it just worked. Would be interested to hear your workaround.

    The other thing, why use SQL and not SWQL? (Except that SQL might be easier to write in )

  • Nice one, glad it has worked for you. 

    So, timing issue ... 

    Let me illustrate what I believe is happening... 

    First - let's run this exact same query in SQL Management Studio:

    Screenshot #01

    Now, let's do the same via PREVIEW in actual widget

    Screenshot #02

    And finally, just note down which exact hour is being displayed in your widget (in my case, now, it shows between 4:30 - 5:30am)

    Screenshot #03

    Notice there is 6 hour difference in actual time being recording in table and the time in preview window. Time in the first screenshot comes back as SQL system time (which happens to be CT in my case), but time in the actual widget preview comes back 6 hours ahead, which coincidentally happens to be my local time (which is GMT, I am in London). The actual widget UI element (screenshot #03) shows CT time, which is correct and expected.

    So, what I have done is simply subtracted those 6 hours in SQL script. To do this you need to modify 2 lines (line #3 and line #7) as per below example. Your goal with this adjustment is to make sure that time which is returned in PREVIEW matches time recorded in database

    SELECT TOP(60)
      'Count' AS 'Legend',
      DATEADD(hour,-6,DATEADD(minute, DATEDIFF(minute, 0, e.[EventTime]), 0)) AS 'SAMPLE',
      SUM(CASE WHEN e.EventType = 5000 THEN 1 ELSE 0 END) AS 'Alerts',
      COUNT(*) AS 'Events'
    FROM Events e
    GROUP BY DATEADD(hour,-6,DATEADD(minute, DATEDIFF(minute, 0, e.[EventTime]), 0))
    ORDER BY 'SAMPLE' desc

    This is very strange behavior and I hope someone from SolarWinds product team whoever happens to read this can explain this to me - I could never figure it out why it is doing it in such way. It may well be a bug, but I never reported it and just used this workaround instead

    One obvious disadvantage is that twice per year this needs to be adjusted.

    And, another peculiar fact - even though I have subtracted 6 hours in script, people who access this widget via different time zones with different offset hours (for example India, with almost 12 hour difference) they see data correctly. 

    So, God knows what's going on here ... but it works I hope one day someone will shed light on this behavior


    As for the second question - why SQL - I have 2 reason:

    1. SQL knowledge is transferable, SWQL is not. I tend to learn a lot whilst tinkling with SolarWinds and I can the re-use this knowledge elsewhere when compiling scripts and working on queries in other systems
    2. SQL is way more powerful than SWQL and has full spectrum of functions, access to system databases if needed and even access to other linked SQL servers, which is pretty awesome, as this has virtually no limit to the data I want to capture on screen

    There is one drawback however - table structure may change and may invalidate scripts with future releases. But, out of hundreds of scripts that I have running, only few ever got broken. SolarWinds is pretty good in ensuring backwards compatibility, but, obviously this is not something they will support. In light of this - SWQL seems to be better option to ensure your scripts will not break after upgrade.

  • Thanks,

    Agree - time is strange. I get correct time in SSMS, correct time in "test datasource" but the graph shows 2 hours wrong. 

    Local time is 2 hours diff against UTC.

    Subtracted 2 hours in  query and graph is correct. Feels wrong that Orion mess with my datasource like this... 

    SQL vs SWQL - another drawback: SQL does not always work for non-admins and SQL as datasource does not take care of account limitations etc. SWQL might be more "future safe". Otherwise I agree with you. I try to use SWQL first, if I fail I use SQL.

  • Hmmm, so if we get the time in UTC time in the data source. The widget will get the browsers timezone and adjust the graph accordingly!?!!? In that case we should always have UTC time in graph data.

  • This fix does not correct my issue. 
    I am getting the "Blank Widget" issue you describe above with the two lines on the left side of the widget.

    When I check the SQL on the Widget and also run the SQL in SolarWinds to see the different times being output, there is no difference in the time.


  • This fix does not correct my issue. 
    I am getting the "Blank Widget" issue you describe above with the two lines on the left side of the widget.

    When I check the SQL on the Widget and also run the SQL in SolarWinds to see the different times being output, there is no difference in the time.


No Data