
Getting duration of High Bandwidth Utilization

My director wants a report that gives us the amount of time each branch location spent above 80% bandwidth utilization for the previous month. Honestly, right now I'd settle for being able to give him the duration of an individual event. I know how to get the min/max/average utilization for an interface, but I can't figure out the duration. As far as I can see there's absolutely NO way to get this out of Report Writer or the Web Reporting Interface. I tried creating a report in Report Writer that would pull every 5000 or 5001 event within my chosen time span. I can even get the report to sort the results so that each Alert trigger is followed by its associated Alert Reset.

Here's where I lose it. I'm not a programmer at all. I know it should be possible to calculate the difference between the DATETIME stamps for each alert pair. I should then be able to add up the results of that operation for each pair of a given node to give me a total for that time period. It seems so simple on the surface. I just can't seem to figure out how to do it. At this point, I'd be pretty happy if I could, for a single specified node, get a report that gives me the duration between each high bandwidth utilization trigger and its associated reset. Actually, I'd be happy if I could get that for a single event. If anyone can lead me in the right direction on this I'd appreciate it.
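The pairing described above can be sketched outside of Orion, for example on rows exported from the report. This is a minimal Python sketch, not an Orion feature. It assumes event type 5000 is the alert trigger and 5001 is the reset (the two types named in the post), and the node names, timestamps, and the `minutes_over_threshold` helper are all made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumption: 5000 = alert triggered, 5001 = alert reset.
TRIGGER, RESET = 5000, 5001

def minutes_over_threshold(events):
    """Sum trigger-to-reset durations per node.

    events: iterable of (node, event_type, timestamp) tuples.
    A trigger with no later reset is ignored.
    """
    open_since = {}                   # node -> time of the unmatched trigger
    totals = defaultdict(timedelta)   # node -> accumulated duration
    for node, event_type, when in sorted(events, key=lambda e: e[2]):
        if event_type == TRIGGER:
            # Keep the earliest unmatched trigger for this node.
            open_since.setdefault(node, when)
        elif event_type == RESET and node in open_since:
            totals[node] += when - open_since.pop(node)
    return {node: total.total_seconds() / 60 for node, total in totals.items()}

# Hypothetical sample rows, in the shape of the exported report.
sample = [
    ("Branch-A", 5000, datetime(2024, 1, 8, 9, 0)),
    ("Branch-A", 5001, datetime(2024, 1, 8, 9, 45)),
    ("Branch-B", 5000, datetime(2024, 1, 8, 10, 0)),
    ("Branch-A", 5000, datetime(2024, 1, 8, 13, 0)),
    ("Branch-A", 5001, datetime(2024, 1, 8, 13, 30)),
    ("Branch-B", 5001, datetime(2024, 1, 8, 12, 0)),
]
result = minutes_over_threshold(sample)
print(result)  # {'Branch-A': 75.0, 'Branch-B': 120.0}
```

Sorting by timestamp first means the rows do not need to arrive already ordered as trigger/reset pairs.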

I've attached the report from report writer just to show the information I'm dealing with.

Triggered_and_Reset_Alerts_-_Last_7_Days.OrionReport
  • At my previous employer I took a different approach to this.  At first I went the same route you are taking, trying to calculate the start and stop times of the alerts.  I couldn't figure out a way to do it effectively, so I went another route.

    In my alert, I wrote the following message to the event log every 30 minutes that the circuit stayed over my 80% threshold: "Circuit Exceeding 80% utilization".  I then wrote a report that counted the number of times this message was recorded in a day/week/month.  From that count I could extrapolate the duration: if the alert fired 10 times, I knew the circuit was over-utilized for roughly 300 minutes.

    My network manager really liked this approach, and we then did the same thing for 60%.  This let us target bandwidth upgrades.  Nodes at the top of the 80% list were targeted for immediate upgrades, while we kept an eye on the 60% nodes.

  • EDIT: I forgot to count your minutes on the first release. Fixed report below.

    I was actually coming here to recommend something very similar, but I like this better.

    Here is the basic query. The commented-out section at the bottom can be used to restrict the count to business hours if desired. This relies on the default 30-day retention for your events table. If you need something different, that can be arranged fairly easily as well.

    select
      x.interfaceid
      ,x.interface
      ,x.times 'times over xx%'
      -- change the # here to reflect your event interval... 30 = every 30 mins, etc.
      ,(x.times * 30) 'minutes over xx%'
    from
      (select
        e.netobjectid 'interfaceid'
        ,i.fullname 'interface'
        ,count(*) 'times'
      from [events] e
      join interfaces i on i.interfaceid = e.netobjectid
      where e.eventtype = 5000
        and e.netobjecttype = 'I'
        -- change this line to match your event message being generated
        and e.message like '%utilization%'
        --and
        --(
        --  (DATEPART(weekday, e.eventtime) <> 1) AND -- 1 represents Sunday
        --  (DATEPART(weekday, e.eventtime) <> 7) AND -- 7 represents Saturday
        --  (Convert(Char,e.eventtime,108) >= '08:00') AND -- 8:00am ***Use 24H time here***
        --  (Convert(Char,e.eventtime,108) <= '18:00') -- 6:00pm ***Use 24H time here***
        --)
      group by e.netobjectid, i.fullname) x
    order by 'minutes over xx%' desc

    End Result looks like this:

    interfaceid  interface         times over xx%  minutes over xx%
    -----------  ----------------  --------------  ----------------
    645          Interface 1 Name  35              1050
    402          Interface 2 Name  3               90
    627          Interface 3 Name  1               30

    Hope this helps!

    -ZackM

    Loop1 Systems: SolarWinds Training and Professional Services

  • THANK YOU THANK YOU THANK YOU!  This makes SO much more sense to me than trying to figure out the calculations on the datetime stamp. My director is ok with this approach as long as he can get his information to a resolution of 5 minutes or so. At that resolution it's going to be one huge events table.

    One more question...

    Are you using the escalation part of the alert manager to create that timing? So the alert fires whenever the event happens, and then it repeats using whatever timing is configured in the "Execute this action repeatedly while the Alert is Triggered" box?

    Thanks again. You have no idea how long I've been beating my head against the wall over this report.

    Laura

  • Yes, use the escalation to retrigger the event.  However, keep in mind that the default polling interval for interface statistics is 9 minutes, so you won't get a true granularity of 5 minutes unless you poll the interface more frequently.  I would personally apply a custom property to the key interfaces you are concerned about (something like WAN_Interface yes/no), then change the polling interval only on those interfaces and scope the alert to cover only them.

  • x2

    I would almost NEVER increase interface polling to 5 minutes by default. There are, of course, some environments that can handle that, but that's the exception, not the norm.

  • We did highly critical interfaces (approximately 100 in our environment) at 1 min intervals. 

  • oh yeah, that's totally do-able with the right server(s). I'm agreeing with your comment about using a custom property for identifying specific interfaces to change the interval on instead of changing the default interval for all interfaces.

  • I agree. I do occasionally increase the polling frequency of stats for a single node to as rapid as 2 minutes, but doing this globally would be a huge waste of element polling. Typically, I only apply this to a node with WAN interfaces that I need to manage to tighter requirements than other interfaces.
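To put rough numbers on the resolution and table-size points raised in this thread, here is a small Python sketch. It rests on a simplifying assumption that is not taken from any SolarWinds documentation: a change in utilization can only be noticed at a statistics poll, so the count-based estimate can be no finer than the larger of the polling interval and the repeat interval. Both function names are hypothetical.

```python
def effective_resolution(poll_minutes, repeat_minutes):
    """Coarsest interval limiting the estimate (simplifying assumption)."""
    return max(poll_minutes, repeat_minutes)

def worst_case_events(days, repeat_minutes):
    """Events logged per interface if it stays over threshold the whole time."""
    return days * 24 * 60 // repeat_minutes

# Default 9-minute statistics polling with a 5-minute repeat action.
print(effective_resolution(9, 5))   # 9

# Upper bound on events per interface over a 30-day retention window.
print(worst_case_events(30, 5))     # 8640
print(worst_case_events(30, 30))    # 1440
```

The worst case is an upper bound only. An interface that rarely crosses the threshold will log far fewer rows, which is one reason to limit the faster polling and the alert to a small, tagged set of interfaces.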