
Alerting on Volume Thresholds

Back in April 2015, NPM 11.5 was released, and with it came a brand-new web-based Alert Engine in the Orion Platform. One of the most valuable capabilities of this engine, then and ever since, is the ability to dynamically alert on multiple objects based on their own individually assigned thresholds. Setting individual thresholds for things like CPU Utilization, Percent Memory Utilization, Packet Loss, Response Time, Interface Errors, and Interface Utilization was a game changer for many alerting schemas: it let us reduce our custom property footprint as well as the complexity of our alert definitions.

However, a glaring "omission" was that the thresholds available for Volumes were not presented to the alerting engine (or so we thought). This was a bit mind-boggling, and in years of talking to other MVPs, seasoned SW admins, and SW employees, I never heard differently, so the assumption was cemented as a "missing item that requires a work-around". (On a side note, I am 42% sure that jbiggley was behind a very well orchestrated and elaborate trolling to keep me in the dark on this capability, but I digress...) Today, I'd like to present the solution that was hiding in the background this entire time, to save future admins the discomfort of maintaining "Disk_Crit" custom properties.

NOTE: As tait.cyrus mentions in the comments below:

A problem with using "Volume Capacity Forecasting" is newly added nodes will not have any forecasting data for several days, so you will not be able to get any volume alerts from newly added nodes (until forecasting volume data becomes available which can be somewhere in the 1-3 days time period) since volume thresholds won't appear until the forecasting data appears.

The problem arises from the fact that "Volume Capacity Forecasting" is a database 'view' made up of data from various other tables including a database JOIN of a forecasting table so until there is forecasting data available for a volume on a node, "Volume Capacity Forecasting" will not show anything so no threshold data will be available and thus no ability to generate volume alerts on the node. It would have been preferable that an appropriate database JOIN would have been used that would have shown threshold data even when forecasting data was not yet available. I did submit an incident on this and was told this is a known 'feature'. I have requested that this be changed so volume thresholds show up immediately allowing volume alerts to be immediately generated in newly added nodes.

tl;dr - Be aware that this solution has limitations on new volumes!!!
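
To illustrate the JOIN behavior described in the note, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (volumes, thresholds, forecast) are hypothetical stand-ins, not the real Orion schema, and the actual definition of the view is not shown in this post. The sketch only demonstrates why an inner join against a forecasting table hides a volume until forecast rows exist, and how a LEFT JOIN would keep it visible.

```python
import sqlite3

# Hypothetical, simplified tables; NOT the real Orion schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE volumes   (volume_id INTEGER PRIMARY KEY, caption TEXT, percent_used REAL);
    CREATE TABLE forecast  (volume_id INTEGER, days_to_capacity INTEGER);
    CREATE TABLE thresholds(volume_id INTEGER, critical REAL);

    INSERT INTO volumes    VALUES (1, 'old-node C:', 97.0), (2, 'new-node C:', 99.0);
    INSERT INTO thresholds VALUES (1, 95.0), (2, 95.0);
    -- Only the older volume has forecast data so far.
    INSERT INTO forecast   VALUES (1, 12);
""")

QUERY = """
    SELECT v.caption
    FROM volumes v
    JOIN thresholds t ON t.volume_id = v.volume_id
    {join} forecast f ON f.volume_id = v.volume_id
    WHERE v.percent_used > t.critical
    ORDER BY v.volume_id
"""

def alerting_volumes(join_kind):
    """Return captions of volumes over their critical threshold."""
    rows = conn.execute(QUERY.format(join=join_kind)).fetchall()
    return [caption for (caption,) in rows]

inner_result = alerting_volumes("JOIN")       # new volume is silently dropped
left_result = alerting_volumes("LEFT JOIN")   # new volume is still evaluated
print(inner_result)
print(left_result)
```

With the inner join, the volume on the newly added node never reaches the threshold comparison, even though it is 99% full.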

Background: Node and Interface metric thresholds are added to the alerting engine in a very intuitive way:

pastedImage_2.png

However, volume thresholds are obviously not:

volThresh.jpg

The key was to take a step back and look at the alerting object options. There you shall find your salvation in the form of a "Volume Capacity Forecasting" object (as opposed to the intuitive "Volume" object type):

volCapThresh.jpg

Which then presents those valuable thresholds!

zzzz.jpg

From there, you need to set up a "Double Value Comparison" in the trigger:

double-value.jpg

And then create a comparison between the current and threshold values, respectively:

trigger.jpg

This will then trigger on volumes whose current percent utilization exceeds the threshold you have defined on that specific volume:

trigTest.jpg
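
The logic behind that trigger condition can be sketched in a few lines of Python. This is only an illustration of the comparison, not how the alert engine is implemented. The Volume class and its field names are hypothetical, loosely modeled on the CurrentValue and CriticalThreshold columns of the VolumesForecastCapacity view that come up later in this thread.

```python
from dataclasses import dataclass

@dataclass
class Volume:
    # Hypothetical record; field names are illustrative only.
    caption: str
    current_value: float        # current percent used
    critical_threshold: float   # threshold assigned to this specific volume

def triggered(volumes):
    """Return volumes whose usage exceeds their own threshold."""
    return [v for v in volumes if v.current_value > v.critical_threshold]

volumes = [
    Volume("C:\\", 91.0, 90.0),    # over its own 90% threshold
    Volume("D:\\", 91.0, 95.0),    # same usage, but its threshold is 95%
    Volume("/var", 70.0, 60.0),    # low usage, but an even lower threshold
]
alerts = [v.caption for v in triggered(volumes)]
print(alerts)
```

A single alert definition covers all three volumes, because each row carries its own threshold instead of relying on a custom property.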

For reference: thresholds are set per object by editing the object's properties and looking at the bottom of the page. (Pro tip: you can edit multiple objects at once from the "Manage Nodes/Entities" page.)

capPlan.jpg

Verified via SQL search on the "VolumesForecastCapacity" view in the database:

sqlValidation.jpg

SELECT TOP 100 * FROM VolumesForecastCapacity

There you have it. Happy monitoring everyone!

Comments

Trolling Master Class is now complete.

Nice find zackm!

Can you point me to where those first few screenshots are located? Is Volume Capacity Forecasting built in, or do I need to create a new alert?

You would need to author a new alert. Those screenshots are from the alert engine while I was testing this idea. I put a few links in the beginning of the post; one of them is a pretty nice SolarWinds lab video that covers the "how to" portion of alerting.

Nice Zack!

Thanks zackm. Great find.

A problem with using "Volume Capacity Forecasting" is newly added nodes will not have any forecasting data for several days, so you will not be able to get any volume alerts from newly added nodes (until forecasting volume data becomes available which can be somewhere in the 1-3 days time period) since volume thresholds won't appear until the forecasting data appears.

The problem arises from the fact that "Volume Capacity Forecasting" is a database 'view' made up of data from various other tables including a database JOIN of a forecasting table so until there is forecasting data available for a volume on a node, "Volume Capacity Forecasting" will not show anything so no threshold data will be available and thus no ability to generate volume alerts on the node. It would have been preferable that an appropriate database JOIN would have been used that would have shown threshold data even when forecasting data was not yet available. I did submit an incident on this and was told this is a known 'feature'. I have requested that this be changed so volume thresholds show up immediately allowing volume alerts to be immediately generated in newly added nodes.


Great point! Updated the post with your notes. Thanks.

This particular topic became of great interest to me in late 2017 after upgrading from NPM 12.0 to 12.2.  Due to various problems with the upgrade (later successfully addressed by Hot Fixes), poller drive space would be consumed by failed poller communication jobs, which resulted in poller failure due to loss of sufficient C: drive space.

Until the problem could be prevented, I was forced to reach out to Thwack and ask others for help creating a kluge for volume thresholds. I subsequently put this information on the front page of my NOC View, and "knew" to go into any APE with excessive drive space consumed and remove / replace files per SW's recommendations.

The Resource looks like this:

pastedImage_0.png

It is built as follows:

pastedImage_1.png

The actual SWQL query follows:

SELECT n.caption as [Node]
, v.Caption as [Volume]
, round(v.VolumePercentUsed,1) as [Space Used]
, round(v.volumespaceavailable/1073741824,1) as [Free GB]
, case when f.DaysToCapacityPeak is null then 'No Forecast'
       when f.DaysToCapacityPeak<0 then 'Full'
       when f.DaysToCapacityPeak<91 then tostring(f.DaysToCapacityPeak)
       when f.DaysToCapacityPeak>90 then '>90 Days'
  end as [Days til Full]
, v.DetailsUrl as [_linkfor_Volume]
, n.DetailsUrl as [_linkfor_Node]
, '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Node]
, case when fc.WarningThreshold is null and v.VolumePercentUsed > (Select CurrentValue AS [col1] FROM Orion.Settings WHERE SettingID = 'NetPerfMon-DiskSpace-Error') then '/Orion/images/StatusIcons/Small-Critical.gif'
       when fc.WarningThreshold is null and v.VolumePercentUsed > (Select CurrentValue AS [col1] FROM Orion.Settings WHERE SettingID = 'NetPerfMon-DiskSpace-Warning') then '/Orion/images/StatusIcons/Small-Warning.gif'
       when v.VolumePercentUsed>f.CriticalThreshold then '/Orion/images/StatusIcons/Small-Critical.gif'
       when v.VolumePercentUsed>f.WarningThreshold then '/Orion/images/StatusIcons/Small-Warning.gif'
  end as [_Iconfor_Space Used]
, '/Orion/images/StatusIcons/Small-' + v.StatusIcon AS [_IconFor_Volume]
from orion.Volumes v
join orion.nodes n on n.nodeid=v.nodeid
left join Orion.ForecastCapacitySettings fc on fc.InstanceId=v.VolumeID and fc.metricid=3
left join Orion.ForecastCapacity f on f.InstanceId = v.VolumeID and f.EntityType='Orion.volumes'
left join Orion.AlertSuppression asup on asup.entityuri = n.uri
where v.FullName in ('INSERT YOUR POLLER NAME HERE-C:\ Label: INSERT YOUR DRIVE VOLUME LABEL HERE','Repeat as needed until you have all of your APEs and labelled drives entered here')
order by currentvalue desc

Thanks go to all those who helped craft the above query, without payment or other recognition.  What a swell group of people!
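
For readers less familiar with SWQL CASE expressions, the "Days til Full" and "Free GB" columns in the query above could be expressed in Python roughly as follows. The function names are made up for illustration. The branch order and the 1073741824 divisor (bytes per GiB) are taken from the query. The final branch is written as a catch-all, which should be equivalent as long as the day counts are whole numbers.

```python
BYTES_PER_GIB = 1073741824  # 1024 ** 3, the divisor used in the query

def days_til_full(days_to_capacity_peak):
    """Mirror the CASE expression on DaysToCapacityPeak."""
    if days_to_capacity_peak is None:
        return "No Forecast"
    if days_to_capacity_peak < 0:
        return "Full"
    if days_to_capacity_peak < 91:
        return str(days_to_capacity_peak)
    return ">90 Days"

def free_gb(space_available_bytes):
    """Convert bytes to GiB, rounded to one decimal place."""
    return round(space_available_bytes / BYTES_PER_GIB, 1)

labels = [days_til_full(d) for d in (None, -1, 0, 90, 91)]
free = free_gb(5 * BYTES_PER_GIB)
print(labels)
print(free)
```

The None case corresponds to the LEFT JOIN finding no forecast row, which is why the resource can still list volumes on newly added nodes.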

What would a Slack-proof trigger action look like? My usual one below works for *nix paths, but not Windows.

payload={
  "channel": "@slackbot",
  "username": "solarwinds",
  "color": "Warning",
  "text": "<${N=Alerting;M=AlertDetailsUrl}|${N=Alerting;M=Severity}>: `<${N=SwisEntity;M=Volume.DetailsUrl}|${N=SwisEntity;M=Volume.Caption}>` on `<${N=SwisEntity;M=Volume.Node.DetailsUrl}|${N=SwisEntity;M=Volume.Node.Caption}>`: ${N=SwisEntity;M=Volume.VolumePercentUsed} used."
}

(The trick below for converting single backslashes to double ones isn't working in this case, I assume because the variables are different.)

${SQL: SELECT REPLACE(Caption, '\', '\\') FROM Volumes WHERE VolumeID = ${VolumeID}}

Thank you!

Have you tried replacing ${VolumeID} in your SQL variable with ${N=SwisEntity;M=Volume.VolumeID}?
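
As background on why Windows volume captions break the payload: a backslash is an escape character inside a JSON string, so a caption such as C:\ Label:... has to be sent as C:\\ Label:... instead. The Python sketch below shows the effect using the standard json module. The caption value is invented for illustration, and this is not something the alert action runs. It only shows what the escaped text needs to look like.

```python
import json

# Invented example caption; real captions come from the Volume entity.
caption = "C:\\ Label:System"   # Python literal for: C:\ Label:System

# Naive string concatenation leaves the lone backslash in place,
# which is an invalid escape sequence ("\ ") in JSON.
naive = '{"text": "' + caption + '"}'
try:
    json.loads(naive)
    naive_is_valid = True
except json.JSONDecodeError:
    naive_is_valid = False

# json.dumps doubles the backslash, matching the REPLACE trick above.
escaped = json.dumps({"text": caption})
round_trip = json.loads(escaped)["text"]

print(naive_is_valid)
print(escaped)
```

*nix paths use forward slashes, which need no escaping, so the same payload template works for them unchanged.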

Hi

I have used this solution for several years and it works well, but it annoys me that we have to wait 7 days (I think) until we have values in the VolumesForecastCapacity view. So I thought this must be something I can solve with some SQL :-).

I thought that if the volume is new, I would alert when it has used more than 95%, just to have something to start with.

Create a new alert of type "Custom SQL Alert" with Volume as the target, and paste in the code below:

LEFT JOIN VolumesForecastCapacity VFC ON Volumes.VolumeID=VFC.InstanceId
WHERE
  (
    (VFC.CurrentValue>VFC.CriticalThreshold AND VFC.CurrentValue IS NOT NULL)
    OR
    (VFC.CurrentValue IS NULL AND Volumes.VolumePercentUsed>95)
  )
  AND
  (Volumes.VolumeTypeID=4 OR Volumes.VolumeTypeID=100)

And it should look something like this:

pastedImage_1.png

Now, I have not tested this very much in production yet, but it feels like it should work!

Try it and tell me what you think!

PS: VolumeTypeID 4 and 100 are "Fixed disk" and "Mount Point"
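
To make the intent of that WHERE clause easier to check, here is the same decision written as a small Python function. It is only a sketch of the logic, not something Orion executes. The function and parameter names are hypothetical, while the 95% fallback and the VolumeTypeID values 4 and 100 come from the query above.

```python
ALERTABLE_TYPES = {4, 100}   # "Fixed disk" and "Mount Point", per the PS above
FALLBACK_PERCENT = 95        # used only while no forecast row exists

def should_alert(volume_type_id, percent_used,
                 current_value=None, critical_threshold=None):
    """Mirror the custom SQL alert's WHERE clause.

    current_value and critical_threshold are None when the LEFT JOIN
    finds no row in VolumesForecastCapacity for the volume.
    """
    if volume_type_id not in ALERTABLE_TYPES:
        return False
    if current_value is None:
        return percent_used > FALLBACK_PERCENT
    if critical_threshold is None:
        # SQL: a comparison against NULL is never true, so no alert.
        return False
    return current_value > critical_threshold

results = [
    should_alert(4, 97.0),                # new volume, over the 95% fallback
    should_alert(4, 90.0),                # new volume, under the fallback
    should_alert(4, 50.0, 50.0, 40.0),    # forecast row exists, over its own threshold
    should_alert(4, 97.0, 97.0, 98.0),    # forecast row exists, under its own threshold
    should_alert(2, 99.0),                # volume type not covered by the alert
]
print(results)
```

Note the fourth case: once forecast data exists, the volume's own threshold wins, even if usage is above the 95% fallback.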

That did it - thank you!

Did anyone have any issues with configuring the actual trigger condition? For some reason I don't have the option for the comparison portion of it...does anyone have any ideas as to why?

pastedImage_0.png

Thanks in advance!

You need to use the "Add Double Value Comparison" option:

double-value.jpg

The new 2020.2 RC has new Warning/Critical values inside the Volume category.

So we will not need to use Forecasting values anymore 🙂

Volume thresholds 2020.2.png

Hi, is that not point in time though? Does it account for growth?

Version history: revision 1 of 1, last updated 01-31-2018 10:08 AM