Level 9

Configuring a single Low Disk Space Alert for many individual servers

Hi Guys

A bit of a crazy situation to resolve so apologies in advance.

I'm trying to eliminate the many false alerts currently being dropped into the teams' mailboxes reporting low disk space.

The problem we have is that many of the servers in the estate are different; different disk/volume amounts, different disk/volume sizes, and different names.

I need to create 2 Low Disk Space alerts: one that reports on any disk/volume with less than 500 MB free, and another that reports on any disk/volume with 1 GB or less free, regardless of the total size of the disk.

This rules out specifying a disk volume percentage, which is no good for my differently sized disks.
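
To make the requirement concrete, here is a minimal sketch of a check based on free bytes alone, written in Python rather than as an Orion alert definition. The two thresholds come from the post; the `Volume` class, the node names, and the use of binary units (1 GB = 1024^3 bytes) are assumptions made for illustration.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3

# Thresholds taken from the post; binary units are an assumption.
CRITICAL_FREE_BYTES = 500 * MB
WARNING_FREE_BYTES = 1 * GB


@dataclass
class Volume:
    node: str        # hypothetical node name
    caption: str     # e.g. "C:\\"
    size_bytes: int
    free_bytes: int


def classify(volume: Volume) -> str:
    """Return 'critical', 'warning' or 'ok' from free bytes alone."""
    if volume.free_bytes < CRITICAL_FREE_BYTES:
        return "critical"
    if volume.free_bytes <= WARNING_FREE_BYTES:
        return "warning"
    return "ok"


# Volumes of very different sizes are judged by the same absolute limits.
volumes = [
    Volume("app-01", "C:\\", 50 * GB, 400 * MB),
    Volume("sql-01", "E:\\", 2048 * GB, 900 * MB),
    Volume("web-01", "D:\\", 10 * GB, 3 * GB),
]
for v in volumes:
    print(v.node, v.caption, classify(v))
```

The total size never enters the decision, which is the point of dropping the percentage condition.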

All help and advice appreciated.

1 Solution
Level 9

This is how I do it, to combine percentage and available disk space. You could drop the percentage entirely, since it might be redundant depending on disk sizes.

pastedImage_0.png

I also do this for the automatic reset condition, to prevent the reset from happening too late on larger disks.

pastedImage_2.png
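
The screenshots are not reproduced here, so the exact conditions are unknown. As a rough sketch of the idea, assuming the trigger requires both a low percentage and a low absolute amount of free space, and the reset fires as soon as either has recovered, the logic could look like this in Python. All threshold values and function names are hypothetical.

```python
GB = 1024 ** 3

# Hypothetical thresholds; the values used in the screenshots are not known.
TRIGGER_PERCENT_FREE = 10.0
TRIGGER_FREE_BYTES = 5 * GB
RESET_PERCENT_FREE = 10.0
RESET_FREE_BYTES = 5 * GB


def percent_free(size_bytes: int, free_bytes: int) -> float:
    """Free space as a percentage of the volume size (0.0 for empty sizes)."""
    return 100.0 * free_bytes / size_bytes if size_bytes > 0 else 0.0


def should_trigger(size_bytes: int, free_bytes: int) -> bool:
    """Fire only when the volume is low in relative AND absolute terms."""
    return (percent_free(size_bytes, free_bytes) < TRIGGER_PERCENT_FREE
            and free_bytes < TRIGGER_FREE_BYTES)


def should_reset(size_bytes: int, free_bytes: int) -> bool:
    """Reset as soon as either measure has recovered."""
    return (percent_free(size_bytes, free_bytes) >= RESET_PERCENT_FREE
            or free_bytes >= RESET_FREE_BYTES)


# A 2 TB volume with 100 GB free is under 10% free but does not trigger.
print(should_trigger(2048 * GB, 100 * GB))
# The same volume with 4 GB free triggers, and resets once it is back to 6 GB.
print(should_trigger(2048 * GB, 4 * GB), should_reset(2048 * GB, 6 * GB))
```

With a percentage-only reset, the 2 TB volume in this example would have to climb back above roughly 205 GB free before the alert cleared, which is the "reset happening too late" problem on larger disks.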

Hope this helps you

18 Replies
Level 9

Hi Guys

I've managed to get round this issue by applying the alert below, which I'm testing at the moment. I will monitor the alerts over the weekend and provide an update next week.

Thank you all for your input and advice on this matter.

pastedImage_0.png

0 Kudos
Level 9

These are the alerts I'm trying to set up and report on:

pastedImage_0.png

0 Kudos
Level 9

This is becoming a nightmare. Something that should be so simple clearly is not.

I have created 4 alerts today as per the settings below (one example alert shown), and I would have expected this alert to report low disk space across all node/server volumes/disks in the estate, but it isn't.

I'm receiving alerts on disks that are not there. We have hundreds of application servers with volumes at 5GB or less free space, yet neither this alert nor any of the others is reporting on them. However, I have received some alerts, and they were reported against Page Files and Caches, which I have specifically excluded in the alerts.

Can someone tell me what is wrong with these alerts please?

pastedImage_0.png

Should this alert be like this instead, eliminating the % and making the 'does not contain' mandatory?

pastedImage_1.png

This is the Reset Trigger:

pastedImage_2.png

These are the alerts I'm receiving. Why is the Page File being reported on?

Volume DC1**********9V-E:\ Label:PageFile 6******8:

      Total size 16.0 G

      Free space 4.93 M

      Percent used 100 %

And why am I receiving alerts on volumes that aren't there?

Volume AP**********02-E:\ Label:Data F*******0:

      Total size 0

      Free space 0

      Percent used 0 %

I can confirm that there are hundreds of servers that should be alerting on 5GB or Less Free Space.
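
For reference, the filtering the alert is meant to perform can be sketched in Python as below. This does not explain why the Orion alert behaves differently; it only spells out the intended logic. The `Volume` tuple, the exclusion terms, and the case-insensitive matching are assumptions, and the two sample volumes are modelled on the alert emails quoted above.

```python
from typing import NamedTuple

MB = 1024 ** 2
GB = 1024 ** 3


class Volume(NamedTuple):
    caption: str     # e.g. "E:\\ Label:PageFile"
    size_bytes: int
    free_bytes: int


# Hypothetical exclusion terms, compared case-insensitively.
EXCLUDED_TERMS = ("pagefile", "cache")


def is_alertable(volume: Volume, free_limit: int = 5 * GB) -> bool:
    """True when a real, non-excluded volume is at or below the free limit."""
    if volume.size_bytes <= 0:
        # A volume reporting "Total size 0" is treated as not present.
        return False
    caption = volume.caption.lower()
    if any(term in caption for term in EXCLUDED_TERMS):
        return False
    return volume.free_bytes <= free_limit


samples = [
    Volume("E:\\ Label:PageFile", 16 * GB, 5 * MB),  # excluded by label
    Volume("E:\\ Label:Data", 0, 0),                 # excluded: size is 0
    Volume("D:\\ Label:Apps", 100 * GB, 3 * GB),     # should alert
]
for s in samples:
    print(s.caption, is_alertable(s))
```

Comparing what this logic would select with what the alert's summary page actually selects is one way to find which condition is not being applied as expected.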

0 Kudos

I would strip the alert back and use the summary page to see what's appearing after each condition, to check whether the volumes you expect are showing up.

0 Kudos

For such situations I build the alert messages with variables for the information I need, e.g. machine name, disk space, remaining disk space, etc. Those messages can be triggered by any alert that you want. So I would build an alert for the generic stuff, such as any drive with less than 10% space remaining, and point it to those messages. Then build individual alerts for the specific machines and still point them to the same alert message. The body of the message will reflect what you need to see.
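
The idea of one shared message body fed by several alerts can be illustrated with a small Python sketch. The placeholder names below are invented for the example and are not SolarWinds alert variables; in Orion the body would use the product's own `${N=...;M=...}` macros instead.

```python
from string import Template

# One shared body; placeholder names are illustrative, not SolarWinds macros.
MESSAGE = Template(
    "Low disk space on $node\n"
    "Volume: $volume\n"
    "Free space: $free_gb GB of $size_gb GB\n"
    "Triggered by: $alert_name"
)


def render(alert_name: str, node: str, volume: str,
           free_gb: float, size_gb: float) -> str:
    """Fill the shared template with the values of whichever alert fired."""
    return MESSAGE.substitute(
        alert_name=alert_name,
        node=node,
        volume=volume,
        free_gb=f"{free_gb:.2f}",
        size_gb=f"{size_gb:.2f}",
    )


# A generic alert and a machine-specific alert reuse the same body.
print(render("Generic: under 10% free", "app-01", "C:\\", 0.4, 50))
print(render("sql-01: under 5 GB free", "sql-01", "E:\\", 4.2, 2048))
```

Because the figures are filled in at send time, the same message stays accurate whichever alert triggers it.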

0 Kudos

I have done this recently using bits of what people have said above: custom properties, and a dashboard built with SWQL rather than reports as a starting point.

We then moved on to alerts once we were happy we were not going to get thousands of them. We only alerted on critical and used a wallboard to keep an eye on warning.

I liked the live icon changes, which have more of an impact (and I used links wherever I could; the node name is a link too).

pastedImage_0.png

Something I don't see often, but I'm going to do it here: this is the SWQL script I created with some help from our DBA team. The custom property is the only thing I used to say whether a node is live or test, but this means you can make multiple blocks.

pastedImage_1.png

Can you bless us with a text version of the code?

Sorry, I think I went into block overdrive. See below:

select n.displayname as nodename, substring(v.caption,1,3) as [caption],
  case
    when (v.volumesize/1024/1024/1024/1024) > 1 then (tostring(round(v.volumesize/1024/1024/1024/1024,2)) + ' TB')
    when (v.volumesize/1024/1024/1024) > 1 then (tostring(round(v.volumesize/1024/1024/1024,2)) + ' GB')
    when (v.volumesize/1024/1024) > 1 then (tostring(round(v.volumesize/1024/1024,2)) + ' MB')
  end as [Volume Size],
  case
    when (v.VolumeSpaceAvailable/1024/1024/1024/1024) > 1 then (tostring(round(v.VolumeSpaceAvailable/1024/1024/1024/1024,2)) + ' TB')
    when (v.VolumeSpaceAvailable/1024/1024/1024) > 1 then (tostring(round(v.VolumeSpaceAvailable/1024/1024/1024,2)) + ' GB')
    when (v.VolumeSpaceAvailable/1024/1024) > 1 then (tostring(round(v.VolumeSpaceAvailable/1024/1024,2)) + ' MB')
  end as [Space On Volume],
  round((v.volumespaceused*100)/v.volumesize,2) as [volume percent used],
  v.detailsurl as [_linkfor_caption],
  n.detailsurl as [_linkfor_nodename],
  case
    when v.forecastcapacity.currentvalue > v.forecastcapacity.criticalthreshold then '/Orion/images/StatusIcons/Small-Critical.gif'
    when v.forecastcapacity.currentvalue > v.forecastcapacity.warningthreshold then '/Orion/images/StatusIcons/Small-Warning.gif'
    else '/Orion/images/StatusIcons/Small-Up.gif'
  end as [_iconfor_caption],
  '/Orion/images/StatusIcons/small-' + n.statusicon as [_iconfor_nodename]
from Orion.Nodes n
join Orion.NodesCustomProperties cp on n.nodeid = cp.nodeid
join Orion.Volumes v on n.nodeid = v.nodeid
join Orion.VolumesCustomProperties vp on v.volumeid = vp.volumeid
where (environment in ('live','dlive','dpre-prod','dr','dsit'))
  and v.forecastcapacity.currentvalue > v.forecastcapacity.warningthreshold
  and MonitorCapacity = 'A: True'
  and v.volumesize > 0
  and caption like '_:\%'
order by [volume percent used] desc
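
For anyone who does not read SWQL, the unit selection done by the two CASE expressions in the query can be sketched in Python as follows. This only mirrors the branching logic; the function name is made up, and the exact text that SWQL's `tostring(round(...))` produces may differ from Python's formatting.

```python
from typing import Optional

# Same order as the CASE branches: the largest unit is tried first.
UNITS = (("TB", 1024 ** 4), ("GB", 1024 ** 3), ("MB", 1024 ** 2))


def format_size(size_bytes: float) -> Optional[str]:
    """Return the size in the first unit where the value is greater than 1."""
    for suffix, factor in UNITS:
        value = size_bytes / factor
        if value > 1:
            return f"{round(value, 2)} {suffix}"
    # The CASE in the query has no ELSE branch, so anything of 1 MB or less
    # produces no value at all.
    return None


for size in (16 * 1024 ** 3, 1.5 * 1024 ** 4, 500 * 1024):
    print(size, "->", format_size(size))
```

One consequence of comparing with `> 1` is that a size of exactly 1 GB falls through to the MB branch and is shown as 1024 MB.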

Hi Dodo, apologies but I have no idea what has been suggested above. You're clearly an expert on this product 🙂 I can't even get my head around the simple GUI options...

Thanks for the vote of confidence, but my role is basically SolarWinds; I don't touch anything else. I have had a lot of hours with a system that was put in and that no one was using, so with everything I touch at the minute I get lots of time to think about it.

I basically turned alerting off. If you have the confidence that lots of your alerts are noise and no one is doing anything with them, turn them off. I spent ages turning out-of-the-box alerts off and getting rid of Alert Central. Just taking ownership of it helped a lot.

Honestly, I would start with the dashboard: go to settings and views, create a summary view, then add a custom query and copy in the text above. You can remove the environment bit and the monitor capacity bit from the script and it should work.

If you don't know SQL you won't know SWQL. I didn't, so I arranged some training with our DBA to get the basics. I turn to them for the complicated stuff, then I reuse their code for other things.

Don't get me wrong, I just love this product. I do not know everything and I'm learning all the time; I have been in this forum every day since I started this role. I want that table in the thwack store.

Happy to walk you through what I have done so far and help you. The more people that use this product, the better for me.

Hi Dodo

I'm contracted in for a few months to tidy up SW for the client, who rely heavily on alerts. At the moment all support teams and individuals are receiving far too many alerts and false positives. They're in the finance sector and are a 24/7 operation with over 3,000 servers and hosted applications to stay on top of, so I don't think I'll get away with no alerts. They're monitoring web servers and transactions via WPM and applications via SAM, and have multiple support teams working around the clock, so alerts are a must for them I'm afraid.

Just a couple of questions if I may.

Is there an easy way to create the 4 alerts I'm trying to configure to alert against free disk space without using the percentage flag, such as specifying the bytes?

And, why won't the alerts I've created flag any of our servers/volumes with Free Space triggered under 1GB, 3GB, and/or 5GB?

I am receiving some alert emails but they're not right, as in, they're alerts on disks that are no longer there, and also reporting on Page File volumes that I have specifically excluded in the Trigger Condition.

Thanks once again for your help and time.

0 Kudos

That seems pretty alien to me. Is the business asking for those exact alerts?

I have worked in Infrastructure (3rd line) and Support (2nd line), and that seems very granular and very close to the edge.

The way I have done it is not 100% and there's room for improvement, but...

I use the percentages as they are built into the system, plus this allows other staff to change those thresholds without affecting the alert or distracting me.

So I have 1 alert currently that fires when a volume hits critical. I have set the default critical threshold across all systems to 95% after agreeing it with the Infrastructure Manager (but these can be set individually on each volume).

I know that with percentages, a big drive that is 99% full could still have another 300GB free, so I have yet to decide whether to adjust the alert to include them or set individual ones for these big drives. But that's where the dashboard comes into play. Plus the alert (the email that goes out) has the actual figures and links to instructions for what they can do next.
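
To put numbers on the big-drive problem, here is a small Python calculation of how much free space a given "percent used" leaves on volumes of different sizes. The volume sizes are made-up examples; 95% and 99% are the figures mentioned above.

```python
GB = 1024 ** 3
TB = 1024 ** 4


def free_bytes_at(size_bytes: int, percent_used: float) -> float:
    """Free space left on a volume that is percent_used % full."""
    return size_bytes * (100.0 - percent_used) / 100.0


# Hypothetical volume sizes, checked at the 95% and 99% marks.
for label, size in (("50 GB", 50 * GB), ("500 GB", 500 * GB), ("30 TB", 30 * TB)):
    for percent_used in (95.0, 99.0):
        free_gb = free_bytes_at(size, percent_used) / GB
        print(f"{label} volume at {percent_used:.0f}% used: {free_gb:.1f} GB free")
```

A 30 TB volume at 99% used still has about 307 GB free, while a 50 GB volume at the same percentage is down to half a gigabyte, so a single percentage threshold means very different things across the estate.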

pastedImage_2.png

Actual alert

Subject

Alert Disk Space ${SQL: SELECT substring('${N=SwisEntity;M=Volume.Caption}',1,3)} on ${N=SwisEntity;M=Volume.Node.Caption}

Message

<html><head></head><body><font face="calibri" style="font-size: 11pt">
<b>Volume Details</b>

<table cellspacing="0" cellpadding="0" border="0">
<tr><td width="300">Volume Name:</td><td><a href="${N=SwisEntity;M=Volume.DetailsUrl}">${N=SwisEntity;M=Volume.Caption}</a></td></tr>
<tr><td>Node Name:</td><td><a href="${N=SwisEntity;M=Volume.Node.DetailsUrl}">${N=SwisEntity;M=Volume.Node.Caption}</a></td></tr>
<tr><td>Node Environment:</td><td>${N=SwisEntity;M=Volume.Node.CustomProperties.Environment}</td></tr>
<tr><td>Percent Available:</td><td>${N=SwisEntity;M=Volume.VolumePercentAvailable}</td></tr>
<tr><td>Space Available:</td><td>${N=SwisEntity;M=Volume.VolumeSpaceAvailable}</td></tr>
<tr><td>Volume Size:</td><td>${N=SwisEntity;M=Volume.VolumeSize}</td></tr>
</table>
<b>Alert Information</b>

<table cellspacing="0" cellpadding="0" border="0">
<tr><td width="300">View full alert details</td><td><a href="${N=Alerting;M=AlertDetailsUrl}">Solarwinds Alert</a></td></tr>
<tr><td>Alert triggered</td><td>${N=Alerting;M=AlertTriggerTime;F=DateTime}</td></tr>
</table>
<b>Instructions</b>

<table cellspacing="0" cellpadding="0" border="0">
<tr><td width="300">Disable monitoring instructions</td><td><a href="http://servicedesk/AddSolution.do?submitaction=viewsolution&fromListView=true&solutionID=304">Solution</a></td></tr>
<tr><td>Change node environment instructions</td><td><a href="http://servicedesk/AddSolution.do?submitaction=viewsolution&fromListView=true&solutionID=305">Solution</a></td></tr>
<tr><td>Change critical alert on volume</td><td><a href="http://servicedesk/AddSolution.do?submitaction=viewsolution&fromListView=true&solutionID=306">Solution</a></td></tr>
</table>
If percent available &gt; 1% but space available is large, assign ticket to Network team and state alert amount.

<b>Alert name</b>

${N=Alerting;M=AlertName}
</font></body></html>

Looks like this

pastedImage_0.png

I did have another custom property for whether I want to monitor the drive or not; for me this should be built into the system. As we don't have the option of a default value for a custom property (this is a feature request somewhere), I used "A: True" & "B: False" to move true to the top of the list. After I mentioned this to my DBA, he came back with: I should have used "affirmative".

0 Kudos
Level 8

I am also in need of this. I have multiple SQL servers with very large volumes, and I need an alert when these volumes fall below 5 GB of free space.

0 Kudos
@husum182 

Hi, I am playing around with this and am wondering how I would get the alert not to fire on volumes that are "system reserved", since I am not concerned with disk space on those.

I am seeing this in the label for the volume, but cannot find the appropriate variable to exclude as a condition.

Does that make sense?

0 Kudos

Do this! This is what we use as well across a few hundred servers.

We also have some that report a bit earlier, based on the size of the volume.
It works well.

0 Kudos

If you want a single notification for a group of servers, it sounds like you would be better off using a report instead of an alert. How quickly your nodes use up disk space will determine how often you schedule the report to run. If you use an alert, it will trigger separately for each server volume whose free space meets the condition.

0 Kudos