Tool: SAM
Module: net-snmp
OS: any *Nix, but in particular for the test, RHEL 6.4
Status:
- Server is currently reporting SNMP data in Orion for CPU, memory, disk space, etc.
- Alert threshold in Orion has not been changed from default
Goal:
- when a file system goes past a threshold, send an e-mail alert to AdminA
- when a file system goes past a threshold, post an alert on the monitor page
- when a file system has transgressed the threshold X times, send an additional e-mail alert to AdminB (escalation)
- when the file system has gone back below the threshold, unset the alerts, sending mails to AdminA (and AdminB, if one was ever sent) that the alert is now gone
- if the alert was reset in the monitor page, send an alert to AdminA (and AdminB, if one was ever sent) that logged-in-user reset the alert in the GUI
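To make the intended behaviour concrete, the goals above can be sketched as a small state machine. This is only an illustration of the desired logic, not how Orion implements alerts: the class name VolumeAlert, the outbox list standing in for real e-mail, and the reading of "X times" as X consecutive breaching polls are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class VolumeAlert:
    """Hypothetical model of the alert/escalation/reset goals."""
    threshold_pct_free: float   # alert when free space drops below this
    escalate_after: int         # "X": breaching polls before AdminB is mailed (assumed consecutive)
    breaches: int = 0
    active: bool = False
    escalated: bool = False
    outbox: list = field(default_factory=list)  # (recipient, message) pairs instead of real mail

    def _notify_all(self, message):
        # AdminA always hears about it; AdminB only if an escalation mail was ever sent.
        self.outbox.append(("AdminA", message))
        if self.escalated:
            self.outbox.append(("AdminB", message))

    def _clear(self):
        self.breaches = 0
        self.active = False
        self.escalated = False

    def poll(self, pct_free):
        """Feed one polled 'percent available' value into the alert."""
        if pct_free < self.threshold_pct_free:
            self.breaches += 1
            if not self.active:
                self.active = True
                self.outbox.append(("AdminA", "alert"))
            if self.breaches >= self.escalate_after and not self.escalated:
                self.escalated = True
                self.outbox.append(("AdminB", "escalation"))
        elif self.active:
            self._notify_all("cleared")
            self._clear()

    def manual_reset(self, user):
        """Someone reset the alert in the GUI; tell everyone who was alerted."""
        if self.active:
            self._notify_all(f"reset in GUI by {user}")
            self._clear()


alert = VolumeAlert(threshold_pct_free=5, escalate_after=3)
for pct in (4, 3, 2, 50):
    alert.poll(pct)
print(alert.outbox)
```

In Orion terms, the first mail would map to an Escalation Level 1 trigger action, the AdminB mail to a later escalation level, and the "cleared" mails to a reset action.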
Issue:
- Goals not yet met, since I'm still stuck on the alerting mechanism not firing properly. I don't get e-mail alerts at the target address, even though the conditions are easily met.
What I've been able to do:
1. log in to Orion and select 'manage alerts'
2. make a copy of 'Alert me when the free space of a volume is less than 5%' and rename it <name> - UNIX
3. Summary of trigger:
[{ NOTE that though the alert is named 5%, we're testing against 1% free (Volume Percent Available less than 99) to make sure it works. Some data is obfuscated in the below }]
Type of Property to monitor: Volume
Enabled(On/Off): ON
Evaluation Frequency of alert: Every minute
Severity of alert: Critical
Alert Custom Properties: (0) No Alert Custom Properties defined
Alert Limitation Category: No Limitation
Trigger Condition:
  All child conditions must be satisfied (AND)
    Node - System Name - is equal to - linuxhost.domain.com
    Volume - Volume Percent Available - is less than - 99
    At least one child condition must be satisfied (OR)
      Node - Vendor - is equal to - Data General
      Node - Vendor - is equal to - HP
      Node - Vendor - is equal to - IBM
      Node - Vendor - is equal to - net-snmp
    At least one child condition must be satisfied (OR)
      Volume - Volume Type - is equal to - Fixed Disk
      Volume - Volume Type - is equal to - FixedDisk <--not sure why we need to select both for *Nix, but they are there...
Reset Condition: When the trigger condition is no longer true
Time of Day schedule: Alert is always enabled
Trigger Action: Escalation Level 1 Send an Email/Page (There is less than 5% available free space on volume ${FullName})
Reset Action: No reset action specified
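For anyone checking the logic, the trigger condition in this summary can be read as one nested boolean expression. The sketch below assumes the two OR groups are children of the top-level AND, and that the node's vendor is reported as net-snmp; the function name trigger_fires and the dictionary keys are made up for illustration and are not Orion field names.

```python
def trigger_fires(node, volume, pct_threshold=99):
    """Evaluate the alert's trigger condition for one node/volume pair."""
    vendors = {"Data General", "HP", "IBM", "net-snmp"}   # first OR group
    volume_types = {"Fixed Disk", "FixedDisk"}            # second OR group
    return (
        node["system_name"] == "linuxhost.domain.com"
        and volume["percent_available"] < pct_threshold
        and node["vendor"] in vendors
        and volume["type"] in volume_types
    )


# Hypothetical values for the test host; the vendor string is an assumption.
node = {"system_name": "linuxhost.domain.com", "vendor": "net-snmp"}
volume = {"percent_available": 48, "type": "Fixed Disk"}
print(trigger_fires(node, volume))
```

If any one of the four parts is false for the polled values (for example, the System Name stored in Orion differs from the fully qualified name), the whole condition is false and no action runs.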
The e-mail message, when simulated against '/' on linuxhost.domain.com, appears as follows:
Email with following details would be sent...
To:
email@domain.com
Subject:
linuxhost.domain.com-/ < 5% space available
Message:
Volume linuxhost.domain.com-/:
Mount Point: /
Total size 11.7 G
Free space 5.6 G
Percent used 52 %
Link to the volume details page for more information: http://ORIONNODE:85/Orion/View.aspx?NetObject=V:96
Volume Type: Fixed Disk
Volume Name: linuxhost.domain.com-/
VolumeDescription:/
DisplayName: /
More Details:
Trigger Time: Wednesday, September 16, 2015 10:21 AM
Severity: Critical
Escalation: Action will execute immediately after alert trigger
Alert Definition: Alert me when the free space of a volume is less than 5% - UNIX
Acknowledged by:
4. However, nothing ever gets sent, even though the condition is met all day long....all month long...
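As a sanity check on "the condition is met", the figures from the simulated e-mail (11.7 G total, 5.6 G free) can be plugged into the threshold test. This is plain arithmetic on the numbers shown above; the variable names are illustrative only.

```python
total_gb = 11.7   # "Total size" from the simulated e-mail
free_gb = 5.6     # "Free space" from the simulated e-mail

pct_available = free_gb / total_gb * 100
pct_used = 100 - pct_available

print(f"available {pct_available:.1f}%, used {pct_used:.1f}%")
print("below the 99% test threshold:", pct_available < 99)
print("below the original 5% threshold:", pct_available < 5)
```

Roughly 48% free agrees with the "Percent used 52 %" line in the e-mail, so the volume satisfies the 99% test threshold but would not satisfy the original 5% one.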