Version 2

    Here are some changes that I made to my Orion alert engine to better handle unique alert thresholds per node. I wish these were already set like this out of the box.

     

    When you edit a node, the thresholds section at the bottom of the page lets you change that single node's thresholds based on your needs. This makes sense in several cases, for example when 50% CPU load on a particular server is already critical, or when a remote server has a poor connection and a 300 ms response time is acceptable.

     

    This requires you to modify the respective alerts to NOT use a static value but to reference the value set per node instead.

     

    When you look at the available alert variables, several start with Warning Value Reached... or Critical Value Reached..., one for each category.

    2016-10-13 09_44_29-Program Manager.png

     

     

    What I did on each alert was remove the static value and replace it with the corresponding warning/critical value reached variable. Nodes without custom thresholds still receive whatever is set for the global threshold values, while nodes modified on a one-off basis use their own values, and both are covered by the same alert.
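
    To illustrate why one alert can cover both cases, here is a minimal sketch of the fallback logic in Python. This is not Orion code: the Node class, the field names, the default values, and the use of >= for the comparison are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical global CPU thresholds in percent (not actual Orion defaults).
GLOBAL_CPU_WARNING = 80.0
GLOBAL_CPU_CRITICAL = 90.0


@dataclass
class Node:
    """Hypothetical stand-in for a monitored node."""
    name: str
    cpu_load: float
    # None means "no per-node override, use the global value".
    cpu_warning_override: Optional[float] = None
    cpu_critical_override: Optional[float] = None


def effective_thresholds(node: Node) -> Tuple[float, float]:
    """Return (warning, critical), preferring the per-node override."""
    warning = (node.cpu_warning_override
               if node.cpu_warning_override is not None
               else GLOBAL_CPU_WARNING)
    critical = (node.cpu_critical_override
                if node.cpu_critical_override is not None
                else GLOBAL_CPU_CRITICAL)
    return warning, critical


def cpu_status(node: Node) -> str:
    """Compare the current load against the node's effective thresholds."""
    warning, critical = effective_thresholds(node)
    if node.cpu_load >= critical:
        return "critical"
    if node.cpu_load >= warning:
        return "warning"
    return "ok"


# Same load, different outcome: app01 has custom thresholds, web01 does not.
app01 = Node("app01", cpu_load=55.0,
             cpu_warning_override=40.0, cpu_critical_override=50.0)
web01 = Node("web01", cpu_load=55.0)
```

    With these made-up numbers, cpu_status(app01) reports a critical state at 55% load, while cpu_status(web01) is still fine because it falls back to the global values. A static value in the alert condition would treat both nodes the same.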

     

    Response Time

    2016-10-13 09_19_21-Edit Alert - _ SMS_Slack Alert me of High Response Time (custom)_.png

     

    Interface Utilization: I added both receive and transmit values into the same alert. This was just my preference; by default there is one alert for receive and another for transmit.

    2016-10-13 09_20_31-Edit Alert - _ SMS_Slack Alert me of Interface issues (custom)_.png
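
    As a rough sketch of what combining the two directions means, the Python below evaluates receive and transmit utilization in one check. It assumes the two conditions are joined so that either one can trigger the alert, which may not match how your alert is set up; the function and parameter names are hypothetical.

```python
def interface_alert(rx_percent: float, tx_percent: float,
                    rx_threshold: float, tx_threshold: float) -> bool:
    """Trigger if either direction reaches its own threshold (assumed OR logic)."""
    return rx_percent >= rx_threshold or tx_percent >= tx_threshold


# Hypothetical readings: transmit is busy, receive is quiet.
triggered = interface_alert(rx_percent=12.0, tx_percent=93.0,
                            rx_threshold=90.0, tx_threshold=90.0)
```

    Here triggered is True because transmit alone crosses its threshold, so one alert covers what the two default alerts would otherwise report separately.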

     

    Packet Loss

    2016-10-13 09_21_25-Edit Alert - _ SMS_Slack Alert me of Packet Loss (custom)_.png

     

    CPU

    2016-10-13 09_22_46-Edit Alert - _ SMS_Slack alert me when CPU load has an issue (custom)_.png

     

    Memory

    2016-10-13 09_23_47-Edit Alert - _ SMS_Slack alert me when Memory load has an issue (custom)_.png

     

     

     

    I've put in a feature request for disk/volume alerting to also have warning/critical value reached variables. Those variables currently do not exist.

     

    If you've found this useful, please rate this article.