The time has come again for another exciting rundown of some of the improvements and enhancements coming your way in the next major installment of the Orion Platform. For those who may not be familiar, Orion is the foundational component upon which product modules such as Network Performance Monitor (NPM), Server & Application Monitor (SAM), and many others are built atop. Platform capabilities are available to, or can be leveraged by modules which run atop the Orion Platform. In most cases, those enhancements are available regardless of which Orion module(s) you are running, such as PerfStack. In others, it may be something which individual modules can extend to utilize for their own purposes, such as the Orion Agent which has been the basis for delivering amazing new capabilities from NetPath and QoE in NPM, to Application Dependency Mapping (ADM) and IaaS monitoring in SAM.
Several years ago I created a Universal Device Poller (UnDP) for monitoring APC SmartUPS devices, and still to this day it remains amongst one of the most popularly downloaded UnDPs for NPM, if not the most popular. Universal Device Pollers are an incredibly powerful feature of NPM, allowing you to monitor virtually anything about a device which is managed via SNMP. However, there comes a time when certain functionality becomes so ubiquitous that it makes sense to promote it to native functionality of the monitoring solution and not require users to create it themselves. So in this 2018.2 release of the Orion Platform included with NPM 12.3, that's precisely what we set out to do, while also making some improvements along the way.
If you haven't already done so, you'll want to start by adding your APC UPS equipment to Orion. You can do so individually using the 'Add Node Wizard' [Settings > All Settings > Add Node], or in bulk using Sonar Discovery [Settings > All Settings > Discover Network]. If you are adding the devices using the 'Add Node Wizard', you will notice a new option listed for your APC UPS equipment entitled 'UPS'. Checking the box next to this option will enable UPS polling for this device.
|List Resources||Power Control Unit Status Resource||UPS Firmware Version|
Once you've successfully completed the 'Add Node Wizard' and navigate to the 'Node Details' view of your newly added UPS device, you will notice a newly added resource entitled 'Power Control Unit Status'. This resource reflects the most important information about your UPS device, including things such as its overall status, time on battery, and the batteries current charge capacity. This information can, as you would expect, be utilized in Alerts to notify you things such as when the UPS is on Battery, if a battery needs replacing, or if the battery is reaching an unsafe operating temperature. You may also notice that the 'Software Version' field in the "Node Details' resource now accurately reflects the firmware version installed and running on the UPS.
Currently, this new capability is limited exclusively to APC (American Power Conversion) SmartUPS Uninterruptible Power Supplies (UPS) containing Network Management (AKA Web/SNMP) cards. This feature does not support APC's unmanaged BackUPS series, nor does it yet support other UPS vendors, such as Eaton, Tripp Lite, or CyberPower. At least for now, we recommend using the Universal Device Poller to monitor similar metrics for UPS vendors other than APC. We will, however, be keeping a close eye on the NPM feature request forum to gauge interest in native support for other UPS vendors.
Linux/Unix Load Average
In a similar vein to UPS monitoring discussed above, we learned from speaking with our customers over the years, as well as from those participating in the Orion Improvement Program, that monitoring Load Average on Linux and Unix systems ranks among the most popular uses of the Universal Device Poller. In our enduring pursuit to deliver unexpected simplicity to our customers, we realized that collecting these important metrics natively was something which was long overdue.
Beginning in Orion Platform 2018.2, and included with NPM 12.3, Load Average is collected automatically for any node which supports it. This is typically any Linux based operating system, but can also extend to FreeBSD, AIX, and other Unix like OS's. The Load Average metrics are collected for nodes monitored via the Orion Agent, as well as those managed agentlessly via SNMP. There's really no additional steps required if you added your nodes using the default selection. Since Load Average has a direct correlation to CPU utilization, it's intuitively tied to the existing 'CPU & Memory' option shown under 'List Resources'. When selected, Load Average statistics are collected automatically if the node being monitored supports them.
|List Resources - CPU & Memory||Load Average Resource|
On the 'Node Details' view of your Linux servers, you will notice a snazzy new resource entitled 'Load Average' which displays the one minute, five minute, and fifteen-minute load average of the machine being monitored. Because Load Average metrics are tightly coupled to the number of CPU cores, we extended Orion's alerting to allow you to combine Load Average statistics with CPU count within your Alert Trigger so you can be notified when your system is under strain.
Load Average has also been added to the default PerfStack metrics for the node, meaning if you click on the 'Performance Analysis' button on the "Management' Resource of the 'Node Details' view for Linux server, you'll be taken to PerfStack where these Load Average statistics are automatically prepopulated. Similarly. if you're already working in PerfStack you can drag the node itself onto the chart area, the Load Average statistics, as well as other default metrics for the node will populate the PerfStack dashboard.
Ever since bshopp introduced us to Orion Groups back in NPM 10.1, we've heard from many of you that the manner in which availability is calculated for these groups just didn't jive with how you think about availability in your environment, nor did it provide a valuable measurement for use in your SLAs. Sadly, Group Availability in Orion is calculated binarily. Put simply, the group is either 100% 'Up' or it's 100% 'Down' regardless of the number of members contained within the group. What this usually meant was, so long as at least one member in the group was 'Up', the availability of the group was 100%. That remained true even if there were 99 other things 'down' in that group at that time. I know, it sounds odd when you say it aloud or even when you're writing it down, but that's how it's been for years and somehow we've managed the muddle through. Well in this release of the Orion Platform, no longer will you be forced to just muddle through. Today we heed your cries!
Rather than turn the world on its end, causing lots of confusion and alerts storms in our wake, we left the legacy Group Availability metric in place, untouched. I know that will come as a big relief to those of you which have grown dependant upon this method of calculating availability and have built reports and alerts around this metric. What we chose to do instead is introduce a new Group metric entitled 'Group Members Availability', which as one would expect, properly and accurately calculates the availability of the group based on its members. This includes nested groups as well.
This new 'Group Members Availability' metric appears automatically on the 'Group Details' view upon group creation. We will also start calculating this new metric upon upgrade to Orion Platform 2018.2 if you already have existing groups. So there's really nothing you need to do. We even include a new out-of-the-box report we refer to as 'Members Based Group Availability Report - Last Month' which serves as an example for how easily this metric can be added to your own reports compared to some of thesome had attempted to use in the past. You can even leverage this new Group Members Availability metric in your alerting conditions with no fuss!
There's still plenty more we've managed to jam pack into this release of the Orion Platform that we're particularly excited about and would love to get your feedback on. Stay tuned to learn about some of the mapping improvements jblankjblank has whipped up and the many usability enhancements serena has crammed into this release, such as sexy new hovers, a new PerfStack widget, and additional improvements that we've made to ensure your next upgrade experience is great!