Orion and the modules that run atop the platform put a tremendous wealth of statistical information at your fingertips for spotting trends and hotspots. The data collected is also helpful for determining whether what you're seeing now is anomalous or normal, consistent behavior based on historical analysis. Unfortunately, one area where Orion hasn't been quite as strong is helping users troubleshoot active, ongoing issues. Should you find yourself in the throes of a major outage or performance issue, Orion does an outstanding job of ensuring you're alerted to the problem at hand. Where it falls short, however, is in providing tools that help you diagnose the root cause of the issue in real time.
As many of you are keenly aware, the default polling interval for statistics collection in most Orion product modules is typically somewhere between 5 and 10 minutes. While this interval is perfectly reasonable for trend analysis, alerting, and reporting, it's less than ideal when you're actively troubleshooting an ongoing issue. Ideally, you'd want to make a change, such as restarting a Windows service or Linux daemon, changing a CBQoS policy, or allocating additional resources to a virtual machine, and then immediately see the impact that change has on the issue you're trying to resolve. In these situations, it's simply untenable to wait 5-10 minutes for Orion's next polling cycle to find out whether your change worked. Doing so severely limits the number of fixes you can try and prolongs the outage while you wait on each polling cycle.
Sure, there are alternatives and workarounds that many people leverage in these situations. Some click the 'Poll Now' button feverishly to get updated values ahead of the normal 5-10 minute polling interval, but even this takes a minute or so before data is collected and visible within the Orion web interface. While better, this is still less than optimal for troubleshooting. Others turn to different tools for their firefighting needs, such as command line interfaces on switches, routers, and Linux, or Resource Monitor and Task Manager on Windows. These tools have their own drawbacks, though, such as requiring you to leave Orion, where you were initially alerted to the problem, and console into the device exhibiting the issue. If the problem spans multiple devices, as with distributed application architectures, clustered or load-balanced servers, HSRP, VRRP, etc., you'll be forced to juggle multiple console sessions with no ability to compare or correlate metrics between devices.
With the release of PerfStack in Orion Platform 2017.3, these woes are a thing of the past. No more juggling between different tools as your boss watches over your shoulder, breathing down your neck as you scramble to isolate the cause of your next critical performance issue. Our latest improvements to PerfStack introduce real-time polling, which, when activated, provides statistics collection at up to one-second granularity. Real-time polling can be used for a single entity, such as a node, or for multiple disparate entities simultaneously.
Start Real-Time Polling
To begin using PerfStack's new Real-Time Polling capabilities, start a new project and add a node by clicking 'Add Entities'. Expand the 'Node' category and click the node you just added to select it. This populates the metric palette with all available metrics for that entity. Within the metric palette, expand 'CPU/Memory' or 'Response Time' and you will notice a blue rocket ship icon adorning many of the listed metrics. This icon denotes that the metric is available for Real-Time Polling. Note that not all metrics for a given entity can be polled in real time. A full list of the entity's real-time pollable metrics can be found by expanding the 'Real-Time Polling' category in the metric palette.
Once you've identified which real-time metrics you'd like to visualize in your PerfStack project, drag and drop those metric tiles into the chart area just as you would any other metric. You can, of course, include both real-time and non-real-time metrics in the same project, but only those marked with the blue rocket ship icon will be updated in the chart at one-second intervals. Other metrics in the same project continue to update at their normal scheduled polling intervals.
Now that you've added some real-time metrics to your PerfStack project, simply click the 'Start Real-Time Polling' icon in the top action bar. This automatically changes the chart's timeframe to the last 10 minutes, making it easier to see variations in the charted values at high-frequency polling intervals. You may also notice the rocket ships blink while real-time polling is starting. This takes just a second or two, then the charts begin to move. To stop real-time polling, simply click the 'Stop Real-Time Polling' button in the top action bar.
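As a rough illustration of why a ten-minute window pairs well with real-time polling, the arithmetic below compares how many samples can land in that window at the best-case one-second interval versus the 5-10 minute default intervals mentioned earlier. The symbols $N$, $T_{\text{window}}$, and $\Delta t$ are just labels for this back-of-the-envelope calculation.

```latex
N = \frac{T_{\text{window}}}{\Delta t}, \qquad T_{\text{window}} = 10\ \text{min} = 600\ \text{s}

\Delta t = 1\ \text{s} \;\Rightarrow\; N = \frac{600}{1} = 600
\qquad
\Delta t = 300\text{ to }600\ \text{s} \;\Rightarrow\; N = \frac{600}{300}\text{ to }\frac{600}{600} = 2\text{ to }1
```

In other words, default polling contributes only one or two points to a ten-minute chart, while real-time polling can fill the same window with up to 600.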
Real-Time Polling Limits
While real-time polling is active, you can continue to add real-time metrics to your project or remove them. These can come from the same entity or from entirely different entities. Real-time polling continues for the metrics already on the chart, and any newly added metrics begin updating in real time. There is a limit of ten unique real-time metrics that can be polled per project. Should you exceed this limit, a toast message appears in the top right of the window when you attempt to add an eleventh real-time metric to a chart where real-time polling is enabled. The same message appears if your project contains more than ten real-time pollable metrics and you attempt to enable Real-Time Polling. To resume real-time polling, reduce the number of real-time pollable metrics in your PerfStack project to ten or fewer.
[Screenshots: Session Limit notification; Global Limit notification]
In addition to the per-session limit of ten real-time metrics, there is also a notification if you exceed a global limit of thirty unique metrics across all web interface sessions on the Orion server. Real-time polling uses a cache shared across all sessions, so if you and three of your colleagues are viewing the same ten metrics in real time within PerfStack, this counts as only ten real-time metrics, not forty. This is because PerfStack polls the device in real time only once, rather than once per user session. This helps reduce overhead on the Orion server, as well as load on the monitored device.
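To make the two limits and the shared cache a little more concrete, here is a minimal Python sketch that models them. It is not SolarWinds' implementation. The `RealTimeCache` class, the `(entity, metric)` key, the status strings, the assumption that `subscribe` receives a project's full set of real-time metrics, and the choice to reject (rather than merely warn about) a request that would exceed the global limit are all hypothetical. Only the limits of ten and thirty come from this post.

```python
from typing import Dict, Iterable, Set, Tuple

# Limits quoted in this post; everything else in this sketch is hypothetical.
SESSION_LIMIT = 10  # unique real-time metrics per project/session
GLOBAL_LIMIT = 30   # unique real-time metrics across all sessions on the server

# Hypothetical metric identifier: (entity id, metric name).
MetricKey = Tuple[str, str]


class RealTimeCache:
    """Toy model of a real-time cache shared by every web session.

    Each unique metric is polled once no matter how many sessions are
    watching it, so only unique metrics count toward GLOBAL_LIMIT.
    """

    def __init__(self) -> None:
        # metric -> ids of the sessions currently watching it
        self._watchers: Dict[MetricKey, Set[str]] = {}

    def unique_metrics(self) -> int:
        return len(self._watchers)

    def subscribe(self, session_id: str, metrics: Iterable[MetricKey]) -> str:
        wanted = set(metrics)
        if len(wanted) > SESSION_LIMIT:
            return "session limit exceeded"
        # Only metrics that no session is polling yet add to the global count.
        new = {m for m in wanted if m not in self._watchers}
        if self.unique_metrics() + len(new) > GLOBAL_LIMIT:
            return "global limit exceeded"
        for m in wanted:
            self._watchers.setdefault(m, set()).add(session_id)
        return "ok"

    def unsubscribe(self, session_id: str) -> None:
        # Remove the session; stop polling metrics nobody else is watching.
        for m in list(self._watchers):
            self._watchers[m].discard(session_id)
            if not self._watchers[m]:
                del self._watchers[m]


if __name__ == "__main__":
    cache = RealTimeCache()
    ten = [("node-1", f"metric-{i}") for i in range(10)]
    # You and three colleagues watching the same ten metrics...
    for user in ("you", "colleague-1", "colleague-2", "colleague-3"):
        cache.subscribe(user, ten)
    print(cache.unique_metrics())  # 10, not 40
```

Keying the cache on the metric rather than on the session is what lets four viewers share ten polls, which is the behavior the post describes for the global limit.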
In keeping with our enduring commitment to remain an agentless-first monitoring solution, Real-Time Polling in this release is available only for nodes managed via ICMP, SNMP, or WMI. Nodes managed via the Orion Agent cannot yet use Real-Time Polling. Should you select an agent-managed entity within PerfStack, you will notice the absence of any blue rocket ship icons on that entity's metric tiles, indicating that Real-Time Polling is not available for it.
Metrics and Entities Supported
As stated above, Real-Time Polling is not yet available for all metrics and entity types. For this release of PerfStack, we focused on what we believe are the most vital real-time metrics users need at hand during a firefight. This includes 34 metrics spanning three entity types (nodes, interfaces, and volumes), allowing you to troubleshoot the most common network, storage, and device-related performance issues in real time from a single, centralized, web-based interface. If you'd like to see additional real-time metrics supported in future releases, we'd love to know which ones you would find most valuable and how you plan to use them.
| Nodes | Interfaces | Volumes |
|---|---|---|
| Average CPU Load | Availability | Average Disk Queue Length |
| Average Memory Used | Received Discards | Average Disk Reads |
| Average Percent Memory Used | Received Errors | Average Disk Transfer |
| Peak CPU Load | Transmit Discards | Average Disk Writes |
| Peak Memory Used | Transmit Errors | Maximum Disk Queue Length |
| Minimum CPU Used | Average Receive bps | Maximum Disk Reads |
| Minimum Memory Used | Minimum Receive bps | Maximum Disk Transfer |
| Average Response Time | Peak Receive bps | Minimum Disk Queue Length |
| Maximum Response Time | Receive Percent Utilization | Minimum Disk Reads |
| Minimum Response Time | Average Transmit bps | Minimum Disk Transfer |
| | Minimum Transmit bps | Minimum Disk Writes |
| | Peak Transmit bps | |
| | Transmit Percent Utilization | |
We at SolarWinds understand that not all Orion administrators may want every user to have access to such an amazing feature. After all, users may become so mesmerized by the screen that they don't get any actual work done. With that in mind, you will find a new user- or group-level permission that controls whether the 'Real-Time Polling' button appears within PerfStack for those users. This new setting can be found under [Settings -> All Settings -> Manage Accounts]. From there, select a group or individual user account and click 'Edit'. Expand 'Performance Analysis Settings' at the bottom of the page and change this setting from 'Allow' to 'Disallow' for any user or group. This disables Real-Time Polling for those users. By default, all users have permission to launch Real-Time Polling within PerfStack.
Real-Time Polling is only one of the latest improvements we've made to PerfStack in the Orion Platform 2017.3 release. If you're interested in what other goodies we've stuffed under the hood, hop on over to my earlier post, Orion Platform 2017.3 - PerfStack New Features & Improvements, for the full rundown.