PerfStack - Real-Time Polling - Because Time is The Only Constant

Orion and the modules which run atop the platform put a tremendous wealth of statistical information at your fingertips for spotting trends and hotspots. The data collected is also helpful for determining whether what you're seeing now is anomalous, or normal, consistent behavior based upon historical analysis. Unfortunately, one area where Orion hasn't been quite as strong is helping users troubleshoot active, ongoing issues. Should you find yourself in the throes of a major outage or performance issue, Orion does an outstanding job of ensuring you're alerted to the problem at hand. Where it falls short, however, is in providing tools which help you diagnose the root cause of the issue in real-time.

As many of you are keenly aware, the default polling interval for statistic collection is typically somewhere between 5 and 10 minutes for most Orion product modules. While this interval is perfectly reasonable for trend analysis, alerting, and reporting, it's less than ideal when you're actively troubleshooting an ongoing issue. Ideally, you'd want the ability to make a change, like restarting a Windows service or Linux daemon, changing a CBQoS policy, or allocating additional resources to a virtual machine, and then immediately see the impact that change has on the issue you're trying to resolve. In these situations, it's simply untenable to wait 5-10 minutes for Orion's next polling cycle to determine whether your changes resolved the issue. Doing so significantly limits the number of things you can try, and extends the duration of the outage while you wait for the next scheduled poll.

Sure, there are alternatives and workarounds which many people leverage in these situations. Some choose to click the 'Poll Now' button feverishly to get updated values ahead of the normal 5-10 minute polling interval, but even this takes a minute or so before data is collected and visible within the Orion web interface. While better, this is still less than optimal for troubleshooting purposes. Others instead use different tools for their firefighting needs, like command line interfaces on switches, routers, and Linux, or Resource Monitor and Task Manager on Windows. These tools, though, have their own drawbacks, such as requiring you to leave Orion, where you were initially alerted to the problem, and console into the device exhibiting the issue. If the problem potentially spans multiple devices, as is the case with distributed application architectures, clustered or load balanced servers, HSRP, VRRP, etc., then you'll be forced to juggle multiple console sessions with no ability to compare or correlate metrics between devices.

Enter PerfStack

With the release of PerfStack included in Orion Platform 2017.3, these woes are a thing of the past. No more juggling between different tools as your boss watches over your shoulder, breathing down your neck as you scramble to isolate the cause of your next critical performance issue. With our new improvements to PerfStack, we introduce you to real-time polling, which collects statistics at intervals as short as one second when activated. This can be for a single entity like a node, or for multiple disparate entities simultaneously.

PerfStack-Real-Time-Polling.gif

Start Real-Time Polling

To begin using PerfStack's new Real-Time Polling capabilities, start a new project and add a node by clicking 'Add Entities'. Expand the 'Node' category and click on the node you just added in the previous step to select it. This will populate the metric palette with the list of all available metrics for that entity. Within the metric palette, expand 'CPU/Memory' or 'Response Time' and you will notice a blue rocket ship icon which adorns many of the available metrics listed. This icon denotes that the metric is available for Real-Time Polling. Note that not all metrics for a given entity are pollable in real-time. A full listing of all real-time pollable metrics can be found by expanding the 'Real-Time Polling' category in the metric palette of the selected entity.

Rocket Ship.png
Real-Time Polling Category.png

Once you've identified which real-time metrics you'd like to visualize within your PerfStack project, drag and drop those metric tiles into the chart area the same as you would any other metric. You can of course include both real-time and non-real-time metrics within the same project, but only those denoted with the blue rocket ship icon will be updated within the chart at one second intervals. Other metrics included within the same project will continue to update themselves based upon their normal scheduled polling intervals.

Now that you've added some real-time metrics to your PerfStack project, simply click the 'Start Real-Time Polling' icon in the top action bar. This will automatically change the timeframe of the chart to the last 10 minutes, which makes it easier to visualize variations in the charted values at high frequency polling intervals. You may also notice the rocket ships blink when real-time polling is starting. This process takes just a second or two, then the charts begin to move. To stop real-time polling, simply click the 'Stop Real-Time Polling' button in the top action bar.
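To put the numbers above in perspective, a 10 minute chart window sampled once per second holds at most 600 points per metric. The snippet below is only a rough sketch of that arithmetic using a fixed-size rolling buffer. The `RollingSeries` class and its names are hypothetical and are not part of PerfStack or any Orion API.

```python
from collections import deque

WINDOW_SECONDS = 10 * 60    # chart timeframe while real-time polling is active
POLL_INTERVAL_SECONDS = 1   # one-second collection granularity


class RollingSeries:
    """Hypothetical helper: keeps only the samples that fit in the chart window."""

    def __init__(self, window=WINDOW_SECONDS, interval=POLL_INTERVAL_SECONDS):
        # A deque with maxlen silently drops the oldest sample when full.
        self.samples = deque(maxlen=window // interval)

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))


# Simulate 15 minutes of one-second samples for a single metric.
series = RollingSeries()
for t in range(900):
    series.add(t, t % 100)

print(len(series.samples))    # 600 points remain in the window
print(series.samples[0][0])   # oldest retained timestamp is 300
```

Anything older than the last 10 minutes simply falls off the left edge of the buffer, much as it would scroll off the chart.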

Start-Real-Time-Polling.gif

Real-Time Polling Limits

While real-time polling is active, you can continue to add real-time metrics to your project or remove them from it. These can be from the same entity, or from entirely different entities. Real-time polling will continue for the metrics already on the chart, and any newly added metrics will begin to update in real-time. There is a limit of ten unique real-time metrics which can be polled per project. Should you exceed this limit, you will notice a toast message appear in the top right of the window when attempting to add the eleventh metric to a chart where real-time polling is enabled. This same message will appear if your project contains more than ten real-time pollable metrics and you attempt to enable Real-Time Polling. To resume real-time polling, reduce the number of metrics which can be polled in real-time within your PerfStack project to ten or fewer.

Session Limit                    | Global Limit
PerfStack RealTime Exceeded.png  | PerfStack Real-Time Global Limit.png

In addition to the per-session limit of 10 real-time metrics, there is also a notification if you exceed a global limit of 30 unique metrics across all web interface sessions on the Orion server. Real-time polling uses a shared cache across all sessions, so if you and three of your colleagues are viewing the same ten metrics in real-time within PerfStack, this only counts as 10 real-time metrics, not 40. This is because PerfStack polls the device in real-time only once, and not once for each user session. This helps reduce overhead on the Orion server, as well as any strain on the monitored device.
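The counting rule described above can be illustrated with a few lines of Python. This is a simplified model and not how Orion implements its shared cache: the function names, session names, and the `(entity, metric)` tuples are all made up for the example. Only the limits of 10 and 30 come from this post.

```python
PER_SESSION_LIMIT = 10   # unique real-time metrics per project/session
GLOBAL_LIMIT = 30        # unique real-time metrics across all sessions


def unique_global_metrics(sessions):
    """Union of (entity, metric) pairs; duplicates across sessions count once."""
    unique = set()
    for metrics in sessions.values():
        unique |= metrics
    return unique


def check_limits(sessions):
    """Return (sessions over the per-session limit, global count, global exceeded)."""
    over_session = sorted(
        name for name, metrics in sessions.items()
        if len(metrics) > PER_SESSION_LIMIT
    )
    total = len(unique_global_metrics(sessions))
    return over_session, total, total > GLOBAL_LIMIT


users = ("you", "colleague-1", "colleague-2", "colleague-3")

# Four users watching the SAME ten metrics: counted once, so 10, not 40.
shared = {("node-1", f"metric-{i}") for i in range(10)}
same_view = {user: set(shared) for user in users}
print(check_limits(same_view))      # ([], 10, False)

# Four users each watching ten DIFFERENT metrics: 40 unique, over the global limit.
distinct_view = {
    user: {(f"node-{n}", f"metric-{i}") for i in range(10)}
    for n, user in enumerate(users)
}
print(check_limits(distinct_view))  # ([], 40, True)
```

The second case shows why the global notification can appear even though no individual session has more than ten metrics on its chart.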

Polling Methods

In our ever enduring commitment to remain an agentless-first monitoring solution, Real-Time Polling in this release is available only for nodes managed via ICMP, SNMP, or WMI. Nodes which are managed via the Orion Agent cannot as yet utilize Real-Time Polling. Should you select an entity within PerfStack that is managed via the Agent, you will notice the absence of any blue rocket ship icons in that entity's metric tiles, denoting that Real-Time Polling is not available for that entity.
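If you keep your own inventory of nodes and their polling methods, the rule above reduces to a simple membership check. The sketch below assumes a hand-built dictionary of made-up node names. It does not query Orion, and the method labels are just illustrative strings.

```python
# Polling methods that support Real-Time Polling in this release, per the post.
REAL_TIME_METHODS = {"ICMP", "SNMP", "WMI"}


def supports_real_time(polling_method):
    """True if a node polled via this method can use Real-Time Polling."""
    return polling_method.upper() in REAL_TIME_METHODS


# Hypothetical inventory: node name -> polling method.
nodes = {
    "core-switch": "SNMP",
    "win-app01": "WMI",
    "ping-only": "ICMP",
    "linux-db01": "Agent",
}

eligible = sorted(name for name, method in nodes.items() if supports_real_time(method))
print(eligible)  # ['core-switch', 'ping-only', 'win-app01']
```

Note that eligibility at the node level does not mean every metric is pollable in real-time, only that the rocket ship icon can appear for that node's supported metrics.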

Metrics and Entities Supported

As stated above, Real-Time Polling is not yet available for all metrics and entity types. For this release of PerfStack, we focused on what we believe to be the most vital real-time metrics users would need at hand during a firefight. This includes 34 metrics spanning three different entity types: nodes, interfaces, and volumes. Together, these allow you to troubleshoot the most common network, storage, and device related performance issues in real-time from a single, centralized, web based interface. If you'd like to see additional real-time metrics supported in future releases, we'd love to know which ones you would find most valuable, and how you would plan to use them.

Nodes                       | Interfaces                   | Volumes
Average CPU Load            | Availability                 | Average Disk Queue Length
Average Memory Used         | Received Discards            | Average Disk Reads
Average Percent Memory Used | Received Errors              | Average Disk Transfer
Peak CPU Load               | Transmit Discards            | Average Disk Writes
Peak Memory Used            | Transmit Errors              | Maximum Disk Queue Length
Minimum CPU Used            | Average Receive bps          | Maximum Disk Reads
Minimum Memory Used         | Minimum Receive bps          | Maximum Disk Transfer
Average Response Time       | Peak Receive bps             | Minimum Disk Queue Length
Maximum Response Time       | Receive Percent Utilization  | Minimum Disk Reads
Minimum Response Time       | Average Transmit bps         | Minimum Disk Transfer
                            | Minimum Transmit bps         | Minimum Disk Writes
                            | Peak Transmit bps            |
                            | Transmit Percent Utilization |

User Restrictions

We at SolarWinds understand that not all Orion administrators may want every user to have access to such an amazing feature. After all, they may be completely mesmerized by the screen and not get any actual work done as a result. With that in mind, you will find a new user or group level permission which controls whether the 'Real-Time Polling' button appears within PerfStack for those users. This new setting can be found under [Settings -> All Settings -> Manage Accounts]. From there, select a group or individual user account and click 'Edit'. Expand 'Performance Analysis Settings' at the bottom of the page and change this setting from 'Allow' to 'Disallow' for any user or group. This will disable Real-Time Polling for those users. By default, all users have permission to launch Real-Time Polling within PerfStack.

pastedImage_0.png

Real-Time Polling is only one of the latest improvements we've made to PerfStack in the Orion Platform 2017.3 release. If you're interested in what other goodies we've stuffed under the hood, hop on over to my earlier post, entitled Orion Platform 2017.3 - PerfStack New Features & Improvements for the full rundown.

Parents
  • Thanks for the answers, appreciate it.

    If you could share how it can be altered that would be great or do I need to raise a support case, I only want to up it to 12 and will drop it back down to 10 if we see any resource issues.  Is it just a config file change?

    Thanks,

    worto.
