IT admins are constantly challenged to do more with less in their production environments. One example is optimizing the performance of an n-tier virtualized application stack while minimizing cost. IT admins need to deliver the best possible quality of service (QoS) and return on investment (ROI) with limited time and resources.

 

Plan on Optimal Performance: A 6-Step Framework

The first step is a plan that produces consistent, repeatable results while remaining cost-efficient at scale. One approach is to:

    1. Establish a baseline measurement for the application’s performance with accompanying performance data.
    2. Monitor and log any changes and key performance counters over time.
    3. Define data-driven criteria for each state: good, bad, worse, and critical.
    4. Create alerts for the bad, worse, and critical states.
    5. Integrate a feedback loop with fixes for the degraded states.
    6. Repeat steps 1-5.

This 6-step framework provides a disciplined methodology for troubleshooting performance bottlenecks like disk contention and noisy neighbors.
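As a rough illustration of steps 1, 3, and 4, the sketch below builds a baseline from known-good latency samples, classifies new readings, and collects alerts for anything that is not good. The function names, the use of standard deviations as the criterion, and the three cutoffs are assumptions made for this example; real criteria should come from your own performance data.

```python
from statistics import mean, stdev

# Hypothetical cutoffs: standard deviations above the baseline mean.
BAD, WORSE, CRITICAL = 1.0, 2.0, 3.0


def build_baseline(samples):
    """Step 1: summarize known-good measurements (e.g. disk latency in ms)."""
    return {"mean": mean(samples), "stdev": stdev(samples)}


def classify(value, baseline):
    """Step 3: map a measurement to good, bad, worse, or critical."""
    # Guard against a perfectly flat baseline to avoid dividing by zero.
    spread = baseline["stdev"] or 1e-9
    deviations = (value - baseline["mean"]) / spread
    if deviations >= CRITICAL:
        return "critical"
    if deviations >= WORSE:
        return "worse"
    if deviations >= BAD:
        return "bad"
    return "good"


def collect_alerts(readings, baseline):
    """Step 4: return (source, state) pairs for every non-good reading."""
    alerts = []
    for source, value in readings.items():
        state = classify(value, baseline)
        if state != "good":
            alerts.append((source, state))
    return alerts


baseline = build_baseline([10, 12, 11, 9, 13])
alerts = collect_alerts({"vm-a": 11.0, "vm-b": 14.5, "vm-c": 20.0}, baseline)
```

Steps 2, 5, and 6 amount to running this on a schedule, logging the results, and rebuilding the baseline once a fix has been applied.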


Back to Good From Contention and Noisy Neighbors

Virtualization performance issues usually involve the storage infrastructure and announce their presence in the form of degraded application performance. Storage performance issues can stem from disk contention and noisy-neighbor virtual machines (VMs). Disk contention occurs when multiple VMs try to access the same disk simultaneously, leading to high response times and potential application timeouts. Noisy-neighbor VMs, on the other hand, periodically monopolize storage IO resources to the detriment of the other VMs on the shared storage.

 

Using the framework, any application slowness or abnormality should generate an alert based on triggers such as high disk latency (average seconds per read or write), high application response times (milliseconds), low disk throughput (bytes per second), and high CPU and memory utilization. Next, examine the environment from a current top-down view of its resources and applications so that the degraded state can be compared to the known good state. Then drill down into the specific application or storage subsystem.
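One way to compare the degraded state to the known good state is to rank each metric by how far it has moved from its baseline, then drill down starting from the top of the list. The sketch below is illustrative only: the metric names and sample values are made up for the example, and it assumes every baseline value is non-zero.

```python
# Metrics where a drop, not a rise, signals trouble (hypothetical name).
LOWER_IS_WORSE = {"disk_throughput_bps"}


def rank_deviations(baseline, current):
    """Sort metrics by relative degradation versus the known good state."""
    scores = {}
    for metric, good in baseline.items():
        change = (current[metric] - good) / good  # assumes good != 0
        if metric in LOWER_IS_WORSE:
            change = -change  # a throughput drop counts as degradation
        scores[metric] = change
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


known_good = {
    "disk_latency_ms": 5.0,
    "app_response_ms": 100.0,
    "disk_throughput_bps": 200e6,
    "cpu_pct": 40.0,
}
degraded = {
    "disk_latency_ms": 25.0,
    "app_response_ms": 150.0,
    "disk_throughput_bps": 100e6,
    "cpu_pct": 44.0,
}
ranked = rank_deviations(known_good, degraded)
```

With these sample numbers, disk latency ranks first, which points the drill-down at the storage subsystem rather than at CPU.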

 

If the bottleneck is disk contention, the disk will show both high IOPS and high response times. If the bottleneck is noisy-neighbor VMs, those VMs will have high IO metrics (IOPS, bandwidth) while the other VMs on the shared storage are starved and show low IO metrics. Once the issue is identified, countermeasures and preventative measures can be taken.
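The distinction above can be expressed as a simple heuristic. The following sketch is an assumption-laden example, not a vendor algorithm: the `VmStats` type, the 20 ms latency limit, and the 50% IOPS share used to flag a noisy neighbor are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VmStats:
    name: str
    iops: float
    latency_ms: float


def diagnose(vms, latency_limit_ms=20.0, hog_share=0.5):
    """Return (diagnosis, VM names) for VMs sharing one datastore."""
    total_iops = sum(vm.iops for vm in vms)
    slow = [vm.name for vm in vms if vm.latency_ms > latency_limit_ms]
    if total_iops == 0 or not slow:
        return ("healthy", [])
    # A noisy neighbor takes a disproportionate share of the IOPS
    # while VMs on the datastore see high response times.
    hogs = [vm.name for vm in vms if vm.iops / total_iops >= hog_share]
    if hogs and len(vms) > 1:
        return ("noisy-neighbor", hogs)
    # Otherwise the load is spread out and the disk itself is saturated.
    return ("disk-contention", slow)


noisy = diagnose([
    VmStats("vm-a", 900, 35.0),
    VmStats("vm-b", 50, 40.0),
    VmStats("vm-c", 50, 38.0),
])
contended = diagnose([
    VmStats("vm-a", 300, 30.0),
    VmStats("vm-b", 300, 31.0),
    VmStats("vm-c", 300, 29.0),
])
```

In the first call one VM accounts for 90% of the IOPS, so it is reported as the noisy neighbor; in the second the load is evenly spread, so the result is disk contention.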


Three Tips for Contention and Noisy Neighbors

Tip #1: As a general rule of thumb, a RAID 5 group can sustain about 150 IOPS per spindle and a RAID 10 group about 200 IOPS per spindle. Distribute the VMs’ IOPS across the RAID groups according to these rules to avoid disk contention.
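To make the arithmetic concrete, the sketch below turns the rule of thumb into a per-group IOPS budget and spreads VMs across groups, placing the largest VM first into the group with the most headroom. The group names, spindle counts, and VM demands are invented for the example, and the greedy placement is just one simple strategy.

```python
# Rule-of-thumb figures from Tip #1.
IOPS_PER_SPINDLE = {"RAID5": 150, "RAID10": 200}


def group_capacity(raid_level, spindles):
    """Estimated sustainable IOPS for one RAID group."""
    return IOPS_PER_SPINDLE[raid_level] * spindles


def place_vms(vm_iops, groups):
    """Greedy placement: largest VM first, into the group with most headroom."""
    remaining = {
        name: group_capacity(level, spindles)
        for name, (level, spindles) in groups.items()
    }
    placement = {}
    for vm, iops in sorted(vm_iops.items(), key=lambda kv: kv[1], reverse=True):
        target = max(remaining, key=remaining.get)
        if remaining[target] < iops:
            raise ValueError(f"no RAID group has {iops} IOPS free for {vm}")
        remaining[target] -= iops
        placement[vm] = target
    return placement, remaining


# Hypothetical environment: 4-spindle RAID 5 and 4-spindle RAID 10 groups.
groups = {"rg1": ("RAID5", 4), "rg2": ("RAID10", 4)}
placement, remaining = place_vms({"db": 500, "web": 300, "app": 200}, groups)
```

Here the RAID 5 group has a budget of 4 x 150 = 600 IOPS and the RAID 10 group 4 x 200 = 800 IOPS, so the combined 1,000 IOPS of demand fits with 400 IOPS of headroom left over.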

Tip #2: If the disk contention is occurring on a VMFS datastore, an IT admin can adjust the disk shares for all VMs accessing the datastore from the same ESXi host, move some of the VMs from the VMFS datastore to another datastore on a different LUN, or both, to resolve the contention.

Tip #3: To address noisy-neighbor VMs, IOPS or bandwidth restrictions can be applied to the VMs via features like VMware’s Storage DRS (SDRS) and Storage I/O Control (SIOC) or Storage Quality of Service for Hyper-V.


Closing

A plan in hand, coupled with tools like SolarWinds Server & Application Monitor and Virtualization Manager, gives an IT admin a complete solution to manage, optimize, and troubleshoot virtualized application performance throughout its entire lifecycle. Let me know what you think in the Comments section. Plus, join the Community conversations on thwack.