Latency is the principal enemy of an administrator. If your virtual infrastructure is running smoothly and latency is at an acceptable level, then everything is fine, but if latency on the storage side or on the network goes through the ceiling, then you're in trouble. You might also be running applications that are very sensitive to latency.


Concerning network latency, there are situations where you can tweak a few things not only on the virtual infrastructure, but also on the physical infrastructure. When packets travel from a VM living on one ESXi host to a VM living on another ESXi host, at some point they cross a physical switch, so a few improvements can be achieved on the physical switch as well.


How do you "fight" latency, and which fine-tuning options can the admin implement for very latency-sensitive VMs?


  • BIOS settings - new servers with newer CPUs from Intel or AMD are often set to power saving by default, but high performance is the only way to go! C-states save energy, but they also increase latency, so if you're running a latency-sensitive application and chasing every single microsecond of latency in your environment, you need to disable C-states. Also make sure that the power policy in vSphere (ESXi) is set to high performance. Check your hardware manufacturer's documentation for the best possible performance settings too.


  • NUMA - the vCPUs of the VM should be scheduled on specific NUMA nodes (processor affinity), and all VM memory should be allocated from those same NUMA nodes (memory affinity). For example, in vSphere you can set this through Manage > Settings > Edit > Advanced > Edit configuration, where you add a new option "numa.nodeAffinity" and use a comma-separated list of nodes as its value. The exact steps depend on your virtualization platform and, for vSphere, on your version, so check the documentation for "numa.nodeAffinity".


  • Physical and virtual NIC settings - it's possible to disable interrupt moderation (also called interrupt throttling) on physical NICs, which helps to achieve lower latency for very latency-sensitive applications. The feature was introduced, however, to prevent the host from being overwhelmed by CPU cycles spent only on handling interrupts. So keep in mind that even if disabling interrupt moderation on physical NICs is beneficial for very latency-sensitive VMs, it brings some CPU overhead on the host and can affect the performance of other VMs running on that host. Virtual NIC tweaks - here it is important to choose the right vNIC type, like VMXNET3, which is now the default type for most guest OSes, but you should check it, especially for applications and VMs where you really need the best possible performance. It's also possible to disable virtual interrupt coalescing on the vNIC (by adding "ethernetX.coalescingScheme" with the value "disabled"). A host-based setting is also possible, however it will affect ALL virtual machines running on that particular host.


  • Virtual disk SCSI controller choice - depending on the OS you're using on particular VM(s), it can be useful to change from what's usually the default controller (LSI Logic SAS) to VMware Paravirtual (PVSCSI), which leads to lower CPU utilization and higher throughput (roughly 7-10%). To deliver the same number of IOPS it uses fewer CPU cycles (10% lower CPU utilization). Now is perhaps the time to use it, as the problems seen back in vSphere 4.1 were solved a long time ago! It's the most efficient of the available controller drivers. Note that only 4 PVSCSI controllers per VM are currently supported.


  • Guest OS optimizations - I've already mentioned the importance of vCPU and memory sizing, but other tweaks are possible too, both at the application level and at the guest OS level. Depending on the workloads you're running, you can find specific optimization guides for VDI deployments, where you need to tweak the master image, deactivate some services and so on. For server operating systems, optimization is usually done VM by VM. You can already start with the virtual hardware, where you can delete unnecessary floppy drives, COM ports or USB ports. Another important application which often consumes a lot of in-guest resources is Java.
  • Java can be configured to use large memory pages. This has to be done on the Java side by adding a command line option when launching Java (-XX:+UseLargePages) - see info here. On the guest OS side, the corresponding tweak is often called "Lock pages in memory" and can be applied to a VM or a group of VMs via GPO (see here). The corresponding VMware ESXi settings are beneficial but have some drawbacks too, considering that other optimization techniques are involved as well.
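
To make the per-VM options above a bit more concrete, here is a minimal Python sketch that only builds the key/value pairs mentioned in this post ("numa.nodeAffinity", "ethernetX.coalescingScheme") and the Java launch command with -XX:+UseLargePages. The function names, the example node numbers, the `key = "value"` rendering and the jar name "app.jar" are my own assumptions for illustration; the script does not talk to vSphere at all, so you would still add the options through the client as described above and check the documentation for your version.

```python
def numa_node_affinity(nodes):
    """Return the advanced option that ties a VM to the given NUMA nodes."""
    if not nodes:
        raise ValueError("at least one NUMA node is required")
    # The value is a comma-separated list of node numbers, e.g. "0,1".
    return ("numa.nodeAffinity", ",".join(str(n) for n in nodes))


def disable_vnic_coalescing(nic_index):
    """Return the option that disables virtual interrupt coalescing on one vNIC."""
    # "ethernetX" becomes ethernet0, ethernet1, ... depending on the vNIC.
    return (f"ethernet{nic_index}.coalescingScheme", "disabled")


def render_options(options):
    """Render (key, value) pairs as one 'key = "value"' line each (assumed layout)."""
    return "\n".join(f'{key} = "{value}"' for key, value in options)


def java_command(jar_path):
    """Build a Java launch command that requests large memory pages."""
    # Note: there is no space between "-XX:" and "+UseLargePages".
    return ["java", "-XX:+UseLargePages", "-jar", jar_path]


if __name__ == "__main__":
    # Hypothetical VM: tied to NUMA nodes 0 and 1, coalescing disabled on the first vNIC.
    options = [numa_node_affinity([0, 1]), disable_vnic_coalescing(0)]
    print(render_options(options))
    print(" ".join(java_command("app.jar")))
```

Keeping the settings as plain data like this makes it easy to review them per VM before applying anything, which matters because tweaks such as disabling coalescing trade lower latency for extra CPU overhead on the host.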


Wrap up:


In a few articles I have tried to help other admins and users of virtualization technologies get the most out of their virtual infrastructures. The posts can be found here: