
Principal enemy of virtualized environments: latency. How do you "fight" it?

Level 9

Latency is the principal enemy of an administrator. If your virtual infrastructure is running smoothly and latency is at an acceptable level, everything is fine; but if latency on the storage side or on the network goes through the ceiling, you're in trouble. You might also be running applications that are very sensitive to latency.

Concerning network latency, you can sometimes tweak a few things not only in the virtual infrastructure but also in the physical infrastructure. When packets travel from a VM on one ESXi host to a VM on another ESXi host, at some point they cross a physical switch, and a few improvements can be achieved on that switch.

How do you "fight" latency, and which fine-tuning options can the admin implement for very latency-sensitive VMs?

  • BIOS settings - new servers with newer Intel or AMD CPUs are often set to a power-saving profile, but a high-performance profile is the only way to go! C-states save energy but increase latency, so if you're running latency-sensitive applications and chasing every single microsecond of latency in your environment, you need to disable C-states. Also make sure that the vSphere (ESXi) host power management policy is set to High Performance. Check your hardware manufacturer's documentation for its recommended best-performance settings too.

  • NUMA - vCPU processor affinity should be set so the VM is scheduled on specific NUMA nodes, and memory affinity should be set so that all VM memory is allocated from those same nodes. For example, in vSphere you can enable this through Manage > Settings > Edit > Advanced > Edit Configuration, where you add a new option "numa.nodeAffinity" whose value is a comma-separated list of nodes. The exact steps depend on your virtualization platform and, for vSphere, on your version, so check the documentation for "numa.nodeAffinity".

  • Physical and virtual NIC settings - It's possible to disable interrupt moderation (also called interrupt throttling) on physical NICs, which helps achieve lower latency for very sensitive, low-latency applications. The feature exists to prevent the host from being overwhelmed by CPU cycles spent only on handling interrupts. So keep in mind that even though disabling interrupt moderation on physical NICs benefits very latency-sensitive VMs, it adds CPU overhead on the host and can affect the performance of other VMs running on it. Virtual NIC tweaks: here it is important to choose the right vNIC type. VMXNET3 is now the default for most guest OSes, but you should verify it, especially for the applications and VMs where you really need the best possible performance. It's also possible to disable virtual interrupt coalescing on the vNIC by adding "ethernetX.coalescingScheme" with the value "disabled". A host-wide setting is also possible, but it will affect ALL virtual machines running on that host.

  • Virtual disk SCSI controller choice - depending on the OS running in the particular VM(s), it can be useful to change from the usual default (LSI Logic SAS) to VMware Paravirtual (PVSCSI), which gives lower CPU utilization and higher throughput (roughly 7-10%). To deliver the same number of IOPS it uses fewer CPU cycles (about 10% lower CPU utilization). Now is perhaps the time to use it, as the problems from the vSphere 4.1 days were solved a long time ago! It is the most efficient of the available controller drivers. Note that only 4 PVSCSI controllers per VM are currently supported.

  • Guest OS optimizations - what to say about guest OS optimization? I've already mentioned the importance of vCPU and memory sizing, but there are other possible tweaks at the application and guest OS level. Depending on the workloads you're running, you can find specific optimization guides: for VDI deployments you need to tweak the master image, deactivate some services, etc., while for server OSes optimization is usually done VM by VM. Basically, you can start with the virtual hardware, where you can remove unnecessary floppy drives, COM ports, or USB ports. Another important application that often consumes huge in-guest resources is Java.
  • There is also a configuration for using large memory pages. On the Java side, this is done by adding a command-line option when launching Java (-XX:+UseLargePages) - see info here. On the guest OS side, the corresponding setting is the "Lock pages in memory" user right, which can be granted for a VM or a group of VMs via GPO (see here). Large pages on the VMware ESXi side are beneficial too, but they have some drawbacks, since they interact with other memory optimization techniques such as transparent page sharing.
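To make the BIOS and power settings from the first bullet easy to check across many hosts, here is a minimal Python sketch of an audit. It assumes you have already collected each host's settings into a dictionary; the key names (bios_power_profile, c_states_enabled, esxi_power_policy) are hypothetical placeholders, not a vendor or vSphere API.

```python
# Hypothetical key names: they stand in for whatever your server vendor's
# tools or your inventory export actually report.
LOW_LATENCY_BASELINE = {
    "bios_power_profile": "High Performance",  # BIOS/UEFI power profile
    "c_states_enabled": False,                 # deep CPU C-states off
    "esxi_power_policy": "High Performance",   # ESXi host power management policy
}


def audit_power_settings(host_settings: dict) -> list:
    """Return one message per setting that deviates from the low-latency baseline."""
    findings = []
    for key, expected in LOW_LATENCY_BASELINE.items():
        actual = host_settings.get(key)  # None if the setting was not collected
        if actual != expected:
            findings.append(f"{key}: expected {expected!r}, found {actual!r}")
    return findings


if __name__ == "__main__":
    host = {
        "bios_power_profile": "Balanced",
        "c_states_enabled": True,
        "esxi_power_policy": "High Performance",
    }
    for finding in audit_power_settings(host):
        print(finding)
```

Running it prints one line per deviation, here the Balanced BIOS profile and the enabled C-states. A setting that was never collected shows up as a deviation too, so gaps in your inventory don't pass silently.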
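For the NUMA bullet, the value of "numa.nodeAffinity" is just a comma-separated list of node numbers. The sketch below (Python; the helper name is hypothetical) builds that option in .vmx "key = value" form and rejects node numbers the host doesn't have. The host's node count is an input you would have to look up yourself.

```python
def numa_node_affinity_entry(nodes: list, host_numa_nodes: int) -> str:
    """Build a numa.nodeAffinity line in .vmx syntax.

    nodes: NUMA node indices the VM's vCPUs and memory should be constrained to.
    host_numa_nodes: number of NUMA nodes on the host (assumed to be known,
    e.g. from the host's hardware summary).
    """
    if not nodes:
        raise ValueError("at least one NUMA node is required")
    for node in nodes:
        if not 0 <= node < host_numa_nodes:
            raise ValueError(f"node {node} does not exist on a {host_numa_nodes}-node host")
    # The option takes a comma-separated list; sort and deduplicate for a stable value.
    value = ",".join(str(n) for n in sorted(set(nodes)))
    return f'numa.nodeAffinity = "{value}"'


if __name__ == "__main__":
    print(numa_node_affinity_entry([1, 0], host_numa_nodes=2))
```

In the vSphere Client you would enter only the name numa.nodeAffinity and the value (here 0,1) as a configuration parameter. Keep in mind that pinning a VM to specific nodes also limits the ESXi NUMA scheduler's ability to rebalance it later.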
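For the vNIC part of the NIC bullet, this Python sketch generates the per-vNIC "ethernetX.coalescingScheme" entries and flags adapters that aren't VMXNET3 instead of changing them. The input format (a dictionary from vNIC index to adapter type) is an assumption made for the example, and the sketch assumes the setting is meant for VMXNET3 adapters.

```python
def disable_vnic_coalescing(vnics: dict) -> tuple:
    """Return (.vmx entries, warnings) for low-latency vNIC settings.

    vnics maps the vNIC index (the X in ethernetX) to its adapter type,
    e.g. {0: "vmxnet3"}; this input format is hypothetical.
    """
    entries, warnings = [], []
    for index in sorted(vnics):
        adapter = vnics[index]
        if adapter.lower() == "vmxnet3":
            # Per-vNIC setting: only this adapter loses interrupt coalescing,
            # unlike a host-wide setting that affects every VM on the host.
            entries.append(f'ethernet{index}.coalescingScheme = "disabled"')
        else:
            # Only warn: changing the adapter type presents a new NIC to the guest.
            warnings.append(f"ethernet{index} uses {adapter}; consider switching to VMXNET3")
    return entries, warnings


if __name__ == "__main__":
    entries, warnings = disable_vnic_coalescing({0: "vmxnet3", 1: "e1000e"})
    print("\n".join(entries))
    print("\n".join(warnings))
```

Because the entries are per vNIC, you can limit the change to the one latency-sensitive adapter of one VM and leave everything else on the host untouched.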
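For the SCSI controller bullet, here is a small Python sketch that emits .vmx-style entries for PVSCSI controllers and enforces the 4-controllers-per-VM limit from the post. The scsiX.present / scsiX.virtualDev keys mirror what a .vmx file contains; in practice you would change the controller type in the vSphere Client rather than by editing the file, so treat this as an illustration of the resulting settings.

```python
MAX_PVSCSI_CONTROLLERS = 4  # per-VM limit stated in the post


def pvscsi_controller_entries(count: int) -> list:
    """Return .vmx lines for `count` PVSCSI controllers (scsi0, scsi1, ...)."""
    if not 1 <= count <= MAX_PVSCSI_CONTROLLERS:
        raise ValueError(
            f"count must be between 1 and {MAX_PVSCSI_CONTROLLERS}, got {count}"
        )
    lines = []
    for i in range(count):
        lines.append(f'scsi{i}.present = "TRUE"')
        lines.append(f'scsi{i}.virtualDev = "pvscsi"')  # VMware Paravirtual
    return lines


if __name__ == "__main__":
    print("\n".join(pvscsi_controller_entries(2)))
```

Before switching the controller of a boot disk, make sure the guest OS has a PVSCSI driver (on Windows it comes with VMware Tools); otherwise the VM may fail to boot.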
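For the guest OS bullet, removing unneeded virtual hardware boils down to marking those devices as not present. This Python sketch works on a .vmx file already parsed into a dictionary; the device keys (floppy0, serial0, usb) are typical examples and an assumption, so check which devices and indices your VM's .vmx actually defines. As with the other examples, you would normally remove the devices in the vSphere Client with the VM powered off.

```python
# Typical .vmx device keys; the indices (floppy0, serial0) are assumptions,
# so check which devices your VM's .vmx actually defines.
UNNEEDED_DEVICES = ("floppy0", "serial0", "usb")


def disable_unneeded_devices(vmx: dict) -> dict:
    """Return a copy of the .vmx settings with unneeded devices marked absent."""
    updated = dict(vmx)  # leave the caller's dict untouched
    for device in UNNEEDED_DEVICES:
        key = f"{device}.present"
        if key in updated:  # only touch devices the VM actually has
            updated[key] = "FALSE"
    return updated


if __name__ == "__main__":
    vm = {"floppy0.present": "TRUE", "usb.present": "TRUE", "memSize": "4096"}
    print(disable_unneeded_devices(vm))
```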
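For the Java large-pages bullet, the main pitfall is the exact spelling of the option: it is -XX:+UseLargePages, with no space after the colon. This Python sketch builds a java command line with the option added; the jar name and heap option in the example are placeholders.

```python
import shlex
from typing import List, Optional

LARGE_PAGES_FLAG = "-XX:+UseLargePages"  # no space after "-XX:"


def build_java_command(jar_path: str, jvm_options: Optional[List[str]] = None,
                       large_pages: bool = True) -> List[str]:
    """Build a java argv list, adding the large-pages option if requested."""
    options = list(jvm_options or [])
    if large_pages and LARGE_PAGES_FLAG not in options:
        options.append(LARGE_PAGES_FLAG)
    # JVM options must precede -jar; arguments after the jar go to the application.
    return ["java", *options, "-jar", jar_path]


if __name__ == "__main__":
    cmd = build_java_command("app.jar", ["-Xmx8g"])  # app.jar is a placeholder
    print(shlex.join(cmd))  # java -Xmx8g -XX:+UseLargePages -jar app.jar
```

On Windows guests, the account that runs the JVM also needs the "Lock pages in memory" right; without it, the JVM can't use large pages.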

Wrap up:

In a few articles, I have tried to help other admins and users of virtualization technologies get the most benefit from their virtual infrastructures. The posts can be found here:

20 Comments
MVP

Very good info. While not being on the VM admin side myself, I find the Java information very useful, as we monitor a number of Java environments in production and it will give us something else to consider when we see performance issues.

Level 10

This is an awesome article! I almost never come across someone willing to talk about NUMA. A lot of VMware admins think that it is extinct, or no longer viable in our environments. I disagree completely. NUMA and vNUMA are completely relevant when you are trying to get every little piece of performance out of your environment.

The BIOS settings alone are something a large number of IT professionals never even think about. That caught a couple of people I work with about 6 months ago. Darn power savings mode...

I was completely unaware of the ability to disable virtual interrupt coalescing on the vNIC. I will have to research that more now!

Excellent, this is very beneficial!

Thank you very much!

Level 12

Great topics, vladan. Thanks!

Level 12

vladan, awesome article. I do a little work with the VM guys here, so this is great insight into some latency issues we may see.

Level 9

If you wanted to do a whole series of articles about virtualization, I wouldn't complain.

Level 12

vladan, nice article and very helpful info.

Level 10

I am reading these articles and then speaking with my virtualization crew here and learning more every day. Thanks!

Level 9

Awesome article with lots of great information.

MVP

Great article. It helps to understand your virtual environment compared to your application environment. Good communication between the infrastructure team and the application development team helps a lot in understanding what needs to be set up on the virtual machines, or what will help each environment.

Level 11

Thanks for the great stuff.

Level 9

Thanks for this info! Great article!

Level 11

Good info for sure.

Level 12

This is great information, especially, as cbussard@idealintegrations.net mentioned, the nugget about server power saving settings. I am shocked by the number of clients I visit that either don't know these settings exist or simply stay with the defaults.

Level 21

Optimizing a virtual environment can be a real challenge. Each hypervisor, network arrangement, storage system, etc. has its own set of requirements and tweaks necessary for optimization as well. It's important to have specialized skill sets in each of those areas as part of an overall team to maximize the efficiency of your environment.

Great article!

Level 10

Great article. It's one thing to understand how virtualization works and another to be able to manage the virtual environment you claim to understand. Although the article seems to focus more on VMware, what about other virtualization technologies? I believe most, if not all, of these principles apply to them. Thank you a great deal; it is really, really informative.

Level 10

Very good info. Thank you so much for sharing this.

Level 17

Awesome Help!

Level 10

Great information, thanks for the detailed post.

Level 10

I think there are some great pointers in this article, but it skips over the #1 source of high latency, which is storage. Whereas the latency for most of the areas listed is measured in microseconds, occasionally milliseconds, storage latency issues start in the milliseconds. Saving a small amount of performance with a BIOS setting is largely irrelevant if the backing storage isn't delivering the necessary performance; that's where you'll see the real delay. Not to say that those other areas aren't relevant; sometimes you have to dig deep to get to the root of an unexpected application-specific performance issue in a VM. It just seems odd to leave out the top performance killer entirely.

Level 15

Awesome post containing useful information. Thanks!

About the Author
Virtualization blogger and IT engineer, living at Reunion Island (fr). Trying to help others with their journey to all virtual...