More often than not, application owners look to their vendors to provide a list of requirements for a new project, and the vendor forwards specifications that were developed around maximum load in the age of physical servers. These requirements eventually make their way onto the virtualization administrator's desk: 8 Intel CPUs at 2 GHz or higher, 16 GB of memory. The virtualization administrator is left to fight the good fight with both the app owner and the vendor. Now, we all cringe when we see requirements such as these – we've worked hard to build out our virtualization clusters, pooling CPU and memory to present back to our organizations, and we constantly monitor our environments to ensure that resources are available to our applications when they need them – and then a list of requirements like this comes across through a server build form or email.
So what's the big deal?
We have lots of resources, right?!? Why not just give the people what they want? Before you start giving away resources, let's take a look at both CPU and memory and see just how VMware handles its scheduling and overcommitment of each.
One of the biggest selling points of virtualization is that we can attach more vCPUs to VMs than we have physical CPUs on our hosts and let the ESXi scheduler take care of scheduling each VM's time on CPU. So why don't we just go ahead and give every VM 4 or 8 vCPUs? You would think granting a VM more vCPUs would increase its performance – and it certainly could – but you can actually hurt performance as well, not just on that single VM but on the other VMs running on the host. Since the physical CPUs are shared, there will be times when the scheduler has to place CPU instructions on hold and wait for physical cores to become available. For instance, a VM with 4 vCPUs has to wait until 4 physical cores are available before the scheduler can execute its instructions, whereas a VM with 1 vCPU only has to wait for a single core. As you can tell, having multiple VMs each containing multiple vCPUs can end up producing a lot of queuing, waiting, and CPU ready time on the host, resulting in a significant impact to performance. Although VMware has made strides in CPU scheduling by implementing "Relaxed Co-Scheduling", it still only allows a certain amount of time drift between the execution of instructions across cores, and it does not completely solve the issues around scheduling and CPU ready – it's always best practice to right-size your VMs in terms of number of vCPUs to avoid as many scheduling conflicts as possible.
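To get a feel for why wide VMs hurt, here's a deliberately simplified toy model in Python – strict all-or-nothing co-scheduling with a round-robin queue, not VMware's actual relaxed co-scheduler – showing how an 8-vCPU VM on an 8-core host racks up far more ready time than its 1-vCPU neighbors:

```python
# Toy model of strict co-scheduling: a VM may only run in a tick if it
# can claim one free physical core per vCPU. This is an illustration of
# the queuing effect, NOT a model of VMware's real (relaxed) scheduler.

def ready_time(host_cores, vms, ticks=1000):
    """Simulate round-robin scheduling for `ticks` time slices.

    vms: list of (name, vcpu_count) tuples.
    Returns a dict of ticks each VM spent waiting (CPU ready)
    because enough physical cores weren't free."""
    waits = {name: 0 for name, _ in vms}
    for _ in range(ticks):
        free = host_cores
        for name, vcpus in vms:
            if vcpus <= free:
                free -= vcpus      # all vCPUs placed: VM runs this tick
            else:
                waits[name] += 1   # co-start blocked: ready time accrues
        vms.append(vms.pop(0))     # rotate the queue so no VM starves
    return waits

# An 8-core host running one oversized 8-vCPU VM plus four 1-vCPU VMs:
# the wide VM only runs when it reaches the front of the queue, while
# the narrow VMs almost always find a free core.
print(ready_time(8, [("big", 8), ("a", 1), ("b", 1), ("c", 1), ("d", 1)]))
```

Even in this crude sketch, the 8-vCPU VM waits four times as often as any of the single-vCPU VMs – the queuing cost of "just give it more vCPUs" in miniature.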
vSphere employs many techniques when managing our virtual machine memory – VMs can share memory pages with each other, eliminating redundant copies of the same pages. vSphere can also compress memory, as well as use ballooning to reclaim idle memory from one VM so that it can be used by another. This built-in intelligence almost masks away any performance issues we might see from overcommitting RAM to our virtual machines. That said, memory is often one of the first resources we run out of, and we should still take precautions to right-size our VMs and prevent waste. The first thing to consider is overhead – by assigning additional, unneeded memory to our VMs we increase the amount of overhead memory utilized by the hypervisor in order to run the virtual machine, which in turn takes memory from the pool available to other VMs. The amount of overhead is determined by the amount of assigned memory as well as the number of vCPUs on the VM, and although this number is small (roughly 150 MB for a 16 GB RAM / 2 vCPU VM), it begins to add up as our consolidation ratios increase. Aside from memory waste, oversized VMs also cause unnecessary waste on the storage end of things. Each time a VM is powered on, a swap file is created on disk equal in size to the allocated memory (less any memory reservation). Again, this may not seem like a lot of wasted space at the time, but as we create more and more VMs it can certainly add up to quite a bit of capacity. Keep in mind that if there is not enough free space available to create this swap file, the VM will not power on.
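A quick back-of-the-envelope sketch in Python makes the storage cost concrete. It uses the swap-file rule above (swap file = allocated memory minus any reservation) and the rough ~150 MB overhead figure from the text – an assumed flat per-VM number for illustration only, since real overhead varies with assigned RAM, vCPU count, and vSphere version:

```python
# Estimate the fleet-wide cost of oversized memory allocations.
# OVERHEAD_MB_PER_VM is the rough figure quoted above for a
# 16 GB / 2 vCPU VM, applied flat here purely for illustration.
OVERHEAD_MB_PER_VM = 150

def fleet_waste(vms):
    """vms: list of (allocated_gb, reservation_gb) tuples.

    Returns (swap_gb, overhead_gb): datastore capacity consumed by
    swap files, and an estimate of hypervisor memory overhead."""
    # Swap file size = allocated memory - reserved memory, per VM.
    swap_gb = sum(alloc - resv for alloc, resv in vms)
    overhead_gb = len(vms) * OVERHEAD_MB_PER_VM / 1024
    return swap_gb, overhead_gb

# 50 VMs each handed 16 GB "just in case", with no reservations:
fleet = [(16, 0)] * 50
swap, ovh = fleet_waste(fleet)
print(f"swap files: {swap} GB, hypervisor overhead: ~{ovh:.1f} GB")
```

Fifty such VMs quietly consume 800 GB of datastore capacity in swap files alone – capacity that right-sizing (or reservations) would hand back.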
Certainly these are not the only impacts that oversized virtual machines have on our environment – they can also affect features such as HA, vMotion times, and DRS actions – but these are some of the bigger ones. Right-sizing is not something that's done once, either; it's important to constantly monitor your infrastructure and go back and forth with application owners as things change from day to day and month to month. Certainly there are a lot of applications and monitoring systems out there that can perform this analysis for us, so use them!
All that said, though, discovering the over- and undersized VMs within our infrastructure is probably the easiest leg of the journey of reclamation. Once we have some solid numbers and metrics in hand, we need to somehow present them to other business units and application owners to try and claw back resources – this is where the real challenge begins. Placing a dollar figure on everything and utilizing features such as showback and chargeback may help, but again, it's hard to take something back after it's been given. So my questions to leave you with this time are as follows. First up, how do you ensure your VMs are right-sized? Do you use a monitoring solution available today to do so? And if so, how often are you evaluating your infrastructure for right-sized VMs (monthly, yearly, etc.)? Secondly, what do you foresee as the biggest challenges in trying to claw back resources from your business? Do you find that it's more of a political challenge or simply an educational challenge?