We had issues in our VM environment, and I figured sharing what we found could help others. When we configured the VMs' resources, we did not account for NUMA.
The high latency of accessing remote memory in NUMA (Non-Uniform Memory Access) architecture servers can add a non-trivial amount of latency to application performance. ESXi uses a sophisticated, NUMA-aware scheduler to dynamically balance processor load and memory locality.
For best performance of latency-sensitive applications in guest OSes, all vCPUs should be scheduled on the same NUMA node and all VM memory should fit and be allocated out of the local physical memory attached to that NUMA node.
https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/vmw-tuning-latency-sensitive-workloads-white-paper.pdf
Not accounting for NUMA caused extreme slowness and latency in our environment. We had the Poller VMs set at 4 CPUs and 2 cores, and we changed them to 8 CPUs and 1 core. The Web and Core VMs were at 8 CPUs and 2 cores, and we changed them to 12 CPUs and 1 core. After the change, CPU utilization stayed well below 70 percent and overall system performance is better.
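To make the whitepaper's point about memory fitting in the local node concrete, here is a rough Python sketch of how much of a VM's memory would end up on a remote node when the VM is larger than its home node's local memory. The function name `remote_memory_fraction` and the 64 GB-per-node host are hypothetical, and the model assumes the home node is filled first with any overflow served remotely. Real ESXi placement is dynamic, and the hypervisor also uses some of each node's memory, so treat this as an approximation only.

```python
def remote_memory_fraction(vm_memory_gb: float, node_local_memory_gb: float) -> float:
    """Rough share of a VM's memory that cannot be served from its home NUMA node.

    Hypothetical model: the home node's local memory is used first, and whatever
    does not fit must come from a remote node.
    """
    if vm_memory_gb <= 0:
        raise ValueError("vm_memory_gb must be positive")
    overflow = max(0.0, vm_memory_gb - node_local_memory_gb)
    return overflow / vm_memory_gb


if __name__ == "__main__":
    # Hypothetical host: each NUMA node has 64 GB of local memory attached.
    for vm_gb in (32, 64, 96):
        share = remote_memory_fraction(vm_gb, 64)
        print(f"{vm_gb} GB VM -> {share:.0%} remote")
```

Under these assumptions a 96 GB VM on 64 GB nodes would have about a third of its memory on a remote node, which is exactly the situation the whitepaper warns against for latency-sensitive workloads.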
Because vCPUs are required to be scheduled on a virtual machine's home node, there exists a limitation on how many vCPUs can be associated with a NUMA home. For best performance, the number of physical processors (or cores) per NUMA node becomes the maximum vCPU count a virtual machine can have to be managed by the NUMA scheduler. For example, on a 4-core (and 8 logical processors with HT) per NUMA node system, only up to 4-vCPU virtual machines can have a NUMA home.
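The rule above boils down to a simple comparison against physical cores. The Python sketch below encodes it using the quote's own 4-core, 8-thread example; `NumaNode`, `max_numa_home_vcpus`, and `has_numa_home` are hypothetical helper names, not a VMware API, and the sketch models only the single-NUMA-home rule as quoted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumaNode:
    """Hypothetical description of one NUMA node on the host."""
    physical_cores: int
    threads_per_core: int = 2  # hyperthreading; does NOT raise the vCPU limit

    @property
    def logical_processors(self) -> int:
        return self.physical_cores * self.threads_per_core


def max_numa_home_vcpus(node: NumaNode) -> int:
    # Per the quoted rule, the ceiling is the physical core count,
    # not the number of logical processors exposed by HT.
    return node.physical_cores


def has_numa_home(vcpus: int, node: NumaNode) -> bool:
    # A VM fits on one NUMA home only if its vCPU count is within the ceiling.
    return 0 < vcpus <= max_numa_home_vcpus(node)


if __name__ == "__main__":
    node = NumaNode(physical_cores=4)  # the 4-core / 8-thread example from the quote
    print("logical processors:", node.logical_processors)
    for vcpus in (2, 4, 8):
        print(f"{vcpus} vCPUs -> NUMA home: {has_numa_home(vcpus, node)}")
```

With 8 logical processors visible it is tempting to give the VM 8 vCPUs, but under this rule only the 4 physical cores count, so an 8-vCPU VM would not get a NUMA home.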