As an ex-NetApp employee, I'll try to explain this a bit for you:
You are right that datastore latency needs to be below 20 ms. There are different kinds of latency, however. If you are using NFS (like us), one piece is the latency between your storage controller/datastore and VMware. Once you have the datastore in VMware, there is also VM disk latency.
If you are seeing VM disk latency, it is typically one of the following:
- Resource contention on the disks, meaning you are doing more IOPS than the disks can handle, which causes latency on reads and writes.
- A VM running out of control and doing more IO than it should, which causes latency for the other VMs trying to access the same disks (I had this issue with our vCenter VM a while back).
So I would investigate the IO per datastore and compare it to the number of disks in your aggregate to make sure you are not doing more than your disks can handle (if you are a NetApp shop I can help with this). I would also check the IO of the individual VMs to make sure you don't have a bully VM on a datastore causing issues.
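Here is a rough sketch of that check in Python, in case it helps to see the arithmetic. Everything in it is an assumption for illustration: the per-disk IOPS figures are common rules of thumb (not NetApp-published numbers), the function names are made up, the 50% "bully" threshold is arbitrary, and the sample VM numbers are hypothetical. You would plug in the per-VM IOPS you actually observe in your own monitoring.

```python
# Rule-of-thumb IOPS per spindle. These values are ASSUMPTIONS for
# illustration only; substitute figures appropriate for your hardware.
ASSUMED_IOPS_PER_DISK = {"sata_7k": 75, "sas_10k": 125, "sas_15k": 175}


def aggregate_iops_budget(data_disks, disk_type):
    """Rough IOPS the aggregate can sustain (data disks only, no parity)."""
    return data_disks * ASSUMED_IOPS_PER_DISK[disk_type]


def datastore_utilization(vm_iops, data_disks, disk_type):
    """Fraction of the estimated IOPS budget the VMs are consuming."""
    total = sum(vm_iops.values())
    return total / aggregate_iops_budget(data_disks, disk_type)


def find_bully_vms(vm_iops, share_threshold=0.5):
    """Names of VMs whose share of total datastore IO meets the threshold."""
    total = sum(vm_iops.values())
    if total == 0:
        return []
    return sorted(
        name for name, iops in vm_iops.items()
        if iops / total >= share_threshold
    )


# Hypothetical per-VM average IOPS on one datastore.
sample_vm_iops = {"vcenter": 1800, "web01": 150, "db01": 400, "file01": 50}

if __name__ == "__main__":
    used = datastore_utilization(sample_vm_iops, data_disks=14, disk_type="sas_15k")
    print(f"Estimated utilization: {used:.0%}")
    print("Possible bully VMs:", find_bully_vms(sample_vm_iops))
```

If the utilization comes out close to or above 100%, you are likely in the resource contention case; if a single VM accounts for most of the IO, that VM is the first place I would look.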
If you are a NetApp shop, other things that can cause issues are long-running dedupe jobs during the day, misaligned VMs, hot disks, etc.
Hope this helps and let me know if you need more explanation.