VDI Memory Overcommitment

Memory overcommitment remains a contentious topic, despite some really great studies on the subject.  Regarding VDI, I’ve heard opinions ranging from “0 memory overcommitment!” to “200%, 300%, it’s fine!”  I figured that I’d share my thoughts and see if anyone else wants to weigh in.  First, though, some background on the decision.

The main argument that I’ve heard against memory overcommitment boils down to protecting the user experience.  Since virtual desktops have users actively logged into and using them, even slight performance degradation is immediately noticed.  We all know that hypervisor swapping is Very Bad for performance.  If you don’t overcommit your hosts’ memory, you won’t have swapping, so why overcommit?

The simple answer is cost: while 2 GB of RAM for a single desktop is no big deal, 2 GB of dedicated RAM for each of 1000 desktops (roughly 2 TB in total) is a lot harder to accept.  Memory overcommitment can be an important technique for bringing down the cost of a VDI solution.  Since ESXi servers benefit from Transparent Page Sharing (think dedupe for memory), we have a certain amount of leeway for memory overcommitment.  In a homogeneous environment (the same OS and similar applications), Transparent Page Sharing can recover very appreciable amounts of memory… and what could be more homogeneous than a giant collection of virtual desktops that are all spun off of the same image?
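To put rough numbers on that cost argument, here is a minimal Python sketch.  It assumes that "allocation percentage" means memory allocated to VMs as a percentage of physical host memory (so 100% is no overcommitment), and the function name is mine, purely for illustration.  It also ignores hypervisor and per-VM memory overhead, so treat the results as a lower bound rather than a sizing recommendation.

```python
def physical_ram_needed_gb(desktops, ram_per_desktop_gb, allocation_pct):
    """Physical RAM (GB) needed to back a desktop pool.

    allocation_pct is allocated VM memory as a percentage of physical
    memory: 100 means no overcommitment, 150 means 1.5x overcommitted.
    Hypervisor and per-VM overhead are deliberately ignored (assumption).
    """
    if allocation_pct <= 0:
        raise ValueError("allocation_pct must be positive")
    allocated_gb = desktops * ram_per_desktop_gb
    return allocated_gb * 100 / allocation_pct


if __name__ == "__main__":
    # 1000 desktops at 2 GB each, at three allocation levels
    for pct in (100, 125, 150):
        needed = physical_ram_needed_gb(1000, 2, pct)
        print(f"{pct}% allocation: {needed:.0f} GB of physical RAM")
```

For the 1000-desktop example, going from 100% to 125% allocation drops the physical RAM requirement from 2000 GB to 1600 GB, which is where the cost savings come from.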

Even in lean shops (I’m talking about 1 GB of RAM assigned for Windows 7 VMs), I frequently see VDI ESXi hosts with an Active:Allocated memory ratio of 1:4.  I wouldn’t go so far as to say that this means you can allocate 400% of your hosts’ memory… but it certainly does imply that some overcommitment may be safe.  But how much?  Well, that’s where it gets a bit fuzzy.

The best answer that I can think of is that classic cop-out: it depends.  It seems to me that 125% to 150% allocation (VM memory allocated as a percentage of physical host memory) is pretty safe for VDI, with far more aggressive numbers being possible.  Fortunately, VDI environments tend to grow in steady batches and can be closely monitored.  That makes the best approach to start conservatively, then increase density as the hard metrics support it.  As a ballpark number, though, I think that 125% to 150% is a fairly realistic range for a production environment.  What do you think?
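The "start conservatively, then increase density" approach can be sketched as a simple feedback rule.  Everything here is hypothetical: the function names, the 5-point step, and the choice of host-level swapping and ballooning as the "hard metrics" are my assumptions for illustration, not a tested policy.  Plug in whatever counters and thresholds your own monitoring supports.

```python
def max_desktops_per_host(host_ram_gb, ram_per_desktop_gb, allocation_pct):
    """Desktops that fit on one host at a given allocation percentage.

    Ignores hypervisor and per-VM memory overhead (assumption).
    """
    allocatable_gb = host_ram_gb * allocation_pct / 100
    return int(allocatable_gb // ram_per_desktop_gb)


def next_allocation_pct(current_pct, swapped_mb, ballooned_mb,
                        step=5, floor=100, ceiling=150):
    """Suggest the next allocation target from observed host metrics.

    Hypothetical rule:
      - any hypervisor swapping -> back off one step
      - ballooning but no swap  -> hold at the current level
      - neither                 -> step up, capped at the ceiling
    """
    if swapped_mb > 0:
        return max(floor, current_pct - step)
    if ballooned_mb > 0:
        return current_pct
    return min(ceiling, current_pct + step)


if __name__ == "__main__":
    pct = 125  # conservative starting point
    print(max_desktops_per_host(256, 2, pct), "desktops per 256 GB host")
    pct = next_allocation_pct(pct, swapped_mb=0, ballooned_mb=0)
    print("next target:", pct, "%")
```

The asymmetry is deliberate: swapping hurts users immediately, so the sketch backs off at the first sign of it, while it only steps up when the host shows no memory pressure at all.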

