Running Xen efficiently

Last modified: 2019-07-05

Summary

The speed of Xen guests varies tremendously depending on how they are run and on how the control domain Dom0 is run. In almost all scenarios, PV mode should be avoided. PVH is the best option, but HVM is also good.

There are only two exceptions: (1) really old hardware (roughly 2010 or earlier) which lacks Nested Page Tables (also known as SLAT, EPT, NPT, or RVI), or (2) guests that do a lot of I/O but lack PVHVM support (paravirtualised drivers under HVM).

A well-configured system may be twice as fast as a basic, naive configuration.

PV is great, or is it?

PV mode seems attractive: a special, fast interface between the virtualiser and the guest. That ought to be great!

No, it is not. Don't use it. It only makes sense on long-obsolete hardware. Modern hardware provides a feature known as Nested Page Tables, which makes memory operations much more efficient under non-PV modes than under PV mode. Since memory operations are pretty damn important, any savings from PV are cancelled out, except for certain workloads.
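Whether your hardware falls into the obsolete category is easy to check. A minimal sketch, assuming a Linux Dom0: the kernel exposes the relevant CPU flag as "ept" on Intel and "npt" on AMD.

```shell
# Probe for Nested Page Tables support in the CPU flags.
if grep -qwE 'ept|npt' /proc/cpuinfo; then
    echo "NPT present: prefer PVH or HVM over PV"
else
    echo "NPT absent: this is the rare hardware where PV can still win"
fi
```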

Unfortunately, avoiding PV altogether is not yet possible (though it might be with Xen 4.13). The one exception is Dom0. Did you think Dom0 is faster than any "real" guest? It is not; as a matter of fact, it is typically slower.

OK, so there is one scenario where PV might be useful even on non-archaic hardware: guests that do a lot of I/O and do not support PVHVM. Of the OSes I know of, only NetBSD lacks PVHVM support, while GNU/Linux, FreeBSD, and OpenBSD all have it. I believe Windows with separate drivers also does PVHVM, but I haven't tried that.

If you think your task needs a lot of I/O and thus should use PV, you are probably wrong. My main task is math library compilation, which would seem fairly I/O-intensive. Yet over a heterogeneous range of systems, 15 out of 18 run between 30% and 50% slower under PV. The three remaining are historical artifacts which lack NPT.
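In practice, the advice so far comes down to a couple of lines in the guest's xl configuration. A minimal sketch (the name, paths, and sizes are made-up examples; the keys are standard xl ones):

```
# /etc/xen/guest1.cfg -- hypothetical guest configuration
type   = "pvh"       # best choice; use "hvm" if the guest kernel lacks PVH support
name   = "guest1"
memory = 2048
vcpus  = 1           # more on this choice in the next section
kernel = "/boot/vmlinuz-guest"   # PVH can boot a kernel directly
disk   = [ "phy:/dev/vg0/guest1,xvda,w" ]
```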

More VCPUs are faster. Or not.

If you don't really need multiple CPUs for a guest, assign just one. The overhead of running more than one VCPU is outrageous. It is better to run n Xen guests than to run one guest with n VCPUs. You read that right.
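For a guest that is already over-provisioned, the count can be trimmed without a reboot. A sketch ("guest1" is a made-up name):

```
# Permanent change: set "vcpus = 1" in the guest's xl config.
# Running guest: adjust on the fly, within the maxvcpus it was started with.
xl vcpu-set guest1 1
xl vcpu-list guest1    # verify the new assignment
```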

What about Dom0?

We don't have the option of using PVH or HVM here, although PVH Dom0 support is being added. For now, we need to live with the PV slowness in Dom0. But we do not need to suffer the more-than-one-VCPU slowness: Dom0 should have just one VCPU! In particular, the I/O of pure HVM guests (as opposed to PVHVM) will suffer if Dom0 has more than one VCPU. How does that work? Such guests depend on qemu processes in Dom0 for their I/O, and a slow Dom0 means slow HVM guest I/O.
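Restricting Dom0 to one VCPU is done on the Xen command line at boot, not in any guest config. A sketch for a GRUB-based system (the memory figure is an arbitrary example; pinning Dom0's VCPU alongside the limit is generally recommended):

```
# /etc/default/grub -- options passed to the Xen hypervisor itself
GRUB_CMDLINE_XEN_DEFAULT="dom0_max_vcpus=1 dom0_vcpus_pin dom0_mem=2048M,max:2048M"
# Regenerate the boot configuration afterwards, e.g. update-grub on Debian-style systems.
```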

There are some drawbacks to giving Dom0 just one VCPU. One is that booting becomes slow when many guests boot at the same time, since booting is I/O-intensive. Another is that the total I/O of all guests is limited to what a single VCPU can provide.