Virtualization: The elegant way and the x86 way

Virtualization means running one or more complete operating systems at the same time on one machine, possibly on top of another operating system. VMware, Virtual PC, Parallels and others can, for example, run a complete GNU/Linux system on top of Windows. For virtualization, the Virtual Machine Monitor (VMM) must be more powerful than the kernel mode code of the guest. The guest's kernel mode code must not be allowed to change the global state of the machine, yet it must not notice that its attempts fail, because it was designed to run in kernel mode. As the arbiter, the VMM must be able to control the guest completely.

Architectures like the PowerPC made virtualization easy from the beginning. There are no instructions that work differently in kernel mode than in user mode: an instruction either works the same in both modes, or it throws an exception when used in user mode. To virtualize an operating system, it is therefore enough to run the kernel mode part of the guest in user mode and emulate every instruction that throws an exception. When the guest OS wants to set up a page table, the VMM notices this, intercepts the instruction and changes its own page tables, so that the guest OS works as it is supposed to, while neither the VMM nor other guests can be affected.
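The trap-and-emulate scheme can be sketched as a toy simulation in C. Everything here is an illustrative assumption rather than real PowerPC behavior: the three-instruction ISA, the names `VirtualCpu`, `HostState` and `vmm_emulate`, and the constant `SHADOW_BASE`, which merely stands in for building a real shadow page table. The sketch only shows the principle: the deprivileged guest kernel's privileged instructions trap, and the VMM then updates the state the guest sees and the state the hardware uses separately.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical offset standing in for "build a shadow page table
   that maps only this guest's memory". */
#define SHADOW_BASE 0x100000u

/* Toy ISA (an assumption for illustration): every instruction either
   behaves the same in both modes or is privileged and traps in user mode. */
typedef enum { OP_ADD, OP_LOAD_PT_BASE, OP_READ_MODE } Opcode;

typedef struct { Opcode op; uint32_t operand; } Insn;

/* State the guest believes it owns. */
typedef struct {
    uint32_t acc;      /* ordinary register */
    uint32_t pt_base;  /* page-table base as seen by the guest */
    bool kernel_mode;  /* privilege mode as seen by the guest */
} VirtualCpu;

/* Real machine state, owned by the VMM. */
typedef struct {
    uint32_t pt_base;  /* page table the hardware actually uses */
} HostState;

static bool is_privileged(Opcode op) {
    return op == OP_LOAD_PT_BASE || op == OP_READ_MODE;
}

/* Entered on a trap: emulate the instruction against the virtual CPU. */
static void vmm_emulate(VirtualCpu *v, HostState *h, Insn in) {
    assert(is_privileged(in.op)); /* only privileged instructions trap */
    switch (in.op) {
    case OP_LOAD_PT_BASE:
        v->pt_base = in.operand;               /* what the guest reads back */
        h->pt_base = SHADOW_BASE + in.operand; /* what the hardware uses */
        break;
    case OP_READ_MODE:
        v->acc = v->kernel_mode ? 1u : 0u;     /* report the virtual mode */
        break;
    default:
        break;
    }
}

/* The guest kernel really runs in user mode: unprivileged instructions
   execute directly, privileged ones trap into the VMM. Returns the
   number of traps taken. */
static int run(VirtualCpu *v, HostState *h, const Insn *prog, size_t n) {
    int traps = 0;
    for (size_t i = 0; i < n; i++) {
        if (is_privileged(prog[i].op)) {
            traps++;
            vmm_emulate(v, h, prog[i]);
        } else {
            v->acc += prog[i].operand; /* OP_ADD runs natively */
        }
    }
    return traps;
}
```

For example, a guest kernel that adds 5, loads a page-table base of 0x1000 and then queries its mode takes two traps. Afterwards it reads back its own page-table base of 0x1000 and believes it is in kernel mode, while the hardware actually uses `SHADOW_BASE + 0x1000`.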

On the x86 platform, there are several instructions that simply behave differently in kernel mode and in user mode. If we run kernel mode code in user mode, some sensitive instructions do not throw exceptions, but instead silently behave differently or return results that differ from what they would return in kernel mode. For example, POPF executed in user mode silently leaves the interrupt flag unchanged instead of trapping, so a guest kernel that tries to disable interrupts this way never learns that it failed. VMware, Virtual PC, Parallels and friends therefore have to scan all kernel mode code and replace these sensitive instructions with explicit calls to the VMM. This effectively steals about 100 MHz of computing power per running VM.
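The underlying condition was formalized by Popek and Goldberg: trap-and-emulate works only if every sensitive instruction traps when executed in user mode. The C sketch below checks that condition over a small, hand-picked table of classic x86 instructions (as they behave without later extensions such as UMIP) and models the scan-and-replace step as a pass over mnemonics. The table is a simplification, and `patch_kernel_code` and the `"call vmm"` placeholder are hypothetical; a real binary translator works on decoded machine code, not on strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified classification of a few well-known classic x86 instructions;
   not an exhaustive list. */
typedef struct {
    const char *mnemonic;
    bool sensitive;          /* reads or changes privileged machine state */
    bool traps_in_user_mode; /* raises an exception outside ring 0 */
} InsnInfo;

static const InsnInfo X86_TABLE[] = {
    { "add",        false, false },
    { "mov cr3",    true,  true  }, /* load page-table base: #GP outside ring 0 */
    { "lgdt",       true,  true  },
    { "hlt",        true,  true  },
    { "popf",       true,  false }, /* silently ignores IF changes in user mode */
    { "pushf",      true,  false }, /* exposes IF and IOPL */
    { "sgdt",       true,  false }, /* reveals the real GDT location */
    { "sidt",       true,  false }, /* reveals the real IDT location */
    { "smsw",       true,  false }, /* reveals low bits of CR0 */
    { "mov ax, cs", true,  false }, /* low two bits of CS reveal the CPL */
};
#define TABLE_LEN (sizeof X86_TABLE / sizeof X86_TABLE[0])

static const InsnInfo *lookup(const char *mnemonic) {
    for (size_t i = 0; i < TABLE_LEN; i++)
        if (strcmp(X86_TABLE[i].mnemonic, mnemonic) == 0)
            return &X86_TABLE[i];
    return NULL;
}

/* Popek and Goldberg: an ISA is classically virtualizable if every
   sensitive instruction traps in user mode. */
static bool classically_virtualizable(const InsnInfo *t, size_t n) {
    for (size_t i = 0; i < n; i++)
        if (t[i].sensitive && !t[i].traps_in_user_mode)
            return false;
    return true;
}

/* Scan guest kernel code and replace every sensitive instruction that
   would NOT trap with a call into the VMM. Returns how many were patched. */
static size_t patch_kernel_code(const char **code, size_t n) {
    size_t patched = 0;
    for (size_t i = 0; i < n; i++) {
        const InsnInfo *info = lookup(code[i]);
        assert(info != NULL); /* the toy table must cover every instruction */
        if (info->sensitive && !info->traps_in_user_mode) {
            code[i] = "call vmm"; /* hypothetical VMM entry point */
            patched++;
        }
    }
    return patched;
}
```

Over the whole table the check fails, whereas the first four entries alone would pass. Running the pass over `{"add", "popf", "mov cr3", "sgdt"}` rewrites `popf` and `sgdt`; `mov cr3` is left alone because it traps anyway and can be emulated like on the PowerPC.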

Intel fixed this with its “Virtualization Technology” (VT), formerly known as “Vanderpool”. It did not add a global switch that makes all sensitive instructions throw exceptions in user mode; instead, it added yet another mode of execution. The new “root mode” is more powerful than standard kernel mode. The host OS and the VMM run in root mode. To run a guest, the VMM first tells the CPU which instructions and events should make it leave non-root mode and return to the VMM, and then switches into the guest OS in “non-root” mode. This sounds complex, but that is exactly why it fits so nicely into the x86 architecture. 😉
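The root/non-root split can be modeled as a per-guest set of exit controls that the VMM fills in before entering the guest. This is a loose sketch, not Intel's interface: real VT keeps such settings in the VMCS and uses its own numeric exit reasons, and the event names, `ExitControls`, `run_guest` and `vmm_handle_exit` below are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical guest events the VMM may want to intercept. */
typedef enum { EV_HLT, EV_IO, EV_MOV_CR3, EV_PAGE_FAULT, EV_COUNT } GuestEvent;

/* Bit i set: event i makes the CPU leave non-root mode. */
typedef struct { uint32_t exit_on; } ExitControls;

typedef struct {
    uint32_t cr3;    /* guest's view of its page-table base */
    unsigned exits;  /* transitions back to the VMM (root mode) */
    unsigned native; /* events the guest handled itself in non-root mode */
} Guest;

static void set_exit(ExitControls *c, GuestEvent e) { c->exit_on |= 1u << e; }

static bool causes_exit(const ExitControls *c, GuestEvent e) {
    return (c->exit_on >> e) & 1u;
}

/* Runs in root mode after an exit; emulates the event, then the guest resumes. */
static void vmm_handle_exit(Guest *g, GuestEvent e, uint32_t arg) {
    g->exits++;
    if (e == EV_MOV_CR3)
        g->cr3 = arg; /* e.g. switch to a matching shadow page table */
    /* other exit reasons would be emulated here */
}

/* Replays a recorded sequence of guest events in "non-root mode". */
static void run_guest(Guest *g, const ExitControls *c,
                      const GuestEvent *ev, const uint32_t *args, size_t n) {
    for (size_t i = 0; i < n; i++) {
        assert(ev[i] < EV_COUNT);
        if (causes_exit(c, ev[i])) {
            vmm_handle_exit(g, ev[i], args[i]);
        } else {
            g->native++; /* executes directly, no VMM involvement */
            if (ev[i] == EV_MOV_CR3)
                g->cr3 = args[i];
        }
    }
}
```

With exits enabled only for HLT and I/O, a guest that loads CR3, performs I/O, halts and then takes a page fault leaves non-root mode twice; the CR3 load and the page fault are handled by the guest itself. The design choice mirrors the text: instead of making instructions trap globally, the VMM decides per guest which events it wants to see.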

AMD’s Pacifica is incompatible with VT, but it follows the same design. It is even more powerful, though: Pacifica allows 16-bit as well as non-paged code in non-root mode, whereas VT restricts the VM to 32/64-bit paged mode.

I know that I simplified the whole issue a lot, but if you have corrections or any other comments, please do add them.

9 thoughts on “Virtualization: The elegant way and the x86 way”

  1. Bert JW Regeer

    While I may be just a simple programmer using higher-level languages like C/C++, I find posts like these very interesting: they make sense to me, and at the same time they give me something cool to remember for when I need it.

    I would just like to thank you for the effort you two put into this website, and for all the interesting puzzles, trivia and random bits of information that come by. Being 18 and still in high school means I still have a long way to go, and the earlier I learn, the better I retain the information.

    Once again, thank you,
    Bert JW Regeer

  2. myria

    There’s no instruction on PowerPC to query whether you are in user mode or kernel mode? That’s perhaps the biggest problem with virtualization: the program can tell it’s in user mode. On x86, it’s as simple as:

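    mov ax, cs    ; copy the code segment selector
    test al, 3    ; low two bits hold the current privilege level (CPL)
    jz kernel     ; CPL 0 means ring 0, i.e. kernel mode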

    There’s nothing like this for PowerPC?

    myria

  3. Felix

    Hm, shouldn’t mfmsr and testing bits like HV, PR, IR, DR yield similar results?

    However, I don’t know whether mfmsr can be trapped. I don’t believe so, but I could be wrong.

    Anyway, the question is less about detecting user mode vs. supervisor (privileged) mode; it’s more about detecting whether a hypervisor is active or not, without using hypervisor-only instructions. I believe you can emulate a complete supervisor environment through the hypervisor (including with DR and IR disabled? I don’t know). But the HV bit would still be missing…

  4. seppel

    @myria: No, mfmsr is a privileged instruction.

    It is as easy as this: make every instruction that accesses sensitive information privileged. Of course, the x86 got this wrong right from the beginning: the flags register mixes sensitive information with user-space information.

  5. kmag

    @myria: that’s the whole point of virtualization. If the supervisor code (ring 0 on x86) needs to know about the hypervisor, it’s called paravirtualization. If the Popek and Goldberg criteria are met, it’s called virtualization. (If it’s designed to meet the Popek and Goldberg criteria but doesn’t… that’s called a software bug.)

    @myria: why is virtualization a problem? When do you need to know whether you’re in a virtualized environment? If there’s no way to tell, that means your code will run _identically_. Of course, there’s usually a bug or two in the real world.

    Virtualization is very powerful. That’s why the Motorola people changed the m68k series to fit the Popek and Goldberg criteria between two revisions. (I think it was between the 68000 and the 68010, but don’t quote me on that.) There was one instruction that read the machine status register, which allowed code to figure out that it wasn’t really running in a privileged mode. The change was to make that instruction trap if executed in an unprivileged mode, and to add a new instruction that returned the same info but always set or cleared the bits that would have leaked information. This second instruction didn’t trap, and was therefore better for performance in a virtualized environment.

    At my last job, machines were allowed to be on the external network or on the internal network, but not both. Our nice shiny blade server had 2 Ethernet ports… so we used virtualization software to run two copies of Windows (Bloomberg only runs on Windows) on the blade… each using only one Ethernet port. It would have been pretty horrible if some driver writer had decided it was a good idea to check whether the code was really in ring 0 and then freak out, right in the middle of a multi-million-dollar stock trade. You don’t want code to be able to check whether it’s running in a virtualized environment.

  6. Pingback: Evan Teran’s Blog » Blog Archive » Why do AMD and Intel insist on making virtualization complex?

  7. Aniruddha

    @Evan Teran: They created that virtualization layer to avoid the overhead of screening instructions.
    Now the VMM doesn’t need to monitor every instruction; the CPU itself notifies the VMM of any restricted or significant instructions.

