Imagine you’re an i386 user mode application on a modern operating system, and you want to make a syscall, for example to request some memory or create a new thread. Syscalls can be made in various ways on the i386 family of CPUs (int, call gates, sysenter, syscall), though, and any given CPU tends to support only a subset of them. Hardcoding “int” as the one mechanism every CPU understands would be a waste of resources on modern CPUs, because sysenter is a lot faster.
The Windows XP kernel, for example, therefore detects the CPU type and tells user mode applications which mechanism to use. At a constant location in every address space, it maps a read-only page containing a small stub that can be called from user mode like a library function and that does nothing more than transparently make the syscall:
do_syscall:
    sysenter
    ret
So far so good. But what if, for compatibility reasons, you cannot just map this page at a constant location? A microkernel like L4 is, among other things (to make a long story short), designed to run unmodified applications written for many different operating systems at the same time. So there is no location in the 4 GB address space where we can safely map a page of code without risking compatibility with some operating system.
So the question is, how can we ask the kernel how to make syscalls, if the kernel cannot put the info in our memory, and we obviously cannot make a syscall to ask the kernel…
The idea is to trap into kernel mode, by doing something illegal, so the kernel can put the information in a register and return to user mode. A division by zero is such a trap – but then the kernel would not be able to distinguish between this special syscall and a real division by zero exception. Using an illegal instruction doesn’t help either, because no i386 opcode is guaranteed to be illegal in the future.
The L4 guys came up with “lock nop”. “lock” is a prefix that makes the memory access of the following instruction atomic, so that no other CPU in an SMP configuration can touch that memory in between. But “lock” may only be used with one of 17 specific instructions; any other instruction following a “lock” will cause an “undefined opcode” exception and trap into the kernel, which can easily look up whether it was “lock nop” that caused the exception.
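To make the “look up” step concrete, here is a minimal sketch in C of how an undefined-opcode handler could recognize the magic instruction. It relies only on the encoding: the “lock” prefix is the byte 0xF0 and “nop” is the byte 0x90. The function name is_lock_nop and its parameters are made up for illustration, and this is not taken from the L4 sources; a real handler would also have to read the bytes safely from user memory and skip the two bytes before returning to user mode.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Encoding of the magic instruction: LOCK prefix followed by NOP. */
enum { OPCODE_LOCK_PREFIX = 0xF0, OPCODE_NOP = 0x90 };

/*
 * Hypothetical helper (name and signature are assumptions):
 * `code` points at a copy of the bytes at the faulting instruction
 * pointer, `len` is how many of those bytes are readable.
 * Returns true only if the faulting instruction starts with "lock nop".
 */
static bool is_lock_nop(const uint8_t *code, size_t len)
{
    return len >= 2
        && code[0] == OPCODE_LOCK_PREFIX
        && code[1] == OPCODE_NOP;
}
```

If the check fails, the handler would treat the fault as a genuine undefined-opcode exception and deliver it to the application as usual.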
(Now here is a question: I found a hint on the net that “lock nop” didn’t do anything on some early Intel i386 CPUs – does anyone have more information?)