Monthly Archives: May 2006

Microsoft changes CS value in Win64

I just found out the hard way that in 32 bit programs under Win64, the value of CS changed. In Win32, the value of CS is 0x001B. In 32 bit programs under Win64, it’s 0x0023. This will probably break some programs, especially debuggers.

Why did Microsoft do this? It’s not like the value of CS is undocumented: it’s in the DDK as KGDT_R3_CODE, and I’ve seen it several times in other places on MSDN. I can’t see any reason that they changed it. The 64 bit CS didn’t replace it – the 64 bit CS is 0x0033.

Normally I wouldn’t post 2 things in 2 days but this just really annoys me.


Simple compiler optimization

I thought of an optimization that compilers for most CPUs could do that I think should be implemented. Let’s say you have C code like this:

if ((variable == 4) || (variable == 6)) { stuff }

Any existing compiler I know of would compile into something like this (eax is variable):

cmp eax, 4
jz label
cmp eax, 6
jz label

That’s a bad way to do it. It should be implemented more like this:

or eax, 2
cmp eax, 6
jz label

This saves one instruction, but more importantly, it saves a conditional branch. Conditional branches can be expensive on modern hardware. This will work on any similar “if” statement whose constants differ by a single bit.

Even if you must copy eax to another register first, it removes a conditional branch, which is probably a win. The situations in which to do this are more complicated and must take into account many things.

For constants that are 1 apart but not 1 bit apart, like 7 and 8, you can do:

sub eax, 7 // use dec eax in 1/2 case
test eax, 0xFFFFFFFE // this is a 3 byte opcode
jz label

This also works for 0 and -1: increment eax first.


Asking the kernel how to make a syscall

Imagine you’re an i386 user mode application on a modern operating system, and you want to make a syscall, for example to request some memory or create a new thread. But syscalls can be made in various ways on the i386 family of CPUs (int, call gates, sysenter, syscall), and CPUs tend to support only a subset of them. But hardcoding “int” into the kernel is a waste of resources on modern CPUs, because sysenter is a lot faster.

The Windows XP kernel for example therefore detects the CPU type and tells user mode applications what mechanism to use. It maps at a constant location in every address space a read-only page that contains a small stub that can be called from user mode like a library function, and that does nothing more than transparently make the syscall.


So far so good. But what if, for compatibility reasons, your cannot just map this page at a constant location? A microkernel like L4 is, among other things (to make a long story short), designed to support running unmodified applications written for many different operating systems at the same time, so we cannot guarantee for any location in the 4 GB address space that we can safely map a page of code there without destroying compatibility with some operating system.

So the question is, how can we ask the kernel how to make syscalls, if the kernel cannot put the info in our memory, and we obviously cannot make a syscall to ask the kernel…

The idea is to trap into kernel mode, by doing something illegal, so the kernel can put the information in a register and return to user mode. A division by zero is such a trap – but then the kernel would not be able to distinguish between this special syscall and a real division by zero exception. Using an illegal instruction doesn’t help either, because no i386 opcode is guaranteed to be illegal in the future.

The L4 guys came up with “lock nop”. “lock” is a prefix that makes sure that in the following instruction, the memory bus is not shared with any other CPU in an SMP configuration. But “lock” may only be used with one of 17 specific instructions – all other instructions following a “lock” will cause an “undefined opcode” exception and trap into the kernel, which can easily look up whether it was “lock nop” that caused the exception.

(Now here is a question: I found a hint on the net that “lock nop” didn’t do anything on some early Intel i386 CPUs – does anyone have more information?)