How NOP nearly became a non-NOP on AMD64

AMD64 is a fairly clean extension of the i386 instruction set: it obsoletes many rarely used features of the i386 and introduces new registers, making the instruction set a lot more… logical.

But one feature, actually a neat trick dating back to the 8086/8088, would have caused problems with the AMD64 extensions:

The NOP instruction would now clear the upper 32 bits of RAX in 64-bit mode.

How can this be? Just like many RISC CPUs, the 8086 did not have a real NOP instruction. An assembler would translate the mnemonic “nop” into “xchg ax, ax” (opcode 0x90). This instruction has no effect and does not even touch the flags.

The AMD64 was designed for the LP64 model, as opposed to ILP64, i.e. in C, long and pointers are 64 bits wide, but int is still 32 bits. So the AMD64 is optimized for 32-bit arithmetic: when an instruction writes a 32-bit result into one of the now 64-bit-wide registers, the upper 32 bits of that register are always cleared. So an “add eax, ebx” adds the lower 32 bits of (the 64-bit registers) RAX and RBX and clears the upper 32 bits of RAX.
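
The zero-extension rule can be sketched in a few lines of Python. This is only an illustrative model of the register semantics described above, not real machine code; the helper names write32 and add_eax_ebx are made up for this example.

```python
MASK32 = 0xFFFF_FFFF

def write32(value):
    """Model a 32-bit register write in 64-bit mode.

    Only the low 32 bits survive; the upper 32 bits of the
    64-bit register end up as zero (zero-extension).
    """
    return value & MASK32

def add_eax_ebx(rax, rbx):
    """Hypothetical model of "add eax, ebx": add the low halves of RAX
    and RBX, then store the 32-bit result zero-extended into RAX."""
    return write32((rax & MASK32) + (rbx & MASK32))

rax = 0x1122_3344_0000_0005
rbx = 0xAABB_CCDD_0000_0007
print(hex(add_eax_ebx(rax, rbx)))  # 0xc: the upper half of RAX is gone
```

Note that the old upper half of RAX (0x11223344 here) does not survive the 32-bit operation, even though the instruction only names EAX.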

So what does “xchg eax, eax” do? It does nothing… and then clears the upper 32 bits of RAX.

AMD decided that the opcode 0x90 should remain a NOP even in the 64-bit mode of the AMD64 instruction set, so 0x90 is now an explicit NOP. If you really need to do “xchg eax, eax” and thus clear the upper 32 bits, you can use the two-byte encoding 0x87, 0xC0, and this is what an assembler will generate.
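
To make the difference between the two encodings concrete, here is a toy model in Python. It is a sketch under the assumptions stated in the text (0x90 is a true NOP in 64-bit mode, 0x87 0xC0 is a real “xchg eax, eax”); the function execute is hypothetical and handles only these two byte sequences.

```python
MASK32 = 0xFFFF_FFFF

def execute(code, rax):
    """Return the value of RAX after one instruction in 64-bit mode.

    Toy model covering only the two encodings discussed in the text.
    """
    if code == bytes([0x90]):
        # Explicit NOP: RAX is left completely untouched.
        return rax
    if code == bytes([0x87, 0xC0]):
        # xchg eax, eax: EAX is swapped with itself, but the 32-bit
        # register write still zero-extends into the full RAX.
        return rax & MASK32
    raise ValueError("encoding not modeled")

rax = 0xDEAD_BEEF_1234_5678
print(hex(execute(bytes([0x90]), rax)))        # 0xdeadbeef12345678
print(hex(execute(bytes([0x87, 0xC0]), rax)))  # 0x12345678
```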

Actually 0x90 has been a NOP since the i486. While “xchg reg, reg” usually took 3 cycles, “xchg eax, eax”, i.e. “nop”, only took one. All later CPUs also recognized 0x90 as a NOP, because otherwise there would have been a nasty (read and write) dependency on EAX, which could stall the pipeline significantly: a NOP shouldn’t really have to wait for EAX to be written by previous instructions, and following instructions reading EAX shouldn’t have to wait for a NOP…

Michael

4 thoughts on “How NOP nearly became a non-NOP on AMD64”

  1. “The NOP instruction would now clear the upper 32 bits of EAX in 64 bit mode.”

    s/EAX/RAX/

    “So what does “xchg eax, eax” do? It does nothing… and then clears the upper 32 bits of EAX.”

    set noic
    s/EAX/RAX/

  2. AMD64 was an absolutely horrible and overhyped transition compared to the 16-to-32 of the 386. The monkeys at AMD decided to wipe out a whole row of single-byte increment and decrement instructions, nearly broke the NOP due to the decision to always clear the upper bits (the only thing more stupid would be to always sign-extend) – leave those bits alone and we’d get effectively double the number of 32-bit registers like you could do with ExX and xX and a rotate by 16, and in their misguided efforts removed things like SAHF/LAHF and segmentation, only to realise that these features are very useful for things like virtualisation and reluctantly put them back later on.

    What they should’ve done is similar to how Intel handled the 16-to-32 transition: a prefix or two to access the upper register bank and make operands 64-bit, maybe even from real mode (like it was possible to use the 32-bit registers; if they were smart they could’ve used the existing addr/oper override prefixes), leave existing instructions and register contents unchanged with the exception of widening, and 64-bit address/data sizes being set in segment descriptors the same way 16/32-bit ones do. There are enough previously reserved bits to do this (enough to make one wonder whether Intel reserved them with this purpose in mind!)
