AMD64 is quite a clean extension of the i386 instruction set: it obsoletes many rarely used features of the i386 and introduces new registers, making the instruction set a lot more… logical.
But there is one feature, actually a nice trick of the 8086/8088, which would have caused problems with the AMD64 extensions:
Taken literally, the NOP instruction would now clear the upper 32 bits of RAX in 64 bit mode.
How can this be? Just like many RISC CPUs, the 8086 did not have a real NOP instruction. An assembler would translate the mnemonic “nop” into “xchg ax, ax” (opcode 0x90). This instruction has no effect, and does not even touch the flags. With a 32 bit operand size, the same opcode means “xchg eax, eax”.
AMD64 was designed for the LP64 model, as opposed to ILP64, i.e. in C, long and pointers are 64 bit, but int is still 32 bit. So AMD64 is optimized for 32 bit arithmetic: when an instruction writes a 32 bit value into one of the now 64 bit wide registers, the upper 32 bits of that register are always cleared. So an “add eax, ebx” would add the lower 32 bits of (the 64 bit registers) RAX and RBX, store the 32 bit sum in the lower half of RAX, and clear the upper 32 bits of RAX.
So what would “xchg eax, eax” do under this rule? It does nothing… and then clears the upper 32 bits of RAX.
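The zero-extension rule and its consequence for a literally executed “xchg eax, eax” can be modeled in a few lines of Python. This is only a toy sketch, not an emulator: the register dictionary and the helpers `write32`, `add32` and `xchg32_literal` are made up for illustration.

```python
MASK32 = 0xFFFFFFFF

def write32(regs, name, value):
    """Model a 32-bit register write in 64-bit mode: the upper half is zeroed."""
    regs[name] = value & MASK32

def add32(regs, dst, src):
    """Model a 32-bit add: the 32-bit sum is zero-extended into the 64-bit register."""
    write32(regs, dst, (regs[dst] & MASK32) + (regs[src] & MASK32))

def xchg32_literal(regs, a, b):
    """Model a 32-bit xchg executed literally, i.e. as two 32-bit register writes."""
    lo_a, lo_b = regs[a] & MASK32, regs[b] & MASK32
    write32(regs, a, lo_b)
    write32(regs, b, lo_a)

# Hypothetical register contents, chosen so the upper half is visibly non-zero.
regs = {"rax": 0x1122334455667788, "rbx": 0x0000000000000001}
xchg32_literal(regs, "rax", "rax")   # "xchg eax, eax" taken literally
print(hex(regs["rax"]))              # 0x55667788: the upper 32 bits are gone
```

The lower half of RAX is unchanged, as expected from exchanging a register with itself, but the write itself is enough to wipe the upper half.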
AMD decided that the opcode 0x90 should remain a NOP even in 64 bit mode of the AMD64 instruction set, so 0x90 is now an explicit NOP. If you really need to do “xchg eax, eax” and thus clear the upper 32 bits of RAX, you can use the two-byte encoding 0x87, 0xC0, and this is what an assembler will generate.
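As a rough illustration of the distinction, a decoder for 64 bit mode has to treat the two encodings differently. The `decode` function below is a hypothetical sketch that only knows these two byte sequences (no prefixes, no other opcodes); it is not how any real disassembler is structured.

```python
def decode(code):
    """Return (mnemonic, length) for the first instruction in `code`.

    Only the two encodings discussed in the text are handled.
    """
    if code[:1] == b"\x90":
        # Special-cased: an explicit NOP, no register is written at all.
        return ("nop", 1)
    if code[:2] == b"\x87\xc0":
        # Opcode 0x87 is the general xchg; ModRM 0xC0 selects eax for both operands.
        return ("xchg eax, eax", 2)
    raise ValueError("encoding not covered by this sketch")

print(decode(b"\x90"))       # ('nop', 1)
print(decode(b"\x87\xc0"))   # ('xchg eax, eax', 2)
```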
Actually, 0x90 has been a NOP since the i486. While “xchg reg, reg” usually took 3 cycles, “xchg eax, eax”, i.e. “nop”, only took one. All later CPUs also recognized 0x90 as a NOP, because otherwise there would have been a nasty (read and write) dependency on EAX, which could stall the pipeline significantly. A NOP shouldn’t really have to wait for EAX to be written by previous instructions, and following instructions reading EAX shouldn’t have to wait for a NOP…