Monthly Archives: March 2006

How NOP nearly became a non-NOP on AMD64

AMD64 is a quite clean extension of the i386 instruction set, it obsoletes many rarely used features of the i386 and introduces new registers, making the instruction set a lot more… logical.

But there is one feature, actually a nice trick of the 8086/8088, which would make problems with the AMD64 extensions:

The NOP instruction would now clear the upper 32 bits of EAX in 64 bit mode.

How can this be? Just like many RISC CPUs, the 8086 did not have a real NOP instruction. An assembler would translate the mnemonic “nop” into “xchg ax, ax” (opcode 0x90)- this instruction has no effect, and does not even touch the flags.

The AMD64 was designed for the LP64 model, as opposed to ILP64, i.e. in C, long and pointers are 64 bit, but integers are still 32 bit. So the AMD64 is optimized for 32 bit arithmetic: When working with 32 bit values in the now 64 bit wide registers, the upper 32 bits of the result are always cleared. So an “add eax, ebx” would add the lower 32 bits of (the 64 bit registers) RAX and RBX, and clear the upper 32 bits of the result.

So what does “xchg eax, eax” do? It does nothing… and then clears the upper 32 bits of EAX.

AMD decided that the opcode 0x90 should remain a NOP even in 64 bit mode of the AMD64 instruction set, so 0x90 is now an explicit NOP. If you really need to do “xchg eax, eax” and thus clear the upper 32 bits, you can use the two-byte encoding 0x87h, 0xC0 – and this is what an assembler will generate.

Actually 0x90 has been a NOP since the i486. While “xchg reg, reg” usually took 3 cycles, “xchg eax, eax”, i.e. “nop” only took one. All later CPUs also recognized 0x90 as a NOP, because otherwise there would have been a nasty (read and write) dependency on EAX, which could stall the pipeline significantly – a NOP shouldn’t really wait for EAX being written by previous instructions, and following instructions reading EAX shouldn’t have to wait for a NOP…


First assembly puzzle!

This is our first assembly language puzzle for the new site! These puzzles are tests to see whether you are good enough of an assembly nerd, and to learn some tricks if you’re not =^_^=

Our first puzzle is of a classic type: size optimization.  It is for x86-32 assembly language, certainly the most widely known assembly language.  We will definitely do other puzzles for other processors though!

You might find that many of these puzzles are good ideas to place into compilers and other automated assembly/machine code generators.

The puzzle: In terms of opcode bytes, find the smallest sequence of x86-32 instructions to implement the following C/C++ code:

if ((x == 0) || (y == 0))
    goto label;


  • x and y are 32-bit integers or pointers.
  • x and y are each already in general-purpose registers or memory locations of your choice.
  • Do not assume a particular state of the flags, except that you may assume the direction flag is always clear as that is its usual state.
  • You may destroy any general-purpose registers or memory locations as you see fit, including the locations of x and y.
  • Assume that label is within range of a short jump.
  • Do not assume that you have access to protected instructions.
  • In general, answers that are the same size but faster or less destructive are considered better than others.

I was rather verbose in the rules because it’s the first puzzle.  Future puzzles won’t necessarily mention these restrictions.

Answers that don’t fit all the rules but have other merits like creativity are certainly welcomed!

The smallest answer I could find was 6 bytes.  The straightforward answer is 8 bytes.  Good luck!


(check comments for solution(s))