Category Archives: puzzle

Win32's MulDiv

In Win32, there is an API call called “MulDiv”:

The MulDiv function multiplies two 32-bit values and then divides the 64-bit result by a third 32-bit value. The return value is rounded up or down to the nearest integer.

int WINAPI MulDiv(
int nNumber,
int nNumerator,
int nDenominator

If a divide overflow (including divide by zero) occurs, MulDiv returns -1. (Stupidly, there’s no way to actually check whether the result truly is -1 instead of an error.)

How do we implement this in x86-32? The official version does not use structured exception handling to simply catch the exception and handle it. MulDiv actually checks for overflow beforehand and never causes an exception.

The official implementation is several pages long, and I think we could do much better.

If this were the unsigned case, the entire function would be simple:

mov eax, [esp+4]
mul dword ptr [esp+8]
cmp edx, [esp+12]
jae overflow
div dword ptr [esp+12]
ret 12
or eax, -1
ret 12

The C ! operator

In C, the ! (“logical NOT”) operator used on a value x evaluates to 0 when x is not 0, and 1 when x is 0. In other words, it’s equivalent to the following C:

(x == 0) ? 1 : 0

How should this be implemented in x86 assembly language, when “x” is already in a register? The target register can either be the same one, or it can be a different one. I didn’t try too hard and got 7 bytes; it can probably be made better. On other CPUs, it can be done in a single instruction. For example, in MIPS: “sltiu dest, src, 1”.

Note that this is about the case where the compiler doesn’t know how the result of the ! is used, as in “return !x;” in a non-inlined function. Cases like “if (!x)” are simpler.

(If you want to share how easily it can be done on *your* favorite CPU, please post a comment as well!)

Why does PUSHA also push the stack pointer?

This puzzle is actually a quite easy one – but when I asked it in a university course, it kept some people busy for some time to find out the answer, so I thought it might be a good idea to ask you nevertheless:

The 8086 “PUSHA” instruction saves the following registers on the stack:
In i386+ 32 bit mode, these registers are EAX etc., and in x86_64 64 bit mode, it is RAX etc., but all of them always also save the stack pointer.

Why do x86 CPUs do this? Isn’t it true that it makes no sense to save the stack pointer, as the stack pointer is needed to retrieve these values anyway?

In a few days, I’ll post some trivia on this topic…

Puzzle: PowerPC Flag Simulation on x86

This week’s puzzle is to copy the carry flag to the high bit of ah.  You may destroy any other register, the flags, and the other 24 bits of eax.  Shortest sequence wins.

For an optional very long description of what this is all about, click “Read the rest”.  You don’t need to read it to participate in the contest =)

Continue reading

First assembly puzzle!

This is our first assembly language puzzle for the new site! These puzzles are tests to see whether you are good enough of an assembly nerd, and to learn some tricks if you’re not =^_^=

Our first puzzle is of a classic type: size optimization.  It is for x86-32 assembly language, certainly the most widely known assembly language.  We will definitely do other puzzles for other processors though!

You might find that many of these puzzles are good ideas to place into compilers and other automated assembly/machine code generators.

The puzzle: In terms of opcode bytes, find the smallest sequence of x86-32 instructions to implement the following C/C++ code:

if ((x == 0) || (y == 0))
    goto label;


  • x and y are 32-bit integers or pointers.
  • x and y are each already in general-purpose registers or memory locations of your choice.
  • Do not assume a particular state of the flags, except that you may assume the direction flag is always clear as that is its usual state.
  • You may destroy any general-purpose registers or memory locations as you see fit, including the locations of x and y.
  • Assume that label is within range of a short jump.
  • Do not assume that you have access to protected instructions.
  • In general, answers that are the same size but faster or less destructive are considered better than others.

I was rather verbose in the rules because it’s the first puzzle.  Future puzzles won’t necessarily mention these restrictions.

Answers that don’t fit all the rules but have other merits like creativity are certainly welcomed!

The smallest answer I could find was 6 bytes.  The straightforward answer is 8 bytes.  Good luck!


(check comments for solution(s))