First assembly puzzle!

This is our first assembly language puzzle for the new site! These puzzles are tests to see whether you are good enough of an assembly nerd, and to learn some tricks if you’re not =^_^=

Our first puzzle is of a classic type: size optimization.  It is for x86-32 assembly language, certainly the most widely known assembly language.  We will definitely do other puzzles for other processors though!

You might find that many of these puzzles are good ideas to place into compilers and other automated assembly/machine code generators.

The puzzle: In terms of opcode bytes, find the smallest sequence of x86-32 instructions to implement the following C/C++ code:

if ((x == 0) || (y == 0))
    goto label;

Rules:

  • x and y are 32-bit integers or pointers.
  • x and y are each already in general-purpose registers or memory locations of your choice.
  • Do not assume a particular state of the flags, except that you may assume the direction flag is always clear as that is its usual state.
  • You may destroy any general-purpose registers or memory locations as you see fit, including the locations of x and y.
  • Assume that label is within range of a short jump.
  • Do not assume that you have access to protected instructions.
  • In general, answers that are the same size but faster or less destructive are considered better than others.

I was rather verbose in the rules because it’s the first puzzle.  Future puzzles won’t necessarily mention these restrictions.

Answers that don’t fit all the rules but have other merits like creativity are certainly welcomed!

The smallest answer I could find was 6 bytes.  The straightforward answer is 8 bytes.  Good luck!

-Myria

(check comments for solution(s))

23 thoughts on “First assembly puzzle!

  1. Pingback: dowhatimean.net

  2. Matt Parks

    Oops, I read it backwards…now I feel dumb. That code jumps if either is nonzero. If the two are in ecx and eax, and then

    jecxz label
    or eax,eax
    jz label

    So yeah, 6 bytes.

    Reply
  3. myria

    (I’m the post author. We haven’t gotten the site to show who posted what yet…)

    dowhatimean.net: You have what my 6 byte solution was. There is a problem with your code that results in 6 bytes rather than 4. “mul” sets the zero flag based on the low half of the result, not the whole double-sized result. Thus 0×80000000 * 0×80000000 would break your code. This is required in order to get it correct (my original solution):

    mul edx
    or eax, edx
    jz label

    Matt Parks: Omg, I totally forgot about jecxz! On older systems, your code is certainly much faster than mine. I don’t know about the P4 and A64 though, since they have really fast multiplies. It all depends on whether the mul is more expensive than the CISC-like (and therefore microcoded) jecxz and its accompanying branch misprediction penalty.

    -Melissa

    Reply
  4. Stefan Esser

    Forgive me my ignorance Melissa…

    My answer would have been

    or eax, edx
    jz label

    I see no need for the mul ;)

    -stefan

    Reply
  5. Stefan Esser

    Maybe something like

    Input: eax, ecx

    dec eax
    jo label
    jecxz label

    ??? (I don’t remember at the moment if underflow also triggers o flag)

    Reply
  6. Matt Parks

    Stefan – I tried that code snippet out on my trusty DOS debug, and the jo didn’t branch like it should…guess the overflow wasn’t set. Good thought though. Any 5-byte answer has to look something like that though…what other one-byte instructions modify the flags?

    Myria – keep the puzzles coming! This is fun.

    Reply
  7. myria

    Unfortunately, “inc” and “dec” do not modify the carry flag, because the 8080 (and its clone, Z80) did not. “inc” and “dec” do modify the overflow flag, however. The problem with overflow is that it represents a *signed* overflow/underflow (for “dec”, this means going from 80000000 to 7FFFFFFF). This makes it useless for this problem >_<

    The 5 byte solution I found isn’t as big a trick as it might seem.

    -Myria

    Reply
  8. myria

    The 5-byte solution I came up with is:

    jecxz label
    xchg eax, ecx
    jecxz label

    xchg eax, ecx is a single-byte opcode. Strangely, this optimal solution is also the least destructive: x and y were not destroyed by the operation if the condition is false, although they did switch places.

    Thanks to Matt for reminding me about jecxz =)

    imul has 3 forms, unlike mul. The first is the same format as mul: edx:eax = eax * param. The second form is a multiply of any register (not just eax) by another. However, this only stores the low 32 bits of the result. The third form is dest = src * immediate, again only storing the low 32 bits of the result.

    None of these sets the zero flag when the *whole* result is zero, only when the low 32 bits is zero. They *do* set carry when the result overflows (meaning that the operands weren’t really zero to begin with). Unfortunately, there is no “jump if zero and not carry” instruction in x86, as no normal arithmetic comparison would need one.

    -Myria

    Reply
  9. myria

    You can’t do register*register with lea. You can only do register*X where X is 1, 2, 3, 4, 5, 8 or 9. (Technically, only 1, 2, 4, 8, but you can add one register as well, plus the multiplication. So lea eax, [eax*5] is actually lea eax, [eax*4+eax].)

    -Myria

    Reply
  10. Derek

    I’m mortified that I made a stupid error like that. I haven’t programmed in asm since 1991 and I have no 32 bit debugger to test instructions so my memory is showing it’s age.
    Derek

    Reply
  11. Kenny Stauffer

    Dadhikaka,
    That only works when they are equal or both 0. What happens when the arguments have no 1 bits in common?

    Reply
  12. mirabilos

    jecxz is nice and small, but painfully slow on the K6 onward.
    They slowed down the loop instruction, because of a popular
    software *cough* using it in delay loops (not Turbo Pascal,
    the other popular one…), to 300 MHz or so, and jecxz seems
    to share the implementation or something.

    Doesn’t hinder me from using it in the bootloader though ;)

    Reply
  13. Ofarchades

    I also tried out the OR idea and got positive results, but others here seem to have had a different experience?

    My code exactly:

    mov eax, 0
    mov ebx, 0

    or eax, ebx
    jz label

    No matter what values I set EAX and EBX to, the jump only occurs if both are 0. Am I missing something?

    Reply
  14. Ofarchades

    After posting, it hit me that it’s supposed to be if one is 0, not both. Replacing the OR with an AND seems to have solved that – and it’s still 4 bytes.

    Reply
  15. void256

    I’ve discovered your blog in August 2017 and reached the end (= the beginning ;)) of the blog – this post – today, early in September 2017! Lots of super interesting information here! :)

    Thanks a lot! :)

    Reply

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>