Simple compiler optimization

I thought of an optimization that compilers for most CPUs could do that I think should be implemented. Let’s say you have C code like this:

if ((variable == 4) || (variable == 6)) { stuff }

Any existing compiler I know of would compile into something like this (eax is variable):

cmp eax, 4
jz label
cmp eax, 6
jz label

That’s a bad way to do it. It should be implemented more like this:

or eax, 2
cmp eax, 6
jz label

This saves one instruction, but more importantly, it saves a conditional branch. Conditional branches can be expensive on modern hardware. This will work on any similar “if” statement whose constants differ by a single bit.

Even if you must copy eax to another register first, it removes a conditional branch, which is probably a win. The situations in which to do this are more complicated and must take into account many things.

For constants that are 1 apart but not 1 bit apart, like 7 and 8, you can do:

sub eax, 7 // use dec eax in 1/2 case
test eax, 0xFFFFFFFE // this is a 3 byte opcode
jz label

This also works for 0 and -1: increment eax first.

myria

22 thoughts on “Simple compiler optimization”

  1. This will work not only if the two operands differ by a single bit, but in general if one of them can be obtained from the other via an OR-operation. So you could compile

    if ((variable == 4) || (variable == 14))

    to

    or eax, 10
    cmp eax, 14
    jz label

    and so on.

    Now, can you do a similar optimization for three (or n) constants?

  2. That doesn’t work because 12 and 6 would also pass your check. Your code effectively means:

    if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))

    3? don’t know =)

  3. > This also works for 0 and -1: increment eax first.

    I thought of an improvement on this.

    if ((variable == 0) || (variable == -1))

    can be implemented as:

    add eax, 1
    jbe label

    because “jbe” really means “jump if carry or zero”.

  4. A bit late but…

    For values where the bit-representation are distinctive, say you check for 2, 4 and 8, wouldn’t (at compile-time) or’ing them together, then simply do “test eax, 14; 2|4|8” (followed by jz) do the trick too? Would be hard to beat both in speed and size, I suspect. :-)

  5. I’m not so sure this would be much of a win at the compiler level; it sounds like a developer-level optimisation to me.

    Short conditional branches are almost free these days with speculative execution, and most modern CPUs have plenty of BPT and register rename slots.

    The language also precludes doing anything much useful with this unless the operand happens to be in a register; it’s a very narrow case.

  6. Hallo,
    oh man.. So many Freaking Programmer (Jeloas). OK coud someone tell me about what the hell a u talking about? I Know C++ I know compiling.. But I dont get this shit!

    THIS:
    “or eax, 2
    cmp eax, 6
    jz label”

    is better then this:
    “cmp eax, 4
    jz label
    cmp eax, 6
    jz label”

    Ok i got it why (coz its then lower memory). But what is this???

  7. tamlin: I don’t think so. If you do “test eax, 14”, it should also accept 6 (2|4), 10 (2|8), 12 (4|8) & 14( 2|4|8).
    However, your idea would work, if the values IN eax always got different bit-representations, instead of only the values you test eax AGAINST (like 2,4,8).

  8. > > This also works for 0 and -1: increment eax first.
    >
    > I thought of an improvement on this.
    >
    > if ((variable == 0) || (variable == -1))
    >
    > can be implemented as:
    >
    > add eax, 1
    > jbe label
    >
    > because “jbe” really means “jump if carry or zero”.

    In this case

    inc eax
    jbe label

    bye

  9. > That doesn’t work because 12 and 6 would also pass your check. Your
    > code effectively means:
    >
    > if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))
    >
    > 3? don’t know =)

    0 = 0000 & 0001 = 0000 -> wow – it’s dont work =
    4 = 0100 & 0001 = 0000
    6 = 0110 & 0001 = 0000
    12 = 1100 & 0001 = 0000
    14 = 1110 & 0001 = 0000

    jz out ; zero dont work :)
    and eax, 1
    jz label
    :out

    it’s correct ? =)

  10. I’m back =)

    > if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))

    i make:

    and eax, 5

    int main() {
    int i;

    for (i = 0; i %sn”, i, ((i & 0x05) == 4 ? “yes” : “no”));

    getchar();
    }

    now is ok =P
    bye

  11. if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))

    in asm code:

    and eax, 5 ; reset bits 2 and 4
    cmp eax, 4 ; verify if bit 3 is set
    je label

  12. This only seems to work if OR flips on a single bit to make up the difference, which is why some of the comments about 14 fail: 4 = 0100, 14 = 1110; OR eax with (10 = 1010) would give 1110 = 14, but (6 = 0110) and (12 = 1100) would yield the same result; thus, the optimization causes behavior contrary to that which was specified. Let’s look at some possibilities (the OR required to get there will be omitted):

    000 || 001 = works
    000 || 010 = works
    000 || 011 = too many matches (TMM), would match 0,1,2,3
    000 || 100 = works
    000 || 101 = TMM
    000 || 110 = TMM
    000 || 111 = TMM
    001 || 010 = no common comparison (NCC)
    001 || 011 = works
    001 || 100 = NCC
    001 || 101 = works
    001 || 110 = NCC
    001 || 111 = TMM
    010 || 011 = works
    010 || 100 = NCC
    010 || 101 = NCC
    010 || 110 = works
    010 || 111 = TMM
    ad nauseum…

    This optimization only works with single bit flips being used to add up from the smaller number to equal the larger one; anything more causes extra matches or necessitates two comparisons anyway. Additionally, this optimization will clobber the register’s contents, so it only works if the register’s contents are discarded.

  13. See https://c.godbolt.org/z/zeNqkt

    A seemingly nice, no-branch formula is (((2 – x) | 2) == -2), but 12 years after the original posting (!), clang effortlessly beats it. Still, slightly disappointing that clang didn’t optimize the “naive” version. One up for the humans!

    These are functions returning 0 or 1, so are slightly longer than when inlined either by hand or by the compiler.

    Can you do better than the following?

    int Is4Or6_1(int x)
    {
    // naive way
    if (x == 4)
    return 1;

    if (x == 6)
    return 1;

    return 0;
    }

    int Is4Or6_2(int x)
    {
    // better
    if ((x == 4) || (x == 6))
    return 1;
    return 0;
    }

    int Is4Or6_3(int x)
    {
    // worst of all !
    return (((2 – x) | 2) == -2);
    }

  14. This might be overly picky – or i have missed something, but shouldn’t

    > if ((variable == 4) || (variable == 6)) { stuff }

    be turned into

    > cmp eax, 4
    > je stuff
    > cmp eax, 6
    > jne after_stuff

    Also, while the the solution seams fine for the test itself, it does have the side effect of vandalizing AX holding the value of variable. so if there is anything to be done with this value within ‘stuff’ it needs to be reloaded.

Leave a Reply to myria Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.