# Simple compiler optimization

I thought of an optimization that compilers for most CPUs could do that I think should be implemented. Let’s say you have C code like this:

if ((variable == 4) || (variable == 6)) { stuff }

Any existing compiler I know of would compile into something like this (eax is variable):

cmp eax, 4
jz label
cmp eax, 6
jz label

That’s a bad way to do it. It should be implemented more like this:

or eax, 2
cmp eax, 6
jz label

This saves one instruction, but more importantly, it saves a conditional branch. Conditional branches can be expensive on modern hardware. This will work on any similar “if” statement whose constants differ by a single bit.

Even if you must copy eax to another register first, it removes a conditional branch, which is probably a win. The situations in which to do this are more complicated and must take into account many things.

For constants that are 1 apart but not 1 bit apart, like 7 and 8, you can do:

sub eax, 7 // use dec eax in 1/2 case
test eax, 0xFFFFFFFE // this is a 3 byte opcode
jz label

This also works for 0 and -1: increment eax first.

myria

### 22 thoughts on “Simple compiler optimization”

1. This will work not only if the two operands differ by a single bit, but in general if one of them can be obtained from the other via an OR-operation. So you could compile

if ((variable == 4) || (variable == 14))

to

or eax, 10
cmp eax, 14
jz label

and so on.

Now, can you do a similar optimization for three (or n) constants?

2. That doesn’t work because 12 and 6 would also pass your check. Your code effectively means:

if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))

3? don’t know =)

3. okay, so I… uhm… found an optimization for four constants! yay!

4. > This also works for 0 and -1: increment eax first.

I thought of an improvement on this.

if ((variable == 0) || (variable == -1))

can be implemented as:

jbe label

because “jbe” really means “jump if carry or zero”.

5. Ack, nope, that doesn’t work >_<

6. A bit late but…

For values where the bit-representation are distinctive, say you check for 2, 4 and 8, wouldn’t (at compile-time) or’ing them together, then simply do “test eax, 14; 2|4|8” (followed by jz) do the trick too? Would be hard to beat both in speed and size, I suspect. :-)

7. I’m not so sure this would be much of a win at the compiler level; it sounds like a developer-level optimisation to me.

Short conditional branches are almost free these days with speculative execution, and most modern CPUs have plenty of BPT and register rename slots.

The language also precludes doing anything much useful with this unless the operand happens to be in a register; it’s a very narrow case.

8. Hallo,
oh man.. So many Freaking Programmer (Jeloas). OK coud someone tell me about what the hell a u talking about? I Know C++ I know compiling.. But I dont get this shit!

THIS:
“or eax, 2
cmp eax, 6
jz label”

is better then this:
“cmp eax, 4
jz label
cmp eax, 6
jz label”

Ok i got it why (coz its then lower memory). But what is this???

9. tamlin: I don’t think so. If you do “test eax, 14”, it should also accept 6 (2|4), 10 (2|8), 12 (4|8) & 14( 2|4|8).
However, your idea would work, if the values IN eax always got different bit-representations, instead of only the values you test eax AGAINST (like 2,4,8).

10. > > This also works for 0 and -1: increment eax first.
>
> I thought of an improvement on this.
>
> if ((variable == 0) || (variable == -1))
>
> can be implemented as:
>
> jbe label
>
> because âjbeâ really means âjump if carry or zeroâ.

In this case

inc eax
jbe label

bye

11. > That doesnât work because 12 and 6 would also pass your check. Your
> code effectively means:
>
> if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))
>
> 3? donât know =)

0 = 0000 & 0001 = 0000 -> wow – it’s dont work =
4 = 0100 & 0001 = 0000
6 = 0110 & 0001 = 0000
12 = 1100 & 0001 = 0000
14 = 1110 & 0001 = 0000

jz out ; zero dont work :)
and eax, 1
jz label
:out

it’s correct ? =)

12. Owww…. 2 and 10 dont work =( sorry!!

13. its works with 0 2 4 6 8 10 12 14 =\\ waitt….. burning my soul :D

14. I’m back =)

> if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))

i make:

and eax, 5

int main() {
int i;

for (i = 0; i %sn”, i, ((i & 0x05) == 4 ? “yes” : “no”));

getchar();
}

now is ok =P
bye

15. My C code out incorrect in last post…

for (i = 0; i %sn”, i, ((i & 0x05) == 4 ? “yes” : “no”));

16. oww….. its sux =

17. diferent..

for (i = 0; i != 0x10; i++)
printf (“%d -> %sn”, i, ((i & 0x05) == 4 ? “yes” : “no”));

18. I give up =((((

19. if ((variable == 4) || (variable == 6) || (variable == 12) || (variable == 14))

in asm code:

and eax, 5 ; reset bits 2 and 4
cmp eax, 4 ; verify if bit 3 is set
je label

20. This only seems to work if OR flips on a single bit to make up the difference, which is why some of the comments about 14 fail: 4 = 0100, 14 = 1110; OR eax with (10 = 1010) would give 1110 = 14, but (6 = 0110) and (12 = 1100) would yield the same result; thus, the optimization causes behavior contrary to that which was specified. Let’s look at some possibilities (the OR required to get there will be omitted):

000 || 001 = works
000 || 010 = works
000 || 011 = too many matches (TMM), would match 0,1,2,3
000 || 100 = works
000 || 101 = TMM
000 || 110 = TMM
000 || 111 = TMM
001 || 010 = no common comparison (NCC)
001 || 011 = works
001 || 100 = NCC
001 || 101 = works
001 || 110 = NCC
001 || 111 = TMM
010 || 011 = works
010 || 100 = NCC
010 || 101 = NCC
010 || 110 = works
010 || 111 = TMM

This optimization only works with single bit flips being used to add up from the smaller number to equal the larger one; anything more causes extra matches or necessitates two comparisons anyway. Additionally, this optimization will clobber the register’s contents, so it only works if the register’s contents are discarded.

21. A seemingly nice, no-branch formula is (((2 – x) | 2) == -2), but 12 years after the original posting (!), clang effortlessly beats it. Still, slightly disappointing that clang didn’t optimize the “naive” version. One up for the humans!

These are functions returning 0 or 1, so are slightly longer than when inlined either by hand or by the compiler.

Can you do better than the following?

int Is4Or6_1(int x)
{
// naive way
if (x == 4)
return 1;

if (x == 6)
return 1;

return 0;
}

int Is4Or6_2(int x)
{
// better
if ((x == 4) || (x == 6))
return 1;
return 0;
}

int Is4Or6_3(int x)
{
// worst of all !
return (((2 – x) | 2) == -2);
}

22. This might be overly picky – or i have missed something, but shouldn’t

> if ((variable == 4) || (variable == 6)) { stuff }

be turned into

> cmp eax, 4
> je stuff
> cmp eax, 6
> jne after_stuff

Also, while the the solution seams fine for the test itself, it does have the side effect of vandalizing AX holding the value of variable. so if there is anything to be done with this value within ‘stuff’ it needs to be reloaded.