# Arithmetic mean of two signed integers

It’s time for a puzzle again! (submitted by sheepmaster)

Calculate in x86 assembly language the arithmetic mean (a+b)/2 of two signed integers a and b. If you think this is trivial, consider the following edge cases: a = b = 0x7FFFFFFF and a = 0x7FFFFFFF, b = 0x80000000.

Again, the standard rules apply: a and b are in general purpose registers of your choice, the result can be in any register, and any register can be overwritten.

Oh, and by the way, using larger data types isn’t allowed either (that would make things too easy, wouldn’t it?).

### 43 thoughts on “Arithmetic mean of two signed integers”

1. As always I’m not fluent in 0x86 – but I speak some ARM and therefore here is a solution I thought up. First the idea:
Shift both registers right and add them. If both registers are of same sign and the zero bit was set, add/substract one. end.
a is in R0, b in R1, result will be in R2 and register R0 and R1 will be killed.
MOV R2,R0, ASR#1
AND R0,#&80000001
AND R1,#&80000001

which would have been a neat idea, but I just realized that asr doesn’t do a correct div 2 in the negative case, and so my code doesn’t work. Will think about what to do with that, and write another komment.

2. SHR EAX
SHR EBX

I know, it can be off by one, but who cares :)

–Blerik

3. So, fixing the ASR != DIV 2 problem took me another 3 instructions to give a total of nine:
MOV R2,R3, ASR#1
AND R0,#&80000001
AND R1,#&80000001

now it should work correctly, i hope ;)

4. @blerik: at least do a sar please :)

5. @Dominik: I mistranslated my Java code to assembly, probably because it has been 15 years since the last time I used assembly :).

And what is the problem with ASR? Is it different than SAR? SAR does division by two perfectly for negative (2s compement) numbers.

–Blerik

6. @Dominik: What do you get when R0 = -1 and R1 = 1? On paper I got 1, but that could just be me not being as fluent in ARM…

I had a similar idea, but didn’t need to handle negative numbers separately, as ASR always rounds down, so you have to add 1 when both numbers are odd.

What are the highest-value bits in your AND Rx,#&80000001 for?

7. @Sheepmaster:
R0 = -1 and R1 = 1
R3 = 0
R2 = 0
R3 = 1
R2 = 0
R0 = &80000001
R1 = 1
R0 = &80000001 + 1

8. @Sheepmaster:
R0 = -1 and R1 = 1
R3 = 0
R2 = 0
R3 = 1
R2 = 0
R0 = &80000001
R1 = 1
R0 = &80000001 + 1 <- here’s the error
… damn…

ASR always rounds to the lower numbers, this is why i add the sign-bit all the time (ADD rd, rs, rs LSR#31) – so the ASR rounds correctly to zero instead of lower numbers.

Oh shit – now I see the bug, the AND s are for determining if i need to adjust the resulting value. My initial solution did compare the two instead of adding them. We only need adjusting when both the sign and the lowest bit are set. But my next step is wrong. since &80000001 is nowhere near -1 (which I assumed in my disordered mind), adding doesn’t work. Arg.

9. Okay, now it got ugly. Just doing branches to determine if we have to adjust and in which direction. And the solution isn’t beautiful anymore at all :( :

— first the addition and correct ASR
MOV R2,R3, ASR#1

— check first and last bit and if they are equal
AND R0,#&80000001
AND R1,#&80000001
— if they aren’t we jump because we are finished
CMP R0,R1
BNE .end
— if the zero bit isn’t set we are finished
TST R0,#1
BEQ .end
TST R0,#&80000000
SUBEQ R2,R2,#1
.end

10. The boring x86 solution (uses CDQ, so not exactly correct wrt. “no larger data types” rule…); eax = a, ebx = b, result in eax, clobbers ebx and edx:

mov ebx, 2
cdq
idiv ebx

11. … and now I notice that it doesn’t work on the edge cases. D’oh.

12. What do you think of this (input: eax, ebx – output: eax)?

sar eax, 1
jmp done
rcr eax, 1
done:

@blerik:
To get rid of the off-by-one problem, I think you must add the value of a AND b AND 1 to your result.

13. Okay, after a quick bit of reviewing some algorithms, I’ve come up with this code (equivalent to (a & b) + ((a ^ b) >> 1) ):

mov ecx, eax
xor ecx, ebx
sar ecx, 1
and eax, ebx

This is the floor of the average. You didn’t specify how to round, so I assume this is what you wanted…

14. An old friend of mine, Risc OS ARM Assembler Guru, came up with this reduction:

MOV R2,#0 ; Startwert

MOVS R0,R0,ASR #1 ; R0=R0/2, Rest in Carry-Flag

MOVS R1,R1,ASR #1 ; R1=R1/2, Rest in Carry-Flag

MOV R2,R2,ASR #1 ; R2=R2/2 (wir haben jetzt also das Mittel aus den
; beiden Inhalten von Bit 0)

ADD R2,R2,R1 ; Einfach alles aufsummieren

Tests:

R0=&7fffffff, R1=&7fffffff -> R2=&7fffffff

R0=&80000000, R1=&80000000 -> R2=&80000000

R0=&7fffffff, R1=&80000000 -> R2=&ffffffff

Reduced to:

AND R2,R0,#1
MOVS R1,R1,ASR #1

and finally to a fantastic 4 ARM Risc Instruction Photo Finish:

AND R2,R0,R1
AND R2,R2,#1

;) – did I mention that I suck at assembler?

15. @DrV: That’s a nice solution. Mine would have been the assembly equivalent of (a >> 1) + (b >> 1) + (a & b & 1).

16. âŚwhich would look like yours in ARM assembly, Dominik :-)

17. @DrV: (a & b) + ((a ^ b) >> 1) rounds wrongly for negative numbers. e.g. -3,-4 = -4 instead of -3
@sheepmaster: sadly it wasn’t mine ;)

18. Okay, maybe I should have specified how the result should be rounded. For me, always rounding down would be acceptable, as this is what an arithmetic shift right does, and the goal is not to implement some sophisticated rounding method in assembler (although it could be an interesting taskâŚ) ;-)

19. If you’re trying to do this on an x86, and you’re using more than two instructions, you’ve already gone too far.

# %eax holds 32 bits, and the carry flag holds the 9th bit
rcrl \$1, %eax
# rotate %eax and the carry flag as a 33-bit number, right by one

20. @Cassy: No, that doesn’t work for a = 0x7FFFFFFF and b = 0x80000000. You have to check if there’s an overflow first, and if not (which would be the case for a = 0x7FFFFFFF and b = 0x80000000), do an arithmetic shift right.

21. @sheepmaster: :P I think you’re right. The problem there is that while the “shift” right is 33-bit, the addition wouldn’t be sign extended to 33-bits. *sigh* Well, just to keep my totally hackish untranslatable to C example going:

seto %bl
; %bl is either set to 1, or -1, either way the LSbit is 1
pushl %ebx
popf
; We put %ebx into the flags register
; Since CF is the LSbit, it now has the value of OF
rcrl \$1, %eax
; This works correctly for signed values now.

22. OK, I broke this into three cases:
– both positive (5,3), (0x7fffffff,0x7fffffff)
– one neg, one pos (0xfffffff,1), (0x80000000,0x10000000),(0x80000000,0x7fffffff),(0x80000000,1)
– both negative (0x8000000, 0x8000000), (0xfffffff,0xfffffff)

I think this works for all three cases: arithmetic shift right each number by 1, add, and add in bitwise OR of lowest bits in the original number.

mov edx,eax
or edx,ebx
and edx,1
asr eax,1
asr ebx,1

23. Oh yeah except asr=sar. Not the smallest code perhaps, but nice and simple.

24. Cassy: “seto bl” means set bl=01 if overflow, bl=00 if not.

25. @myria: from what I’ve heard, “SETcc” should not be trusted to set the register to just 1, because I’ve heard at times that it could possibly be set to -1 (all bits set)

Either way, my hack version still works, because the least significant bit is the one that gets moved into the CF, which is then moved into the register with the RCR. Note that all but 8-bits of the EFLAGS end up undefined from pushing EBX then popping it into EFLAGS.

Of course, my friend said he had to laugh at my solution, because it was just such an incredible hack. :) Of course, this is also the reason why I love assembly so much ;)

26. Cassy: Your code doesn’t work for the test case 0x7FFFFFFF, even after switching around eax and ebx in the first instruction.

27. I mean, 0x7FFFFFFF with 0x7FFFFFFF.

28. Matt Parks’s code appears correct… I have a modification of his that is 1 instruction shorter:

mov eax,ecx
or ecx,edx
sar eax,1
sar edx,1
shr ecx,1

This would need to be profiled against Matt’s code. It’s hard to tell which one will be better; the shr/adc can be expensive. It likely has terrible performance on Pentium 4’s, which hate “adc” and “sbb”.

My code is Visual C++’s __fastcall: parameters in ecx, edx; return value in eax; ecx, edx volatile.

29. Ick, stupid dest/src flipflopping :P

Hm… I’m working up a hack that actually works, but it’s taking me some time.

30. Write an optimal program in y86 assembly language for the following problem.

In a particular application, one of the key operational function takes in a
32 bit unsigned value as arguement. This value has 3 parts.

bits 0-15 -> value 1.
bits 17-31 -> value 2.
bit 16 -> operation.
=0 => value 1 + value 2
=1 => value 1 – value 2
Both values are unsigned values.
The function reads in the 32 bit value, performs the operation and
returns a 16bit value.
If the values are signed values, will there be any change in the code? if so, what are the changes to be made? if not, Justify.
If instead of bit 16, the operatoin bit is bit 15 (with value 1 = bit 0-14 and value 2 = bits 16-31), what are the changes to be made in the above code?

31. Hello! Good Site! Thanks you! jknjqjonwriuma

I modified the code from Goplat, http://forums.thedailywtf.com/forums/t/5470.aspx

setl cl
shr cl,1
rcr eax,1
setc cl // c is odd?
cdq // edx = -1 if r=0
adc eax,0 // inc if c is odd and r> 1); // arithmetic shift!
return (((int)(r<0)) & c & 1) + r;
}

33. My entry has been garbled, so I repost it in html…

I modified the code from Goplat, http://forums.thedailywtf.com/forums/t/5470.aspx

setl cl
shr cl,1
rcr eax,1
setc cl  // c is odd?
cdq  // edx = -1 if r<0, 0 if r>=0
adc eax,0  // inc if c is odd and r<0, i.e. cl=1 and dl=-1

or alternatively this one?:

mov ecx,eax
xor eax,edx
and edx,ecx
mov ecx,eax
sar eax,1
sets dl
and ecx,edx
and ecx,1

These functions want the parameters in eax and edx, give back the result in eax, and modifies only ecx.

int avg(int a, int b)
{
int c,r;
c = a ^ b;
r = (a & b) + (c >> 1);  // arithmetic shift!
return (((int)(r<0)) & c & 1) + r;
}

34. Goplat’s magic lies in the setl after the add command.
By replacing the shr cl,1 by neg cl, the first function gets even better (I replaced cl by dl as well):

setl dl
neg dl  // put “less” into carry, and set dl to -1 (r<0) or 0 (r>=0)
rcr eax,1
adc eax,0  // inc if c is odd and r<0, i.e. carry of rcr and dl=-1

Unfortunately all commands depend on each other, i.e. nothing can be done in parallel on a P4.

35. By the way: Myria’s (6. August 2006 at 22:43) and Matt Parks’ (3. August 2006 at 17:39) code fail for e.g. -1 and 2 where the result should be 0 instead of 1.

36. Uh oh, in the code above the add after the rcr should have been a adc. This error came by manually copying the code.

Here is the corrected code:
setl dl
neg dl
rcr eax,1

Here is yet another sequence, albeit only a gain on Core2:
setl dl
shrd eax,edx,1

37. Argh! The same mistake again!

Here is the corrected sequence:
setl dl
shrd eax,edx,1

38. I would add 0x80000000 to each input, making them unsigned, then add, shift right with carry to average them, and subtract 0x80000000 from the answer.

39. That would round toward negative infinity which is not what was wanted.
Example: -2 / -1 => -2 instead of -1

40. Here’s another method for the “round towards zero” challenge:

jae here1
cmc
here1: jnl here2
sbb eax,-1
here2: rcr eax,1

It’s one instruction more than Jaspers, but in my tests on an E2200 ‘dual core’, it’s significantly faster. It’s also shorter in real terms (12 bytes compared to 15), and it doesn’t destroy edx.

41. Your solution appears to be correct.
It is smaller and could even be the faster on most processors but it contains jumps and more commands.
The original problem specification is missing the goal which solution is considered “best”, i.e. smallest, least number of commands, or fastest (which processor?).

42. Regarding Mysoft’s code. jae is equivalent to jnc. So jae/cmc just ensures that the carry is reset. Therefore the code can be modified to:

clc
jnl here2
sub eax,-1
here2: rcr eax,1

Eliminating the first jump, and changing the sbb to sub, which in theory should be faster code. I ran a test changing it to al/dl (8 bit version) so I could quickly iterate over all possible signed byte values of a and b, and it appears to work fine.

I spent a while thinking about eliminating the second jump, I was able to do it in this way (8 bit test code again):

asm
mov al, [a]
clc
setl bl
neg bl
sub al, bl
rcr al, 1
mov [function], al
end asm

When extended to 32 bits, it will also require a xor ebx, ebx before usage. Mysoft believes this way will be slower on many CPUs, I didn’t run any tests on speed though.

43. I could find an even better variant of Pete Black’s code. The CLC instruction is superfluous, and DEC is one byte smaller than NEG (the SET instruction must be inverted):

xor ecx,ecx