{"id":23,"date":"2006-08-13T15:39:37","date_gmt":"2006-08-13T23:39:37","guid":{"rendered":"http:\/\/www.pagetable.com\/?p=23"},"modified":"2006-08-13T15:39:37","modified_gmt":"2006-08-13T23:39:37","slug":"how-to-divide-fast-by-immediates","status":"publish","type":"post","link":"https:\/\/www.pagetable.com\/?p=23","title":{"rendered":"How to divide fast by immediates"},"content":{"rendered":"<p>In almost all assembly books you&#8217;ll find some nice tricks to do fast multiplications. E.g. instead of &#8220;imul eax, ebx, 3&#8221; you can do &#8220;lea eax, [ebx+ebx*2]&#8221;\u00c2\u00a0(ignoring flag effects). It&#8217;s pretty clear how this works. But how can we speed up, say, a division by 3? This is quite important since division is still a really slow operation. If you never thought or heart about this problem before, get pen and paper and try a little bit. It&#8217;s an interesting problem.<\/p>\n<p><!--more-->The non-obvious trick here is: We can multiply by the inverse (i.e. the reciprocal). Maybe this sounds a little bit strange, since the inverse of 3 is what, 1\/3? Yes, it is, over the rational numbers. But here we calculate over the finite ring Z\/(2<sup>32<\/sup>)Z (for the non-mathematical readers: We wrap around after 2<sup>32<\/sup>). And in Z\/(2<sup>32<\/sup>)Z all odd elements have a multiplicative inverse. We can find the inverse with the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Extended_Euclidean_algorithm\">extended euclidean algorithm<\/a> (gcdex):<\/p>\n<p>For numbers a, b gcdex(a, b) returns [g, x, y] where<\/p>\n<ul>\n<li>g is the greatest common divisor<\/li>\n<li>x, y are some numbers,<\/li>\n<li>such that x*a + y*b = g<\/li>\n<\/ul>\n<p>E.g. gcdex(3, 2<sup>32<\/sup>) = [1, -1431655765, 1]<\/p>\n<blockquote><p>That means:<br \/>\n-1431655765 * 3 + 1*2<sup>32<\/sup> = 1<\/p><\/blockquote>\n<p>Since we are using the modulo 2<sup>32<\/sup> math (we have 32 bit registers), this is exactly what we need:<\/p>\n<blockquote><p>-1431655765*3 + 2<sup>32<\/sup> = 0xaaaa_aaab * 3 = 1 (mod 2<sup>32<\/sup>)<br \/>\nSo 1\/3 = 0xaaaa_aaab (mod 2<sup>32<\/sup>)<\/p><\/blockquote>\n<p>Now, let&#8217;s look what we have so far:<br \/>\nFor numbers divisible by 3 this is already our solution. We just have to multiply them by 0xaaaa_aaab, since this is the inverse of 3. But &#8212; in the general case &#8212; we want this to work with all values (the remainder should be ignored, but with our solution it destroys the result).<\/p>\n<p>We need the another trick: We look at the 64 bit result and count the &#8220;overflows&#8221;. This is somehow similar to fixed point arithmetic. Look at this:<\/p>\n<blockquote><p>3 * 0xaaaa_aaab = 0x2_0000_0001, and that means<br \/>\n(n*3) * 0xaaaa_aaab = n * 2<sup>33<\/sup> + n<\/p><\/blockquote>\n<p>This is cool, because<\/p>\n<blockquote><p>(n*3 + c) * 0xaaaa_aaab = (n*2<sup>33<\/sup> + n) + c*0xaaaa_aaab<\/p><\/blockquote>\n<p>So, with c in {0, 1, 2} we have the nice identity:<\/p>\n<blockquote><p>(n * 0xaaaa_aaab) >> 33 = n div 3, for all 0 <= n < 2<sup>32<\/sup><\/p><\/blockquote>\n<p>Last but not least, our final solution for dividing by 3 is:<\/p>\n<blockquote><p>[IN: eax dividend]<br \/>\nmov ecx, 0xaaaa_aaab <em>\/\/ 1\/3<br \/>\n<\/em>mul ecx <em>\/\/ multiply with inverse<br \/>\n<\/em>shr edx, 1 <em>\/\/ shift by 33<br \/>\n<\/em>[OUT: edx = eax div 3]<\/p><\/blockquote>\n<p>Of course, this is only half of the story; this is not the general solution for all divisors. And it doesn&#8217;t handle signed division. For futher reading see <a href=\"http:\/\/www.swox.com\/~tege\/divcnst-pldi94.pdf\">this paper<\/a> or this <a href=\"http:\/\/gcc.gnu.org\/bugzilla\/show_bug.cgi?id=28417\">gcc bug report<\/a>. It also has its chapter in the <a href=\"http:\/\/www.amd.com\/us-en\/assets\/content_type\/white_papers_and_tech_docs\/22007.pdf\">AMD optimization manual<\/a>.<\/p>\n<p>BTW: This is part of reason why you have to learn all that math stuff in computer science. It&#8217;s useful! \ud83d\ude42<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In almost all assembly books you&#8217;ll find some nice tricks to do fast multiplications. E.g. instead of &#8220;imul eax, ebx, 3&#8221; you can do &#8220;lea eax, [ebx+ebx*2]&#8221;\u00c2\u00a0(ignoring flag effects). It&#8217;s pretty clear how this works. But how can we speed up, say, a division by 3? This is quite important since division is still a &#8230; <a title=\"How to divide fast by immediates\" class=\"read-more\" href=\"https:\/\/www.pagetable.com\/?p=23\" aria-label=\"Read more about How to divide fast by immediates\">Read more<\/a><\/p>\n","protected":false},"author":3,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[32,38],"tags":[],"class_list":["post-23","post","type-post","status-publish","format-standard","hentry","category-tricks","category-x86"],"_links":{"self":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/23","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/users\/3"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=23"}],"version-history":[{"count":0,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/23\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=23"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=23"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=23"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}