{"id":1217,"date":"2006-04-01T17:16:25","date_gmt":"2006-04-02T01:16:25","guid":{"rendered":"http:\/\/www.pagetable.com\/?p=7"},"modified":"2006-04-01T17:16:25","modified_gmt":"2006-04-02T01:16:25","slug":"puzzle-powerpc-flag-simulation-on-x86","status":"publish","type":"post","link":"https:\/\/www.pagetable.com\/?p=1217","title":{"rendered":"Puzzle: PowerPC Flag Simulation on x86"},"content":{"rendered":"<p>This week&#8217;s puzzle is to copy\u00c2\u00a0the carry flag to the high bit of ah.\u00c2\u00a0 You may destroy any other register,\u00c2\u00a0the flags, and the other 24 bits of eax.\u00c2\u00a0 Shortest sequence wins.<\/p>\n<p>For an optional\u00c2\u00a0very long description of what this is all about, click &#8220;Read the rest&#8221;.\u00c2\u00a0 You don&#8217;t need to read it to participate in the contest =)<\/p>\n<p><!--more-->Let&#8217;s say that you are writing Rosetta for Apple to run PowerPC Mac programs on new x86-based hardware.\u00c2\u00a0 Naturally, speed is a huge goal, so you will use dynamic recompilation of PowerPC machine code to x86 to achieve this.\u00c2\u00a0 This is the basis of this week&#8217;s puzzle.<\/p>\n<p>Like x86, PowerPC has a flag register, called &#8220;cr0&#8221;.\u00c2\u00a0 (There are other flag registers but we can ignore them here.)\u00c2\u00a0 Because cr0 is frequently used by PowerPC code, you assign an x86 register to always represent the simulated cr0.\u00c2\u00a0 You cannot use the x86 flags register itself because many PowerPC can do instructions like &#8220;or&#8221; without changing its flags, while the x86 cannot.\u00c2\u00a0 When you translate PowerPC conditional jumps, you will need to get the bits of the simulate cr0 into the x86 flags register so that you may use an x86 conditional jump to simulate the PowerPC conditional jump.<\/p>\n<p>PowerPC &#8220;compare&#8221; instructions work much like x86, but there is\u00c2\u00a0a problem: On PowerPC, the compare instruction selects between signed and unsigned comparison, whereas on x86, the signed\/unsigned selection is done by the choice of conditional jump.\u00c2\u00a0 Because a &#8220;compare&#8221; instruction and a conditional branch may be arbitrarily far apart with many intervening instructions between them, it is not possible in the general case to use data flow analysis to decide how to translate it.\u00c2\u00a0 It must be simulated more exactly.<\/p>\n<p>Now comes the puzzle.\u00c2\u00a0 How do we recompile PowerPC compares and conditional branches into x86 code?\u00c2\u00a0 The &#8220;compare&#8221; needs to do one of two things.\u00c2\u00a0 One, it\u00c2\u00a0could set some flag such that when you do a conditional branch, you can know whether the previous comparison was signed or unsigned and decide whether to branch accordingly.\u00c2\u00a0 Two, it could set up the simulated cr0 register such that the branch did not need to know whether the branch was signed or unsigned.<\/p>\n<p>This is a complicated description of the problem, so I&#8217;m going to show an example solution.\u00c2\u00a0 We go with the second approach: we set up the simulated cr0 at compare time so that the branch does not need to know the type of comparison used earlier.\u00c2\u00a0 We define eax and edx to be scratch registers (that is, not directly mapped with simulated PowerPC registers).\u00c2\u00a0 We also define our simulated PowerPC cr0 (flags) register as x86 register ecx.\u00c2\u00a0 (A real implementation would probably put the flags into a memory variable rather than a register because cr0 is used less frequently than the more common registers.)<\/p>\n<p>We compare the value of esi, which we define as being a simulated PowerPC general register, to 5.\u00c2\u00a0 We then store bits in ecx to store the information necessary to implement the conditional jump later.\u00c2\u00a0 Because signed comparisons are more common than unsigned comparisons, we consider signed comparisons to be the &#8220;default&#8221;.\u00c2\u00a0 It is the responsibility of unsigned comparisons to set ecx to the same way that signed comparisons would set it.<\/p>\n<p>We use the x86 instructions &#8220;lahf&#8221; (flags -> ah) and &#8220;sahf&#8221; (ah -> flags) to access the low 8 bits of the x86 flags register, which is where we can find the interesting comparison results.\u00c2\u00a0 We cannot use pushfd\/popfd to do this, because they cause a kernel trap in some cases &#8211; they are extremely slow.<\/p>\n<p>The flags we care about are the &#8220;sign&#8221; and &#8220;zero&#8221; flags.\u00c2\u00a0 With them, we can do all comparisons: <, >, \u00e2\u0089\u00a4, \u00e2\u0089\u013d, and = can all be tested as some combination of these two flags.\u00c2\u00a0 Both these flags are in the low 8 bits of the flag register, and they can be tested against with a single conditional branch, so they make an optimal solution for the signed case.<\/p>\n<p>For a signed comparison, we do this:<\/p>\n<blockquote><p>cmp esi, 5 ; actual simulated comparison<br \/>\nlahf<br \/>\nmov ecx, eax<\/p><\/blockquote>\n<p>Later, when we want to branch based on a previous comparison&#8217;s result, we do this:<\/p>\n<blockquote><p>mov eax, ecx<br \/>\nsahf<br \/>\n(branch) label<\/p><\/blockquote>\n<p>(branch) is the type of condition to check:<\/p>\n<blockquote><p>< becomes jl\n> becomes jg<br \/>\n\u00e2\u0089\u00a4 becomes jle<br \/>\n\u00e2\u0089\u013d becomes jge<br \/>\n= becomes je<\/p><\/blockquote>\n<p>These branches all use the signed and zero flags directly to determine whether to branch.<\/p>\n<p>In the unsigned case, the x86 &#8220;carry&#8221; flag works for unsigned comparisons like the &#8220;sign&#8221; flag for signed comparisons.\u00c2\u00a0 By copying the &#8220;carry&#8221; flag to the &#8220;sign&#8221; flag, we allow the jl\/jg\/jle\/jge\/je branches to work for unsigned comparisons as well.\u00c2\u00a0 Carry is bit 0 of the x86 flags register, and sign is bit 7.\u00c2\u00a0 This is a simple way to implement unsigned comparisons:<\/p>\n<blockquote><p>cmp esi, 5<br \/>\nlahf<br \/>\nmov al, ah<br \/>\nshl al, 7<br \/>\nand ah, 0x7F<br \/>\nor ah, al<br \/>\nmov ecx, eax<\/p><\/blockquote>\n<p>\u00c2\u00a0<\/p>\n<p>Is there a faster way to implement this, especially for the unsigned case?<\/p>\n<p>Rules<\/p>\n<ul>\n<li>You may destroy any register.<\/li>\n<li>You may used fixed memory locations as variables.<\/li>\n<li>You may not use pushfd\/popfd because they are privileged instructions.<\/li>\n<li>You are not limited to improving the example; implementation of an entirely different solution to the same problem (such as to the first one suggested) are acceptable.<\/li>\n<li>Small improvements count a lot, because presumably the code will execute many millions of times.<\/li>\n<\/ul>\n<p>Michael Steil came up with the example solution shown here for his thesis work on SoftPear, and I only made a moderate enhancement to it.<\/p>\n<p>-Myria<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This week&#8217;s puzzle is to copy\u00c2\u00a0the carry flag to the high bit of ah.\u00c2\u00a0 You may destroy any other register,\u00c2\u00a0the flags, and the other 24 bits of eax.\u00c2\u00a0 Shortest sequence wins. For an optional\u00c2\u00a0very long description of what this is all about, click &#8220;Read the rest&#8221;.\u00c2\u00a0 You don&#8217;t need to read it to participate in &#8230; <a title=\"Puzzle: PowerPC Flag Simulation on x86\" class=\"read-more\" href=\"https:\/\/www.pagetable.com\/?p=1217\" aria-label=\"Read more about Puzzle: PowerPC Flag Simulation on x86\">Read more<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26,38],"tags":[],"class_list":["post-1217","post","type-post","status-publish","format-standard","hentry","category-puzzle","category-x86"],"_links":{"self":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/1217","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1217"}],"version-history":[{"count":0,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/1217\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1217"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1217"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1217"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}