In almost all assembly books you’ll find some nice tricks to do fast multiplications. E.g. instead of “imul eax, ebx, 3” you can do “lea eax, [ebx+ebx*2]”Â (ignoring flag effects). It’s pretty clear how this works. But how can we speed up, say, a division by 3? This is quite important since division is still a really slow operation. If you never thought or heart about this problem before, get pen and paper and try a little bit. It’s an interesting problem.
When I disassembled Steve Wozniak’s Apple I BASIC, I found a 6502 trick that I had never seen before, although I had read a lot of 6502 code, including the very well-written Commodore BASIC (i.e. Microsoft BASIC for 6502).
This audio file is an important (previously unreleased) artifact of computer history. The aim of the puzzle is to decode and identify it correctly.
When people talk about porting their applications to 64 bit, I sometimes hear them wonder how long it will be until they have to port everything to 128 bit – after all, the swiches from 8 to 16 bit (e.g. CP/M to DOS), 16 to 32 bit (DOS/Windows 3 to Windows 95/NT) and 32 to 64 have all happened in the last 25 years.
Imagine you’re writing a Game Boy game, and the resulting ROM with all the code and data is just a little over one megabyte in size. No big deal, just pad the game to two megabytes, and use a 2 MB ROM in the cartridge. Just tell the linker to allocate 2 MB or RAM, put the actual data at the beginning, and then write a 2 MB “.gb” image to disk, which will then be sent to the ROM chip factory.
pushl $(0xcb<<24)|0x08 call .-1
What does this instruction sequence do? (This was a collaborative effort by Chuck Gray, Myria and Michael.)
UNIX, Windows NT, and all the operating systems in their class rely on virtual memory, or paging, in order to provide every process on the system a complete address space of its own. An easier way to protect processes from each other is segmentation: The 4 GB address space of a 32 bit CPU is divided into segments (consisting of a physical base address and a limit), one for each process, and every process may only access their own segment. This is what the 286 did.
Intel used some strange opcodes for the SSE3 instructions. All MMX/SSE opcodes use the 0x0f prefix (former “pop cs”). They soon noticed the the 0x0f area gets full, so they used the 0x66, 0xf2, 0xf3 prefix as modifiers. The basic rule is:
Most of the x86 instructions will automatically alter the flags depending on the result. Sometimes this is rather frustrating because you actually what to preserve the flags as long as possible, and sometimes you miss a “mov eax, ecx” which alters the flags. But at least it’s guaranteed that an instruction either sets the flags or it doesn’t touch them, independent of the actual operation… Or is it?
As we all know the x86-ISA has a lot of redundant instructions (ie. instructions with the same semantic but different opcodes). Sometimes this is unavoidable, sometimes it looks like bad design. But with SSE it gets really weird. Let’s say we want to perform xmm0 <- xmm0 & xmm1 (ie. bitwise and). Not an uncommon operation; but we have 3 different ways do archive this:
- andps xmm0, xmm1 (0f 54 c1)
- andpd xmm0, xmm1 (66 0f 54 c1)
- pand xmm0, xmm1 (66 0f db c1)
(Note that andpd/pand are SSE2 instructions)
Regarding the result in xmm0 these are really the same instructions. Now, why did Intel do this? First we’re going to inspect andps/andpd. Looking at the optimization manuals we get a hint: The ps/pd mark the target register to contain singles or doubles, so they should match the actual data you are operating on.
Assigning internal version/family/model IDs to products is a non-trivial task, especially if there are several different families/architectures on your roadmap, and if the marketing names and target markets have no real correlation to the internal architecture.
In Win32, there is an API call called “MulDiv”:
Due to simplified instruction decoding of the Intel 80287, this CPU had opcode aliases for instructions like FXCH, FSTP, i.e. there were some additional encodings that did the same as the originals as defined by the 8087. As a side effect of this, a new instruction, FFREEP appeared, although not intented by Intel.
Virtualization means running one or more complete operating systems (at the same time) on one machine, possibly on top of another operating system. VMware, VirtualPC, Parallels etc. support, for example, running a complete GNU/Linux OS on top of Windows. For virtualization, the Virtual Machine Monitor (VMM) must be more powerful than kernel mode code of the guest: The guest’s kernel mode code must not be allowed to change the global state of the machine, but may not notice that its attempts fail, as it was designed for kernel mode. The VMM as the arbiter must be able to control the guest completely.