Having Fun with Branch Delay Slots

Branch Delay Slots are one of the awkward features of RISC architectures. RISC CPUs are pipelined by definition, so while the current instruction is in execution, the following instruction(s) will be in the pipeline already. If there is for example a conditional branch in the instruction stream, the CPU cannot know whether the next instruction is the one following the branch or the instruction at the target location until it has evaluated the branch. This would cause a bubble in the pipeline; therefore some RISC architectures have a branch delay slot: The instruction after the branch will always be executed, no matter whether the branch is taken or not. read more

LOAD"$",8

Commodore computers up to BASIC 2.0 (like the Commodore 64, the VIC-20 and the PET 2001) only had a very basic understanding of mass storage: There were physical device numbers that were mapped to the different busses, and the “KERNAL” library had “open”, “read”, “write” and “close” functions that worked on these devices. There were also higher-level “load” and “save” functions that could load and save arbitrary regions of memory: The first two bytes of the file would be the (little endian) start address of the memory block. read more

How to divide fast by immediates

In almost all assembly books you’ll find some nice tricks to do fast multiplications. E.g. instead of “imul eax, ebx, 3” you can do “lea eax, [ebx+ebx*2]” (ignoring flag effects). It’s pretty clear how this works. But how can we speed up, say, a division by 3? This is quite important since division is still a really slow operation. If you never thought or heart about this problem before, get pen and paper and try a little bit. It’s an interesting problem. read more