
Why is there no CR1 – and why are control registers such a mess anyway?

If you want to enable protected mode or paging on the i386/x86_64 architecture, you use CR0, which is short for control register 0. Makes sense. These are important system settings. But if you want to switch the pagetable format, you have to change a bit in CR4 (CR1 does not exist and CR2 and CR3 don’t hold control bits), if you want to switch to 64 bit mode, you have to change a bit in an MSR, oh, and if you want to turn on single stepping, that’s actually in your FLAGS. Also, have I mentioned that CR5 through CR15 don’t exist – except for CR8, of course?

Like many (but unfortunately not all) quirks of the i386/x86_64 architecture, this mess can be explained with history.

8086 – FLAGS

x86 history typically starts with the 16 bit 8086, but although it was not binary compatible with its predecessor, it was nevertheless a rather straightforward assembly-level compatible 16 bit extension of the 8 bit Intel 8080 with some ideas of the Zilog Z80. The 8086 is still a classic “home computer class” CPU, which was not meant for modern operating systems: It had no MMU of any kind, and no concept of privileged and unprivileged modes. Therefore, control bits that we see as system state today were encoded into the 16 bit FLAGS register: The interrupt enable bit and the trap flag (which causes an interrupt after the next instruction and thus lets you single-step) are encoded into FLAGS right next to the ALU’s flags like Zero and Carry.
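To see how system bits and ALU bits share one register, here is a minimal Python sketch that decodes a 16 bit FLAGS value. The bit positions are the documented 8086 ones; the function name `decode_flags` and the sample value are made up for illustration.

```python
# Bit positions of a few 8086 FLAGS bits.
FLAG_BITS = {
    "CF": 0,  # carry (ALU result)
    "ZF": 6,  # zero (ALU result)
    "TF": 8,  # trap flag: single-stepping (system state)
    "IF": 9,  # interrupt enable (system state)
}

def decode_flags(flags):
    """Return the names of the listed flags that are set in a FLAGS value."""
    return [name for name, bit in FLAG_BITS.items() if flags & (1 << bit)]

# Hypothetical sample value with ZF, TF and IF set.
sample = (1 << 6) | (1 << 8) | (1 << 9)
print(decode_flags(sample))
```

The point is only that TF and IF sit a few bits away from Zero and Carry, with nothing in the register layout separating user state from system state.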

80286 – Machine Status Word

The 80286 then came with a simple form of memory management that allowed more sophisticated (but not yet “modern”) operating systems to run – like the original versions of OS/2. The 16 bit “Machine Status Word” was created to host the big switch between legacy mode (real mode) and the new memory-managed mode (protected mode), and a program could access it using the new instructions “lmsw” and “smsw”. The 80286 had more system state than just this bit: The GDT, the IDT and the TSS each had their own register and dedicated instructions to access it (“lgdt”/“sgdt”, “lidt”/“sidt”, “ltr”/“str”).

i386 – Control Registers

The i386 finally had a real MMU that allowed paging and thus modern operating systems. The MMU required two more registers in the system state, one for the base address of the pagetables, and one to read a fault address from. Intel decided against adding more special purpose registers with dedicated accessor instructions, but instead introduced eight indexed 32 bit wide “control registers” CR0 to CR7. The new accessors “mov crn, r32”/“mov r32, crn” allowed copying between registers and control registers and had the 3 bit CR index encoded in the instruction’s ModRM byte.

The old MSW was also wired into the lower 16 bits of CR0; but CR0 was also extended with new bits like the switch to turn on paging. CR1 was kept reserved, presumably as a second control register for miscellaneous control bits, and CR2 and CR3 were used for the aforementioned fault address and pagetable base pointer. The opcodes to access reserved control registers generated an “invalid opcode” fault, making it possible for Intel to reuse the opcodes later if they don’t use the control registers.

i486 – CR4

The i486 added a few more control bits, and some of them went into CR0. But instead of overflowing the new bits into CR1, Intel decided to skip it and open up CR4 instead – for unknown reasons.

Pentium – MSRs

On the Pentium, Intel for the first time added control bits that were a property of the implementation as opposed to the architecture, i.e. bits that are microarchitecture-specific and will therefore only work on certain CPUs and not necessarily be supported on later CPUs – like caching details and debug settings. In order not to waste the valuable CR space on throw-away control bits, Intel introduced the Model-specific Registers (MSRs). The MSR address space is 32 bits, and every MSR is 64 bits wide. The two new instructions “rdmsr” and “wrmsr” copy between an ECX-indexed MSR and the EDX:EAX register pair.
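The EDX:EAX convention can be sketched in a few lines of Python. This is only a model of how the 64 bit value is split across two 32 bit registers (EDX holds the upper half, EAX the lower half); the helper names are hypothetical and nothing here touches real hardware.

```python
MASK32 = 0xFFFFFFFF

def split_for_wrmsr(value):
    """Split a 64 bit MSR value into the (EDX, EAX) halves wrmsr expects."""
    return (value >> 32) & MASK32, value & MASK32

def join_from_rdmsr(edx, eax):
    """Combine the EDX:EAX pair delivered by rdmsr into one 64 bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)

edx, eax = split_for_wrmsr(0x0123456789ABCDEF)
print(hex(edx), hex(eax))
```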


The SYSENTER instruction that got introduced on the Pentium II is a fast way to switch between unprivileged and privileged mode. Instead of looking up the destination segment, instruction pointer and stack pointer in memory, the CPU holds this information in three special-purpose system registers. CR space is valuable, so Intel decided against filling up CR5, CR6 and CR7 and put these registers into the MSR address space instead – at 0x174 through 0x176. This was practically an abuse of the MSR concept.


Who can blame AMD for doing similar things then? With the K6, which was introduced at the same time as the Pentium II, AMD diverged from just copying Intel for the first time and actually added features of their own: They added the SYSCALL instruction, and with it, a control bit that turns it on and off, and an extra control register with the target location. Being afraid to collide with Intel extensions they didn’t know about, they put the extra system registers into the MSR space: the control register “EFER” (Extended Feature Enable Register) at 0xC000_0080 and the Syscall Target Register (STAR) at 0xC000_0081. Intel had been nicely lining up MSRs counting up from 0, so AMD decided to start counting at 0xC000_0080. Understandable as this is, it is basically the same abuse of the MSR concept as Intel’s with SYSENTER.

A very similar thing happened in the CPUID space, by the way: While Intel encoded all its feature bits in leaf 0x0000_0001, AMD defined leaf 0x8000_0001 for its features.

x86_64 – Chaos!

So far everything looked like it was getting a little more controlled. Both Intel and AMD were only adding new control registers in the MSR space, and since this is a big address space and AMD and Intel extended it at rather opposite ends, it all looked nicer. But then came x86_64: For the first time, Intel was copying a feature that AMD introduced, and it needed to be compatible in all its details. AMD had encoded the availability of x86_64 in its own CPUID leaf 0x8000_0001, so Intel had to support this leaf as well. And since Long Mode was turned on in the EFER MSR, Intel had to support an MSR in the AMD space at 0xC000_0000. Long mode also required supporting SYSCALL, so Intel also supported the STAR MSR.

Since x86_64 introduced the REX prefix to double the number of available general-purpose registers, AMD decided to allow this prefix also for “mov cr”, doubling the number of control registers and therefore introducing CR8 through CR15 – also doubling their width. And since AMD introduced them, they owned them, and decided to use CR8 for the “Task Priority Register” feature.
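The encoding trick can be illustrated with a small Python sketch that assembles the bytes of a “mov cr” instruction. It assumes the standard encodings (opcode 0F 20 for reading a control register, 0F 22 for writing one, the CR index in the “reg” field of the ModRM byte, and REX.R/REX.B as the fourth index bits). The function name is made up, and the sketch does not check whether the chosen control register actually exists.

```python
def encode_mov_cr(cr, gpr, to_cr):
    """Encode 'mov crN, reg' (to_cr=True) or 'mov reg, crN' (to_cr=False).

    cr and gpr are register numbers 0..15. Numbers above 7 need a REX
    prefix and are therefore only encodable in 64 bit mode.
    """
    if not (0 <= cr <= 15 and 0 <= gpr <= 15):
        raise ValueError("register number out of range")
    # REX.R (bit 2) extends the CR index, REX.B (bit 0) extends the GPR index.
    rex = 0x40 | ((cr >> 3) << 2) | (gpr >> 3)
    # mod=11 (register operand), reg=CR index, rm=GPR index.
    modrm = 0xC0 | ((cr & 7) << 3) | (gpr & 7)
    opcode = [0x0F, 0x22 if to_cr else 0x20]
    prefix = [rex] if rex != 0x40 else []
    return bytes(prefix + opcode + [modrm])

print(encode_mov_cr(0, 0, to_cr=False).hex())  # mov eax, cr0
print(encode_mov_cr(8, 0, to_cr=False).hex())  # mov rax, cr8
```

Without the prefix, the 3 bit field can only name CR0 to CR7; the single REX.R bit is what makes CR8 through CR15 addressable.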


The architecture is messy, sure, but does it matter? Maybe not… as long as CPUs didn’t have virtualization extensions! Both Intel VMX and AMD SVM are designed so that they can automatically switch the complete privileged machine state including control registers and certain MSRs. Intel for example special-cases CR0, CR3, CR4 and CR8, and leaves CR2 to the user. AMD on the other hand has 16 fields for all CRs in its switcher. And because of the two different starting points of the MSR space, Intel VMX required a whitelist bitmap for 8192 MSRs starting at 0x0000_0000 and for another 8192 MSRs starting at 0xC000_0000 – and of course SYSENTER_CS, EFER, STAR and friends are special-cased. If you want to have a lot of fun, read the VMCS layout reference of Intel’s manual 3B!
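To make the two-range bitmap concrete, here is a hedged Python sketch that maps an MSR number to its position in such a bitmap. It assumes the layout as I understand it from Intel’s manual: one 4 KB page, split into four 1 KB quarters (read low, read high, write low, write high), one bit per MSR. The constant and function names are hypothetical.

```python
LOW_BASE = 0x0000_0000
HIGH_BASE = 0xC000_0000
MSRS_PER_RANGE = 8192  # 1 KB of bits per quarter

def msr_bitmap_position(msr, write):
    """Return (byte_offset, bit) inside the 4 KB bitmap page, or None if
    the MSR lies outside both covered ranges."""
    if LOW_BASE <= msr < LOW_BASE + MSRS_PER_RANGE:
        region, index = 0, msr - LOW_BASE
    elif HIGH_BASE <= msr < HIGH_BASE + MSRS_PER_RANGE:
        region, index = 1, msr - HIGH_BASE
    else:
        return None
    # Assumed layout: read quarters first, then write quarters.
    base = (2048 if write else 0) + region * 1024
    return base + index // 8, index % 8

print(msr_bitmap_position(0x174, write=False))        # SYSENTER_CS, read
print(msr_bitmap_position(0xC000_0080, write=True))   # EFER, write
```

Any MSR outside the two windows has no bit at all, which is exactly the consequence of Intel and AMD counting from opposite ends of the address space.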


  • CR1 and CR5 to CR7 are still “owned” by Intel. AMD has shown that they don’t want to use them – and even Intel has not added a control register since 1989.
  • CR9 through CR15 are technically owned by AMD, since they introduced them with x86_64 and decided to use CR8. Intel adopted the reserved ones when adopting x86_64, but it is unlikely that Intel will ever adopt smaller changes to the architecture from AMD, and AMD is unlikely to use them if they won’t be part of the architecture, so these will probably never be used either. On the other hand, AMD added these to the auto-switcher list of their SVM Virtual Machine Control Block (VMCB), showing that they haven’t given up on them yet.
  • The MSR space is de facto properly partitioned. Intel continues adding MSRs at 0 and AMD at 0xC000_0000 – but MSRs already lost their model-specificness in 1997. MSRs are the new CRs.

Dear Intel, dear AMD: I like the control registers, and I hate to see them wasted. Why don’t you finally define CR1 and give it a few control bits in the future? If you’re scared about collisions, I will be happy to be the arbiter. Ah, whatever: Intel, you get to define all even bits in CR1, and AMD, you get to define all odd bits. Okay? Cool.

Who invented the computer?

  • In 1837, Charles Babbage designed a general purpose computer, the Analytical Engine, but never built it.
  • Between 1934 and 1937, Church, Turing et al. defined the general purpose computer, but didn’t design one.
  • In 1941, Konrad Zuse built the first general purpose computer, the Z3, but didn’t know it was general purpose and didn’t use it that way.
  • From 1943 to 1946, Mauchly and Eckert finally built a computer, ENIAC, that was designed to be general-purpose.

Having Fun with Branch Delay Slots

Branch Delay Slots are one of the awkward features of RISC architectures. RISC CPUs are pipelined by definition, so while the current instruction is in execution, the following instruction(s) will be in the pipeline already. If there is for example a conditional branch in the instruction stream, the CPU cannot know whether the next instruction is the one following the branch or the instruction at the target location until it has evaluated the branch. This would cause a bubble in the pipeline; therefore some RISC architectures have a branch delay slot: The instruction after the branch will always be executed, no matter whether the branch is taken or not.

So in practice, you can put the instruction that would be before the branch right after the branch, if this instruction is independent of the branch instruction, i.e. doesn’t access the same registers. Otherwise, you can fill it with a NOP. Out-of-order architectures can do this reordering at runtime, so there would be no need for a delay slot. Nevertheless, the delay slot is a feature of the architecture, not the implementation.

Some RISCs like PowerPC and ARM do not have a delay slot, but MIPS, SPARC and PA-RISC, for example, do. There are some variations, though: MIPS and PA-RISC have an annihilation/nullify/likely bit in the instruction, so the programmer can choose that the instruction in the delay slot only gets executed if the branch is taken.

Other CPUs, like SPARC, PA-RISC or the ill-fated Motorola M88K have optional delay slots: The programmer can set a bit in the opcode if he cannot come up with a good instruction for the delay slot, and save the wasted NOP in the program code this way – the CPU will put a bubble into the pipeline.

Now the interesting question is what happens if the branch and the delay instruction are not independent. What if the delay instruction writes r5 and the branch jumps to r5? What if it’s a branch-and-link, and the delay instruction modifies the link register? On MIPS, this is illegal, and undefined.

In practice, MIPS won’t halt and catch fire though. As you would expect from the design of a CPU pipeline, the CPU basically executes the branch and the delay instruction in order, as they are stored in the instruction stream, and it only delays the write to PC, i.e. the actual jump until after the delay instruction. So, for example, if you modify a register that the branch depends on, it will not influence the branch, but be in effect after the jump.

On the aforementioned Motorola M88K, this behaviour is documented, and GCC even makes use of it:

820:   7d ad 00 08   cmp   r13,r13,0x08    ; compare

824:   d4 6d 00 05   bb0.n 0x03,r13,0x834  ; cond. branch
828:   63 df 00 00   addu  r30,r31,0       ; delay slot

82c:   cc 00 00 7f   bsr.n 0xa28           ; function call
830:   60 21 01 ac   addu  r1,r1,0x1ac     ; delay slot: fix up link

834:   00 00 00 00   nop

The first three instructions are a compare/branch sequence. If the branch is taken, execution will continue at 0x834, otherwise at 0x82c, after the delay slot. The delay instruction is independent of the branch, nothing special here yet.

But now look at the following two instructions: 0x82c and 0x830 are not independent. r1 is the M88K’s link register, so the “bsr” writes the address of the instruction after the delay slot (0x834) into r1. The delay slot also writes into r1: It adds 0x1ac to it. These instructions are executed in order, but the actual branch to 0xa28 will only be done after the delay instruction.

So what these two instructions effectively do is call a function, but set up the return address to skip the next 0x1ac bytes (107 instructions) after return. If the conditional branch at 0x824 is taken, the code at 0x834 will be executed, otherwise, 0xa28 is called, and the “taken” case of the conditional branch is skipped. This trick can be used whenever you have C code with an “if” statement of which one case is a single function call:

if (...) {
    ...
} else {
    function();
}

Here is a new pagetable entry.

I like Intel. I told you before how Intel messed up the x86 register nomenclature by extending A to AX (A extended) and then to EAX (extended A extended). Then AMD came and extended the register once more, giving it a more sane name: RAX.

I also told you before how Intel messed up the x86 pagetable nomenclature: There were pagetables (PT, level 1) and page directories (PD, level 2) on the i386, and for the Pentium Pro, they added page directory pointers (PDP, level 3). Then AMD came and extended it once more, giving it a more sane name: page map level 4 (PML4).

With the advent of virtualization, both Intel and AMD added a feature to get rid of the slow software shadow pagetables, and added hardware support for nested pagetables, i.e. the guest has 4 levels of pagetables, and the host has another 4 levels.

AMD called these – surprise, surprise! – nested pagetables, NPT. Intel was more creative. With a history of extending architectures, they went with the big E: extended pagetables, EPT.

Let’s practice a bit: A PD is a page directory, a PDE is a page directory entry. You can also call it a PDPTE, a page directory pagetable entry (level 2 PTE), because after all, all these entries on all levels are PTEs, because they share the same format. A PDPPTE is a page directory pointer pagetable entry, aka level 3 entry.

If we use nested paging – excuse me – extended paging on Intel, we need to prepend EPT to our nice little abbreviations. An EPTPTE is a level 1 entry, an EPTPDPTE is level 2, not to be confused with an EPTPDPPTE, which is level 3, and a level 4 entry is an EPTPML4PTE.

It gets even better. Oracle/Sun/Innotek VirtualBox uses Hungarian Notation for its variable names, so it prepends “P” for pointer and “C” for constant. So what would you call a variable which is a pointer to a constant level 2 EPT entry?

Of course, PCEPTPDPTE.

/** Pointer to a const EPT Page Directory Pointer Entry. */

I thought about this for a while, and considered patenting this brilliant idea of mine, but here it is, free of patents and free for everyone to use: Michael’s nomenclature for Intel/AMD pagetables:

new name   description                       old name
P4         pagetable level 4 page            PML4
P3         pagetable level 3 page            PDP
P2         pagetable level 2 page            PD
P1         pagetable level 1 page            PT
P4E        pagetable level 4 entry           PML4E/PML4PTE
P3E        pagetable level 3 entry           PDPE/PDPPTE
P2E        pagetable level 2 entry           PDE/PDPTE
P1E        pagetable level 1 entry           PTE
NP4        nested pagetable level 4 page     EPTPML4
NP3        nested pagetable level 3 page     EPTPDP
NP2        nested pagetable level 2 page     EPTPD
NP1        nested pagetable level 1 page     EPTPT
NP4E       nested pagetable level 4 entry    EPTPML4E/EPTPML4PTE
NP3E       nested pagetable level 3 entry    EPTPDPE/EPTPDPPTE
NP2E       nested pagetable level 2 entry    EPTPDE/EPTPDPTE
NP1E       nested pagetable level 1 entry    EPTPTE

You are welcome.

The Easiest Way to Reset an i386/x86_64 System

Try this in kernel mode:

uint64_t null_idtr = 0;
asm("xor %%eax, %%eax; lidt %0; int3" :: "m" (null_idtr));

This can be quite helpful when doing operating system development on an i386/x86_64 system. You can use this for the regular restart case or when a kernel panic is supposed to restart immediately and you cannot make any assumptions on what is still working in the system.

You can also use this for debugging very low-level code if you don’t have a serial port or even an LED to report the most basic information: First make sure your code is reached by putting the reset code there. Then remove it again and put this code in:

if (condition)

The system will either hang or reset, depending on the condition.

Bill Gates' Personal Easter Eggs in 8 Bit BASIC

If you type “WAIT6502,1” into a Commodore PET with BASIC V2 (1979), it will show the string “MICROSOFT!” at the top left corner of the screen. Legend has it Bill Gates himself inserted this easter egg “after he had had an argument with Commodore founder Jack Tramiel”, “just in case Commodore ever tried to claim that the code wasn’t from Microsoft”.

In this episode of “Computer Archeology”, we will not only examine this story, but also track down the history of Microsoft BASIC on various computers, and see how Microsoft added a second easter egg to the TRS-80 Color Computer – because they had forgotten about the first one.

Stolen From Apple?

This whole story sounds similar to Apple embedding a “Stolen From Apple” icon into the Macintosh firmware in 1983, so that in case a cloner copies the ROM, in court, Steve Jobs could hit a few keys on the clone, revealing the icon and proving that not just a “functional mechanism” was copied but instead the whole software was copied verbatim.

Altair BASIC

Let’s dig into the history of Microsoft’s BASIC interpreters. In 1975, Microsoft (back then still spelled “Micro-soft”) released Altair BASIC, a 4 KB BASIC interpreter for the Intel 8080-based MITS Altair 8800, which, despite all its other limitations, included a 32 bit floating point library.

An extended version (BASIC-80) that consisted of 8 KB of code contained extra instructions and functions, and, most importantly, support for strings.

Microsoft BASIC for the 6502

In 1976, MOS Technology launched the KIM-1, an evaluation board based around the new 6502 CPU from the same company. Microsoft converted their BASIC for the Intel 8080 to run on the 6502, keeping both the architecture of the interpreter and its data structures the same, and created two versions: an 8 KB version with a 32 bit floating point library (6 digits), and a 9 KB system with 40 bit floating point support (9 digits).

Some sources claim that, while BASIC for the 8080 was 8 KB in size, Microsoft just couldn’t fit BASIC 6502 into 8 KB, while other sources claim there was an 8 KB version for the 6502. The truth is somewhere in the middle. The BASIC ROMs of the Ohio Scientific Model 500/600 (KIM-like microcomputer kits from 1977/1978) and the Compukit UK101 were indeed 8 KB in size, but unlike the 8080 version, they didn’t leave enough room for the machine-specific I/O code that had to be added by the OEM, so these machines required an extra ROM chip containing this I/O code.

In 1977, Microsoft changed the 6 digit floating point code to support 9 digits and included actual error strings instead of two-character codes, while leaving everything else unchanged. A 6502 machine with BASIC in ROM needed more than 8 KB anyway, so why not make it a little bigger to add extra features? The 6 digit math code was still an assembly time option; the 1981 Atari Microsoft BASIC used that code.

In 1977, Ohio Scientific introduced the “Model 500”, which was the first machine to contain (6 digit) Microsoft BASIC 1.0 in ROM. Upon startup, it printed:


In the same year, MOS started selling a tape version of 9 digit Microsoft BASIC 1.1 for the KIM-1. Its start message was:


Woz Integer BASIC

The 1976 Apple I was the first system besides the KIM to use the MOS 6502 CPU, but Steve Wozniak wrote his own 4KB BASIC interpreter instead of licensing Microsoft’s. An enhanced version of Woz’ “Integer BASIC” came in the ROM of the Apple II in 1977; Microsoft BASIC (called “AppleSoft”) was available as an option on tape. On the Apple II Plus (1978), AppleSoft II replaced Integer BASIC.

Commodore PET

Commodore had bought MOS in October 1976 and worked on converting the KIM platform into a complete computer system. They licensed Microsoft BASIC for 6502 (also October 1976), renamed it to Commodore BASIC, replaced the “OK” prompt with “READY.”, stripped out the copyright string and shipped it in the ROMs of the first Commodore PET in 1977.

The Easter Egg

In 1979, Commodore started shipping update ROMs with a version 2 of Commodore BASIC for existing PETs. Apart from updates in array handling, it also contained the WAIT 6502 easter egg.

This is what the easter egg code looks like:

.,D710 20 C6 D6  JSR $D6C6      fetch address and value
.,D713 86 46     STX $46        save second parameter
.,D715 A2 00     LDX #$00       default for third parameter
.,D717 20 76 00  JSR $76        CHRGOT get last character
.,D71A F0 29     BEQ $D745      no third parameter
.,D71C 20 CC D6  JSR $D6CC      check for comma and fetch parameter
.,D71F 86 47     STX $47        save 3rd parameter
.,D721 A0 00     LDY #$00
.,D723 B1 11     LDA ($11),Y    read from WAIT address
.,D725 45 47     EOR $47        third parameter
.,D727 25 46     AND $46        second parameter
.,D729 F0 F8     BEQ $D723      keep waiting
.,D72B 60        RTS            back to interpreter loop

On pre-V2 BASIC, the branch at $D71A just skipped the next line: If there is no third parameter, don’t fetch it. On V2, the line is subtly changed to make the two-parameter case branch to a small patch routine:

.,D745 A5 11     LDA $11        low byte of address
.,D747 C9 66     CMP #$66       = low of $1966 (=6502)
.,D749 D0 D4     BNE $D71F      no, back to original code
.,D74B A5 12     LDA $12        high byte of address
.,D74D E9 19     SBC #$19       = high of $1966 (=6502)
.,D74F D0 CE     BNE $D71F      no, back to original code
.,D751 85 11     STA $11        low byte of screen buffer = 0
.,D753 A8        TAY            index = 0
.,D754 A9 80     LDA #$80       high byte of screen buffer
.,D756 85 12     STA $12        screen buffer := $8000
.,D758 A2 0A     LDX #$0A       10 characters
.,D75A BD 81 E0  LDA $E081,X    read character
.,D75D 29 3F     AND #$3F       throw away upper bits
.,D75F 91 11     STA ($11),Y    store into screen RAM
.,D761 C8        INY
.,D762 D0 02     BNE $D766      no carry
.,D764 E6 12     INC $12        increment screen buffer high address
.,D766 CA        DEX
.,D767 D0 F1     BNE $D75A      next character
.,D769 C6 46     DEC $46
.,D76B D0 EB     BNE $D758      repeat n times
.,D76D 60        RTS            back to interpreter loop

The text “MICROSOFT!” is stored in 10 consecutive bytes at $E082, cleverly hidden after a table of coefficients that is used for the SIN() function:

.;E063 05                        6 coefficients for SIN()
.;E064 84 E6 1A 2D 1B            -((2*PI)**11)/11! = -14.3813907
.;E069 86 28 07 FB F8             ((2*PI)**9)/9!   =  42.0077971
.;E06E 87 99 68 89 01            -((2*PI)**7)/7!   = -76.7041703
.;E073 87 23 35 DF E1             ((2*PI)**5)/5!   =  81.6052237
.;E078 86 A5 5D E7 28            -((2*PI)**3)/3!   = -41.3147021
.;E07D 83 49 0F DA A2               2*PI           =  6.28318531
.;E082 A1 54 46 8F 13            "SOFT!" | backwards and with
.;E087 8F 52 43 89 CD            "MICRO" | random upper bits

If we reverse the bytes, we get

CD 89 43 52 8F 13 8F 46 54 A1

The easter egg code clears the upper 2 bits, resulting in

0D 09 03 12 0F 13 0F 06 14 21

The easter egg code does not print the characters through library routines, but instead writes the values directly into screen RAM. While BASIC used the ASCII character encoding, the Commodore character set had its own encoding, with “A” starting at $01, but leaving digits and special characters at the same positions as in ASCII. Thus, the 10 hidden and obfuscated bytes decode into “MICROSOFT!”.
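The three steps (read backwards, clear the upper two bits, interpret as PET screen codes) can be replayed in Python. This is a sketch, not a disassembly: the function names are made up, and the screen code mapping only covers the letters and the ASCII-compatible range needed here.

```python
# The 10 bytes stored after the SIN() coefficients, in ROM order.
HIDDEN = bytes.fromhex("A1 54 46 8F 13 8F 52 43 89 CD")

def screen_code_to_ascii(code):
    """Map a PET screen code to ASCII: $01..$1A are 'A'..'Z',
    codes from $20 to $3F sit at their ASCII positions."""
    return chr(code + 0x40) if 0x01 <= code <= 0x1A else chr(code)

def decode_pet(hidden):
    # The ROM routine counts X down from 10, so it reads back to front,
    # and AND #$3F throws away the two upper bits of every byte.
    codes = [b & 0x3F for b in reversed(hidden)]
    return "".join(screen_code_to_ascii(c) for c in codes)

print(decode_pet(HIDDEN))
```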


Microsoft’s Code?

Commodore engineers are known for putting easter eggs into ROM, but there would be no reason for them to encode the string “MICROSOFT!” and hide it so well. The “WAIT 6502” easter egg did not show up in Commodore BASIC until version 2, which is in contrast to almost all sources claiming Commodore licensed Microsoft BASIC for a flat fee and never returned to Microsoft for updates, but continued improving BASIC internally.

Commodore had indeed updated its source with Microsoft’s changes since V1. 6502 guru Jim Butterfield states:

Commodore paid Microsoft an additional fee to write a revision to the original BASIC that they had bought. Among other things, spaces-in-keywords were changed, zero page shifted around, and (unknown to Commodore) the WAIT 6502,x joke was inserted.

Targeting Commodore?

While all of Microsoft BASIC only depends on the CPU, makes no other assumptions on the hardware it runs on (be it Commodore, Apple, Atari, …), and does all its input and output by calling into ROM functions external to BASIC, the easter egg writes directly to screen RAM at a fixed address of $8000, and uses the PET character encoding for it: The easter egg has clearly been written specifically for the PET.

We can only speculate on the reasons why Microsoft and possibly Bill Gates himself added the easter egg. A possible reason is that Microsoft wanted to make sure Commodore cannot take credit for “Commodore BASIC” – similar to the “Stolen From Apple” case.

Or it was only about showing the world who really wrote it. Jim Butterfield: As an afterthought, Microsoft would have liked to see their name come up on the screen. But it wasn’t in the contract.

Commodore’s Reaction

The easter egg only exists in BASIC version 2 on the PET. All later Commodore computers didn’t contain it: The branch was restored and the extra code as well as the 10 bytes hidden after the SIN() coefficients were removed.

Jim Butterfield: Shortly after that implementation, I show this to Len Tramiel [of Commodore engineering] at the Commodore booth of a CES show. He was enraged: “We have a machine that’s short of memory space, and the #$#!* [Gates] put that kind of stuff in!!”

Commodore employee Andy Finkel states that the “Gates” (!) easter egg had to be removed for space reasons. It had occupied 51 extra bytes.

Interestingly, starting with the BASIC V7 on the C128 six years later, Commodore started crediting Microsoft, like this:

                (C)1977 MICROSOFT CORP.
                  ALL RIGHTS RESERVED

According to Jim Butterfield, this is probably due to negotiations concerning Microsoft BASIC for the Amiga.

The Easter Egg before the PET

But Microsoft did not encode its company name specifically for Commodore: The 9 digit BASIC 6502 version 1.1 for the KIM-1 contained the 10 hidden bytes:

.;3FAA 05                        6 coefficients for SIN()
.;3FAB 84 E6 1A 2D 1B            -((2*PI)^11)/11! = -14.3813907
.;3FB0 86 28 07 FB F8             ((2*PI)^9)/9!   =  42.0077971
.;3FB5 87 99 68 89 01            -((2*PI)^7)/7!   = -76.7041703
.;3FBA 87 23 35 DF E1             ((2*PI)^5)/5!   =  81.6052237
.;3FBF 86 A5 5D E7 28            -((2*PI)^3)/3!   = -41.3147021
.;3FC4 83 49 0F DA A2               2*PI           =  6.28318531
.;3FC9 A6 D3 C1 C8 D4            "!TFOS"
.;3FCE C8 D5 C4 CE CA            "ORCIM"

The extra bytes here are:

A6 D3 C1 C8 D4 C8 D5 C4 CE CA

If we XOR every byte with 0x87, we get:

21 54 46 4f 53 4f 52 43 49 4d

which, again, is “MICROSOFT!” backwards, but this time in the ASCII encoding. (Note that no XOR or add/sub can be found for the 10 bytes in Commodore BASIC that would convert them into ASCII instead of PETSCII. Also, thanks to Tom for his help here.)
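The same check for the KIM-1 bytes is a one-liner in Python. The key 0x87 and the byte values are taken from the text above; the function name is hypothetical.

```python
# The 10 bytes after the SIN() coefficients in KIM-1 BASIC, in ROM order.
KIM_HIDDEN = bytes.fromhex("A6 D3 C1 C8 D4 C8 D5 C4 CE CA")

def decode_kim(hidden, key=0x87):
    """XOR every byte with the key, then reverse the result."""
    return bytes(b ^ key for b in hidden)[::-1].decode("ascii")

print(decode_kim(KIM_HIDDEN))
```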

The version of Microsoft BASIC for the 6502-based Apple II, called “AppleSoft”, contains the same 10 bytes after the coefficients in all tape and ROM versions. On AppleSoft II, for example, they are located at address $F075.

KIM-1 BASIC was released in 1977, AppleSoft II in spring 1978, and the V2 ROM of the PET in spring 1979. So Microsoft didn’t “target” Commodore with this at first, but probably put the data in for all their customers – possibly right after they had shipped the easter-egg-free V1 to Commodore. And when Commodore came back to them, they changed their codebase to encode the string differently and added the easter egg code to show the string.

The Easter Egg after the PET

After the second source drop to Commodore, they removed the “WAIT6502” code again, but kept the 10 encoded bytes in their master codebase: Every non-Commodore post-1978 6502 Microsoft BASIC with the 40 bit floating point library contains the 10 encoded bytes after the SIN() coefficients – still in PET encoding:

  • Tangerine Microtan 65
  • Tangerine Oric-1 and Oric-Atmos
  • Pravetz 8D

This is a snippet from microtan/tanex_h2.rom:

0000fd8: 0f da a2 a1 54 46 8f 13  ....TF..
0000fe0: 8f 52 43 89 cd a5 d5 48  .RC....H

The ROM of the Ohio Scientific Superboard II (and its clone, the Compukit UK101) as well as the Atari Microsoft BASIC tape are based on the 32 bit floating point version and don’t contain the easter egg data.

“MICROSOFT!” on the 6800 and the 6809

It doesn’t stop there: Even the BASIC versions on the TRS-80 Color Computer and the TRS-80 MC-10, which were versions for the 6809 and 6800 CPU architectures, respectively (BASIC-69 and BASIC-68), had the encoded “MICROSOFT!” string after the SIN() coefficients. Here is a snippet of Spectral Associates’ disassembly of the CoCo ROM in their book “Color Basic Unravelled II”:

BFC7 05                LBFC7   FCB   6-1                   SIX COEFFICIENTS
BFC8 84 E6 1A 2D 1B    LBFC8   FCB   $84,$E6,$1A,$2D,$1B   * -((2*PI)**11)/11!
BFCD 86 28 07 FB F8    LBFCD   FCB   $86,$28,$07,$FB,$F8   *  ((2*PI)**9)/9!
BFD2 87 99 68 89 01    LBFD2   FCB   $87,$99,$68,$89,$01   * -((2*PI)**7)/7!
BFD7 87 23 35 DF E1    LBFD7   FCB   $87,$23,$35,$DF,$E1   *  ((2*PI)**5)/5!
BFDC 86 A5 5D E7 28    LBFDC   FCB   $86,$A5,$5D,$E7,$28   * -((2*PI)**3)/3!
BFE1 83 49 0F DA A2    LBFE1   FCB   $83,$49,$0F,$DA,$A2   *    2*PI

BFE6 A1 54 46 8F 13 8F LBFE6   FCB   $A1,$54,$46,$8F,$13   UNUSED GARBAGE BYTES
BFEC 52 43 89 CD               FCB   $8F,$52,$43,$89,$CD   UNUSED GARBAGE BYTES

You can tell that Microsoft didn’t reimplement BASIC for the remaining 8 bit architectures, but practically converted the 6502 code, copying all constants verbatim, even the ones they did not understand, since these are still the obfuscated bytes in PET-encoding.

A Second Easter Egg on the Color Computer

The TRS-80 Color Computer (1980) also has an easter egg in BASIC: If you type “CLS9” (or any higher number), it will clear the screen and print “MICROSOFT”.

Let’s see how it is done:

                  * CLS
A910 BD 01 A0     CLS     JSR RVEC22     HOOK INTO RAM
A913 27 13                BEQ LA928      BRANCH IF NO ARGUMENT
A918 C1 08                CMPB #8        VALID ARGUMENT?
A91A 22 1B                BHI LA937      IF ARGUMENT >8, GO PRINT ‘MICROSOFT’
A937 8D EF        LA937   BSR LA928      CLEAR SCREEN
A939 8E A1 65             LDX #LA166-1   *
A93C 7E B9 9C             JMP LB99C      * PRINT ‘MICROSOFT’

The string to be printed is stored here:

A166 4D 49 43 52 4F 53 LA166 FCC 'MICROSOFT'
A16C 4F 46 54
A16F 0D 00             LA16F FCB CR,$00

That’s right, Microsoft added a different easter egg, and included the string “MICROSOFT” again, this time in cleartext. They seem to have forgotten about the obfuscated 10 bytes intended for the PET that had been copied from the 6502 version to the 6800 during conversion, and were still present in the Color Computer ROM.

The same easter egg exists on the 6800-based TRS-80 MC-10 (also 1980), which also had the 10 PET bytes in ROM:

FBBF 27 13                BEQ $FBD4       ; branch if no argument
FBC1 BD EF 0D             JSR $EF0D       ; get argument
FBC4 C1 08                CMPB #$08       ; easter egg?
FBC6 22 1D                BHI $FBE5       ; yes
FBE5 8D ED                BSR $FBD4       ; clear screen
FBE7 CE F8 33             LDX #$F834-1
FBEA 7E E7 A8             JMP $E7A8       ; print "MICROSOFT"
F834 4D 49 43 52 4F       FCC "MICROSOFT"
F839 53 4F 46 54 0D       FCB $0D
F83E 00                   FCB $00
F724 A1 54 46 8F 13       FCB $A1,$54,$46,$8F,$13 ; "!TFOS"
F729 8F 52 43 89 CD       FCB $8F,$52,$43,$89,$CD ; "ORCIM"
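
The comments in the last two lines can be checked by hand. The following is a minimal sketch, assuming that only the low six bits of each byte carry the PET screencode and that the upper two bits are noise; the function name is made up for illustration:

```python
# The ten leftover bytes, as they appear in the ROM listings.
PET_BYTES = bytes([0xA1, 0x54, 0x46, 0x8F, 0x13, 0x8F, 0x52, 0x43, 0x89, 0xCD])


def decode_screencode(data):
    chars = []
    for b in data:
        code = b & 0x3F  # assumption: the upper two bits are obfuscation only
        if 1 <= code <= 26:
            chars.append(chr(ord("A") + code - 1))  # screencodes 1..26 are A..Z
        else:
            chars.append(chr(code))  # 0x20..0x3F coincide with ASCII
    return "".join(chars)


stored = decode_screencode(PET_BYTES)
print(stored)        # the text in ROM order
print(stored[::-1])  # the text as it would be read in reverse
```

The bytes decode to “!TFOSORCIM” in ROM order, i.e. the string is stored backwards.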

Microsoft BASIC 6502 Timeline

  • Version 1.0 (in the 6 digit version) is used on the Ohio Scientific, and contains a major bug in the garbage collection code.
  • Version 1.0 (in the 9 digit version) is also used in the first Commodore PET as Commodore BASIC V1. It is the oldest known Microsoft BASIC to support 9 digit floating point.
  • Version 1.1, which contained bug fixes, is used on the KIM-1. It is the oldest version to contain the “MICROSOFT!” string (in ASCII).
  • AppleSoft BASIC I is forked from Microsoft BASIC 1.1. It contains the ASCII string.
  • Microsoft BASIC version 2 changes the ASCII string to PET screencode, adds the easter egg code, and is given to Commodore.
  • The code is removed again after the source drop to Commodore. The Tangerine Microtan is based on this.
  • Apple, Commodore and Tangerine continue development of their respective forks without the involvement of Microsoft.
  • The BASIC V2 used on the VIC-20 and the C64 is actually a stripped-down version of PET BASIC 4.0 and not a ported version of PET BASIC V2.

So did Bill Gates write it himself?

Altair BASIC was written by Bill Gates, Paul Allen (the founders of Microsoft) and Monte Davidoff (a contractor), as comments in the original source printout show:


Bill Gates wrote “the runtime stuff” (which probably means the implementation of the instructions), as opposed to “the non-runtime stuff” (probably meaning tokenization, memory management) and “the math package”. Consequently, the implementation of the WAIT command would have been his work – on the 8080, at least.

Now who wrote the 6502 version? The KIM-1 BASIC manual credits Gates, Allen and Davidoff, the original authors of the 8080 version, but it might only be left over from the manual for the 8080 version. Davidoff, who worked for Microsoft in the summers of 1975 and 1977, had not been there when BASIC 6502 was written in the summer of 1976, but he probably changed the 6 digit floating point code into the 9 digit version that is first found in BASIC 6502 1.1 (KIM-1, 1977).

The ROM of the 1977/1978 Ohio Superboard II Model 500/600 (6 digit BASIC 1.0) credits RICHARD W. WEILAND, and the 1977 9 digit KIM-1 BASIC 1.1 as well as the 1981 Atari Microsoft BASIC 2.7 credit “WEILAND & GATES”. Ric Weiland was the second Microsoft employee. These credits, again, were easter eggs: While they were clearly visible when looking at the ROM dump, they were only printed when the user entered “A” when BASIC asked for the memory size.

According to, Marc McDonald (employee number 1) wrote the 6502 version, but it is more likely that McDonald wrote the 6800 simulator and Weiland ported 8080 BASIC to the 6800 and then McDonald adapted the 6800 simulator to the 6502 and Weiland wrote the 6502 BASIC.

This and the hidden credits in version 1.0 of 6502 BASIC suggest that Weiland was the main author of 6502 BASIC. Gates is added to the hidden credits in the 1.1 version, so Gates probably contributed to the 1.1 update.

So it is very possible that Gates wrote the easter egg code himself, given that he was responsible for the implementation of WAIT on the 8080, he is credited in BASIC 6502 1.1+, Finkel and Butterfield refer to WAIT6502 as “Gates'” easter egg – and after all, he can write code.

Open Questions

  • What was Atari’s version based on? What versions were there? Atari Microsoft BASIC images are very hard to find.
  • Why did Atari use the 6 digit version, if they extended it with lots of commands (so size couldn’t have been an issue)?

Annotated Disassembly of Different Versions

How MOS 6502 Illegal Opcodes really work

The original NMOS version of the MOS 6502, used in computers like the Commodore 64, the Apple II and the Nintendo Entertainment System (NES), is well-known for its illegal opcodes: Out of 256 possible opcodes, 151 are defined by the architecture, but many of the remaining 105 undefined opcodes do useful things.

Many articles have been written to test and document these, but I am not aware of any article that tries to explain where exactly they come from. I’ll do this here.

The Block Diagram

Every 6502 data sheet comes with a block diagram, but these are of no use, because they are oversimplified, partially incorrect, and don’t explain how instruction decoding works. The following more detailed diagram is a lot more useful:

(Original from Apple II things)

The Decode ROM (PLA)

There is no need to understand the whole diagram. The important part is on the left: The instruction register, which holds the opcode, and the current clock cycle within the instruction (T0 to T6) get fed into a 130×21 bit decode ROM, i.e. a ROM with 130 lines of 21 bits each. On the die shot, this is the green area on the bottom.

(Original from Molecular Expressions)

While other CPUs from the same era used microcode to interpret the instruction, the 6502 had this 130×21 bit PLA. All lines of the PLA compare the instruction and the current clock cycle, and if they match, the line fires. A little simplified, every line looks like this:

ON bits          OFF bits         timing
7 6 5 4 3 2 1 0  7 6 5 4 3 2 1 0  T6 T5 T4 T3 T2 T1

(See the diagrams at for details; partial English translation of the website here).

  • “ON bits” specifies which bits need to be set for this line to fire.
  • “OFF bits” specifies which bits need to be clear for this line to fire.

The opcode table of the 6502 is laid out in a way that you can find easy rules to generalize the effects of similar opcodes. For example, the branch opcodes are encoded like this:

%aab10000

where “aa” is the condition (00=N, 01=V, 10=C, 11=Z) and “b” decides whether the branch is taken on a set or a clear flag.
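
As a small sketch of this rule, the following decodes a branch opcode into its mnemonic, assuming the bit layout aab10000 (two condition bits, one set/clear bit, then the fixed pattern 10000). The function and table names are made up for illustration:

```python
FLAGS = ["N", "V", "C", "Z"]  # aa = 00, 01, 10, 11

# (flag, b) -> mnemonic; b = 1 means "branch if the flag is set"
MNEMONICS = {
    ("N", 0): "BPL", ("N", 1): "BMI",
    ("V", 0): "BVC", ("V", 1): "BVS",
    ("C", 0): "BCC", ("C", 1): "BCS",
    ("Z", 0): "BNE", ("Z", 1): "BEQ",
}


def decode_branch(opcode):
    """Return the branch mnemonic, or None if the low five bits are not 10000."""
    if opcode & 0b00011111 != 0b00010000:
        return None
    flag = FLAGS[opcode >> 6]           # top two bits select the flag
    taken_if_set = (opcode >> 5) & 1    # bit 5 selects set or clear
    return MNEMONICS[(flag, taken_if_set)]


branches = {op: decode_branch(op) for op in range(256) if decode_branch(op)}
for op, name in branches.items():
    print(f"${op:02X} {name}")
```

Exactly eight of the 256 opcodes match the pattern, one for each combination of flag and polarity.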

So the following line would fire on the first cycle of any branch:

ON bits          OFF bits         timing
7 6 5 4 3 2 1 0  7 6 5 4 3 2 1 0  T6 T5 T4 T3 T2 T1
0 0 0 1 0 0 0 0  0 0 0 0 1 1 1 1  0  0  0  0  0  1

From now on, let’s write it differently, so that it’s more readable:

mask      cycle  description
XXX10000  T1     T1 of Bcc: fetch branch offset

If a line fires, it outputs a “1”. The “Random Control Logic” that can be seen in the diagram then AND/OR-combines some lines and feeds the result into various components of the CPU: In the case of a branch, this would result in fetching the branch offset, for example.

One line can fire for several opcodes that are similar in their encoding and thus their behavior: For example, “LDA abs”, “ORA abs” and “AND abs” all do the same thing (fetch the low byte of the address) in T1, so there can be a line that matches all these opcodes and causes a memory fetch and a PC increment. Also, multiple lines can fire at the same time for any given cycle within an instruction, which will have the combined effect of the single lines.
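
The matching rule can be sketched in a few lines. This is a simplified model, not a description of the actual circuit: a mask string in the notation above is turned into ON and OFF bits, and a line fires if the cycle matches, all ON bits are set and all OFF bits are clear. The function names are made up for illustration:

```python
def mask_to_on_off(mask):
    """Turn a mask like 'XXX10000' into (ON bits, OFF bits)."""
    on = int("".join("1" if c == "1" else "0" for c in mask), 2)
    off = int("".join("1" if c == "0" else "0" for c in mask), 2)
    return on, off


def line_fires(mask, cycle, opcode, current_cycle):
    """A line fires if the cycle matches and the opcode satisfies both bit sets."""
    on, off = mask_to_on_off(mask)
    return cycle == current_cycle and (opcode & on) == on and (opcode & off) == 0


# The branch line from above: fires in T1 of every Bcc opcode.
firing = [op for op in range(256) if line_fires("XXX10000", "T1", op, "T1")]
print([f"${op:02X}" for op in firing])
```

For the branch line, the ON bits come out as %00010000 and the OFF bits as %00001111, which is the same row as in the ON/OFF table earlier.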

LDA and LDX become LAX

Now there are many undefined opcodes. The designers of the 6502 have not created any specific PLA lines for them, but since their opcodes are similar to well-defined opcodes, there might be lines that fire nevertheless.

Let’s take opcode $AF for example, which is “LAX absolute”. It loads a value from an absolute address in memory and stores it in A and X at the same time. This is somewhat the combination of opcodes $AD (LDA abs) and $AE (LDX abs).

The instructions “LDY/LDA/LDX abs” ($AC/$AD/$AE) consist of four cycles:

  • The first cycle fetches the low byte of the address.
  • The second cycle fetches the high byte of the address.
  • The third cycle fetches the value at that address from memory and stores it in Y/A/X.
  • The fourth cycle fetches the next instruction.

Cycles T1, T2 and T4 are identical for all three of them, and they are encoded similarly, so the following three PLA lines can be used to detect these instructions and signal the rest of the CPU to carry out the specific tasks:

mask      cycle  description
101011XX  T1     T1 of $AC/$AD/$AE: fetch addr/lo
101011XX  T2     T2 of $AC/$AD/$AE: fetch addr/hi
101011XX  T4     T4 of $AC/$AD/$AE: fetch next opcode

The mask %101011XX doesn’t only fire for $AC/$AD/$AE, but also for the undefined opcode $AF: So $AF (LAX) behaves the same as LDA/LDX/LDY in T1/T2/T4, i.e. it fetches a 16 bit address and in the end fetches the next opcode.

T3 differs in all three cases, so it has to be handled by one separate line per case:

mask      cycle  description
10101100  T3     T3 of $AC: read into Y
101011X1  T3     T3 of $AD: read into A
1010111X  T3     T3 of $AE: read into X

(Actually, the lines in the actual PLA might be less specific, i.e. contain more X bits, since there are similar instructions like “ORA absolute” that might share this line.)

The line for $AC is only true for the exact value of $AC, but the $AD and $AE lines have one “don’t care” bit each. The bitfield of $AF, which is %10101111, is true for both masks, so in T3 of $AF, both the $AD and the $AE lines fire.

In T3, LDA/LDX/LDY have in common that they all read from memory and put the result onto the internal “SB” bus. “LDA” also sets the “SB->AC” control line to “1”, which will make the accumulator read its value from SB. Likewise, LDX causes “SB->X” to be “1” and makes X read from the SB bus, and LDY reads SB into the Y register.

Since both the LDA and the LDX lines fire, both the accumulator and the X register will be sent the command to load their values from the SB bus, so $AF is effectively an LAX: Load Accumulator and X.
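
To see the overlap concretely, here is a minimal sketch that applies the three T3 masks from the table above to the opcodes $AC through $AF. It uses the simplified masks from this article, not the real (possibly less specific) PLA lines, and the names are made up for illustration:

```python
# (mask, register that gets loaded from the SB bus) for T3, as in the table above
T3_LINES = [
    ("10101100", "Y"),
    ("101011X1", "A"),
    ("1010111X", "X"),
]


def matches(mask, opcode):
    """True if every non-X mask bit equals the corresponding opcode bit."""
    bits = format(opcode, "08b")
    return all(m in ("X", b) for m, b in zip(mask, bits))


def registers_loaded_in_t3(opcode):
    return {reg for mask, reg in T3_LINES if matches(mask, opcode)}


for op in (0xAC, 0xAD, 0xAE, 0xAF):
    print(f"${op:02X}: {sorted(registers_loaded_in_t3(op))}")
```

The three defined opcodes each select one register, while $AF matches both the A and the X line.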

The KIL Opcodes

There are many “KIL” opcodes, i.e. opcodes that stop the CPU, so that it can only recover using a RESET, and not even an IRQ or an NMI.

In order to understand this, let’s look at the different states an instruction can be in. After the instruction fetch, the CPU is in cycle T1. It will feed the opcode and the cycle number into the PLA and cause the rest of the CPU to carry out whatever has to be done in this cycle, according to the PLA. Then it will shift the T bitfield left by one, so the T2 line will be “1”, then line T3 and so on. There are seven T lines total, T1 to T7. At the end of each instruction, the PLA causes the T bitfield to reset, so that the next instruction starts with T1=1 again.

But what happens if T does not get reset? This can happen if in all seven states of T, no line fires that actually belongs to an instruction that ends at this cycle. T gets shifted left until state T7, in which another shift left will just shift the 1 bit out of T – all bits of T will be zero then, so no PLA line can fire any more.

All interrupt and NMI requests are always delayed until the current instruction is finished, i.e. until T gets reset. But since T never gets reset, all interrupts and NMIs are effectively disabled.
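
The behavior of the T bitfield can be modeled as a one-hot shift register. This is only a sketch of the description above, assuming seven T bits and a reset that puts the 1 back into the T1 position; the function and parameter names are made up for illustration:

```python
T_BITS = 7  # T1..T7, as described above


def run(cycles, reset_after=None):
    """Return the value of the T bitfield for each cycle.

    reset_after: the T state (1-based) after which the instruction ends and
    T is reset to T1. None models an opcode for which no such line fires.
    """
    t = 1  # one-hot, bit 0 = T1
    history = []
    for _ in range(cycles):
        history.append(t)
        if reset_after is not None and t == 1 << (reset_after - 1):
            t = 1  # instruction finished: start the next one in T1
        else:
            t = (t << 1) & ((1 << T_BITS) - 1)  # shift left, drop overflow
    return history


print(run(9, reset_after=4))  # a four-cycle instruction, repeated
print(run(9))                 # no reset: the 1 bit falls out after T7
```

With a reset, the pattern repeats forever. Without one, the bitfield becomes zero after T7 and stays zero, so no PLA line can match any more.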

What’s next?

There are many illegal opcodes, some with very weird behavior, and some that have been documented as unstable. Studying all these can reveal many interesting details about the internal design of the 6502.

Tool's 'Rosetta Stoned' Live Performance to Include Apple I Schematic

It probably takes a geek in the front row (which is likely for a concert in San Francisco) to notice that the computer schematics Tool showed on the screens during their “Rosetta Stoned” performance are those of an Apple 1 – but distorted:

It’s easy to read the mirrored text “MICROPROCESSOR”, and if you pay close attention, you can read “6502” – a classic 8 bit CPU which has been featured in pop culture before.

The clipping shown during ‘Rosetta Stoned’ has been mirrored horizontally, and its width has been doubled. Here is the original schematic:

The schematic was shown in the background video in Los Angeles (10 Dec 2007) and San Francisco (11 Dec 2007), but IIRC not in Frankfurt (28 Aug 2007; I think I would have noticed), so it seems to be new in Tool’s Fall 2007 Tour.

But I forgot my pen…

64 bit is a lot!

When people talk about porting their applications to 64 bit, I sometimes hear them wonder how long it will be until they have to port everything to 128 bit – after all, the switches from 8 to 16 bit (e.g. CP/M to DOS), 16 to 32 bit (DOS/Windows 3 to Windows 95/NT) and 32 to 64 have all happened in the last 25 years.

But all these switches, even after Moore-compensation, don’t push the limit in a linear, but in an exponential way: Compared to 32 bit, 64 bit extends addressable memory by a factor of 4 billion. A database holding 2^64 bytes can store a 1 Megapixel JPEG of every square meter of the earth’s surface.
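
The numbers are easy to check. The sketch below assumes a surface area of about 510 million square kilometers for the earth, which is a figure that does not come from this article. Whether the resulting budget per square meter is enough for a 1 Megapixel JPEG depends on the compression level:

```python
EARTH_SURFACE_M2 = 510e12  # assumption: about 510 million km^2, in square meters

factor = 2**64 // 2**32                  # growth from 32 bit to 64 bit addressing
bytes_per_m2 = 2**64 / EARTH_SURFACE_M2  # storage budget per square meter

print(factor)
print(round(bytes_per_m2))
```

The factor is 2^32, i.e. a little over 4 billion, and the budget comes out at roughly 36 KB per square meter.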

AMD and Intel understood that no CPU in the next few decades would need that much RAM (!), and therefore decided to implement only 48 bit addressing for now, in a way that is completely transparent to user space. This saves 2 extra levels of page tables and thus keeps TLB complexity lower. At the same density of images, 2^48 bytes cover about the size of New Jersey.

Game Development Archeology: Zelda on Game Boy comes with source

Imagine you’re writing a Game Boy game, and the resulting ROM with all the code and data is just a little over one megabyte in size. No big deal, just pad the game to two megabytes, and use a 2 MB ROM in the cartridge. Just tell the linker to allocate 2 MB of RAM, put the actual data at the beginning, and then write a 2 MB “.gb” image to disk, which will then be sent to the ROM chip factory.

Now imagine you’re doing this in MS-DOS. Your linker, probably written in C, calls malloc() of the runtime library of the C compiler. You already know where this is going?

While modern operating systems will always clear all malloc()ed memory, so that you cannot get to other processes’ data, this was uncommon in the single-user MS-DOS days. If you allocate 2 MB of RAM (the linker must have used a DOS extender or XMS), you’d get memory with random data in it: leftovers from whatever was in this memory before. (seppel tells me that this can also be caused by seek()ing over EOF in MS-DOS, in which case the previous data on the hard disk will be in the image.)

This is what happened with the 1998 Game Boy/Game Boy Color game “The Legend of Zelda – Link’s Awakening DX” (MD5: ee0424cf1523f67c5007566aed70696d). If you look at the image starting at 0x106000, you will find all kinds of interesting data, which will tell you a lot about the game’s development. Let’s call this “game development archeology”…

The ROM image includes big chunks of Borland’s Turbo C IDE (Turbo Vision interface) for DOS, as well as traces of the “QBasic” MS-DOS Editor. It is unclear which editor they used for what, but they might have used Turbo C to write DOS code to support building, as there is a complete copy of this C program in the ROM:


int main(void) {
	FILE *fp,*f1;
	int a=0xcd;
	int b=0xc6;
	int c=0x29;
	int ch;
	unsigned long i=0;

	if((fp=fopen("","rb"))==NULL) {
		printf("can't open the file");
		return 0;

	if((f1=fopen("ttmp.asm","wt"))==NULL) {
		printf("can't new file ttmp.asm");
		return 0;

	while((ch=fgetc(fp))!=EOF) {
		if(a==ch) {
			if(b==ch) {
					fprintf(f1,"%lXH, " , i);


This writes the file offsets at which the byte sequence 0xcd,0xc6,0x29 was found in the ROM image into ttmp.asm. These bytes, interpreted as Z80 machine code, mean “CALL 0x29C6”. In the final ROM image, this sequence appears once, at 0x442B. If you have any idea why they look for this, please post it in the comments.
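
For comparison, here is a minimal sketch of the same kind of search. It is not a reconstruction of the program found in the ROM: it only scans a byte string for the three-byte sequence and prints the offsets in a similar “...H, ” style. The helper names and the demo data are made up for illustration:

```python
PATTERN = bytes([0xCD, 0xC6, 0x29])  # CALL opcode, then the address low/high byte


def find_offsets(image, pattern=PATTERN):
    """Return every offset at which pattern occurs in image."""
    offsets = []
    pos = image.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = image.find(pattern, pos + 1)
    return offsets


def call_target(pattern=PATTERN):
    """The 16 bit operand is stored little-endian after the opcode byte."""
    return pattern[1] | (pattern[2] << 8)


# Hypothetical demo data; a real run would read the ROM image from disk.
demo = bytes([0x00, 0xCD, 0xC6, 0x29, 0xFF, 0xCD, 0xC6, 0x29])
print(", ".join(f"{off:X}H" for off in find_offsets(demo)))
print(hex(call_target()))
```

Run against the actual image instead of the demo bytes, a search like this should report the single occurrence mentioned above.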

This is the list of files in their project directory at D:\GAMEBOY:

These filenames also appear in the ROM:

And here comes the interesting part: There is actually some assembly source in the ROM; here is a small snippet:

                 AND $02 ;LEFT
                 JR  NZ, JoyPort_2
                 CALL LEFTScroll
                 AND $04 ;UP
                 JR  NZ, JoyPort_3
                 CALL UPScroll
                 AND $08 ;DOWN
                 JR  NZ, JoyPort_4
                 CALL DOWNScroll

Well-documented, it seems. But there is also some assembly code that looks like this:

                LD A,$7F
                LD BC,$0800
                LD D,A
                LD HL,$9800
                LD A,D
                LD (HLI),A
                DEC BC
                LD  A,B
                OR  A,C
                JR  NZ,L_B000_2900
                LD  A,(HLI)
                LD  (DE),A
                INC DE
                DEC BC
                LD  A,B
                OR  A,C
                JR  NZ,L_B000_2914

The label names suggest that this code has been disassembled from existing Z80 machine code. Link’s Awakening DX is a color remake of an older Game Boy game, so it might very well be that they lost the original source, disassembled the old code and used it again for the remake. This could be easily proven by disassembling the original version and looking for this code.

If you want to do game development archeology yourself, you might want to look at titles like these:

  • “X-Men – Wolverine’s Rage” (MD5: b1729716baaea01d4baa795db31800b0), which contains Windows 9x registry keys and INF files
  • “Mortal Kombat 4” (MD5: 7311f937a542baadf113e9115158cde3), in which you can find some small source fragments
  • “Gift” (MD5: e6a51088c8fea7980649064bd3a9f9ff), which will tell you that the developers had some Game Boy emulators installed on their system
  • the “BIT-MANAGERS” games “Spirou” (MD5: 5aa012cf540a5267d6adea6659764441, Turbo C, MAP file, source) and “TinTin in Tibet” (Game Boy Color version, MD5: 8150a3978211939d367f48ffcd49f979), which, amongst other things, contains references to Nintendo’s Game Boy Advance (!) SDK (“C:Cygnusthumbelf-000512H-i686-cygwin32libgcc-libthumb-elf2.9-arm-000512”, “/tantor/build/nintendo/arm-000512/i686-cygwin32/src/newlib/libc/stdio/stdio.c”)

If you find any more things like these, please post them (or links to your stories) as comments! Happy hacking!