Archive for the ‘archeology’ Category

Why is there no CR1 – and why are control registers such a mess anyway?

Friday, July 2nd, 2010

If you want to enable protected mode or paging on the i386/x86_64 architecture, you use CR0, which is short for control register 0. Makes sense. These are important system settings. But if you want to switch the pagetable format, you have to change a bit in CR4 (CR1 does not exist and CR2 and CR3 don’t hold control bits), if you want to switch to 64 bit mode, you have to change a bit in an MSR, oh, and if you want to turn on single stepping, that’s actually in your FLAGS. Also, have I mentioned that CR5 through CR15 don’t exist – except for CR8, of course?

Like many (but unfortunately not all) quirks of the i386/x86_64 architecture, this mess can be explained with history.

8086 – FLAGS

x86 history typically starts with the 16 bit 8086. Although it was not binary compatible with its predecessor, it was a rather straightforward, assembly-level compatible 16 bit extension of the 8 bit Intel 8080, with some ideas from the Zilog Z80. The 8086 is still a classic “home computer class” CPU, which was not meant for modern operating systems: It had no MMU of any kind, and no concept of privileged and unprivileged modes. Therefore, control bits that we see as system state today were encoded into the 16 bit FLAGS register: The interrupt enable bit and the trap flag (which will cause a software interrupt after the next instruction and thus lets you single-step) are encoded into FLAGS right next to the ALU’s flags like Zero and Carry.
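To make the mix of ALU state and system state concrete, here is a minimal sketch that decodes a FLAGS value. The bit positions are the documented ones for the 8086 (Carry in bit 0, Zero in bit 6, Trap in bit 8, Interrupt enable in bit 9); the function name and the dictionary keys are made up for this illustration.

```python
# Bit positions in the 8086 FLAGS register (only a subset is shown).
CARRY_FLAG = 1 << 0      # CF: ALU carry
ZERO_FLAG = 1 << 6       # ZF: ALU result was zero
TRAP_FLAG = 1 << 8       # TF: single-step trap after the next instruction
INTERRUPT_FLAG = 1 << 9  # IF: maskable interrupts enabled


def describe_flags(flags):
    """Split a 16 bit FLAGS value into ALU bits and 'system' bits."""
    return {
        "carry": bool(flags & CARRY_FLAG),
        "zero": bool(flags & ZERO_FLAG),
        "trap": bool(flags & TRAP_FLAG),
        "interrupt": bool(flags & INTERRUPT_FLAG),
    }


if __name__ == "__main__":
    # 0x0301 has bits 0, 8 and 9 set.
    print(describe_flags(0x0301))
```

The point of the sketch is only that one 16 bit word carries both kinds of state, which is what later CPUs moved away from.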

80286 – Machine Status Word

The 80286 then came with a simple form of memory management that allowed more sophisticated (but not yet “modern”) operating systems to run – like the original versions of OS/2. The 16 bit “Machine Status Word” was created to host the big switch between legacy mode (real mode) and the new memory-managed mode (protected mode), and a program could access it using the new instructions “lmsw” and “smsw”. The 80286 had more system state than just this bit: The GDT, the IDT and the TSS each had their own register and dedicated instructions to access it (“lgdt”/“sgdt”, “lidt”/“sidt”, “ltr”/“str”).

i386 – Control Registers

The i386 finally had a real MMU that allowed paging and thus modern operating systems. The MMU required two more registers in the system state, one for the base address of the pagetables, and one to read a fault address from. Intel decided against adding more special purpose registers with dedicated accessor instructions, but instead introduced eight indexed 32 bit wide “control registers” CR0 to CR7. The new accessors “mov crn, r32”/“mov r32, crn” allowed copying between registers and control registers and had the 3 bit CR index encoded in the opcode.

The old MSW was wired into the lower 16 bits of CR0, and CR0 was also extended with new bits like the switch to turn on paging. CR1 was kept reserved, presumably as a second control register for miscellaneous control bits, and CR2 and CR3 were used for the aforementioned fault address and pagetable base pointer. The opcodes to access reserved control registers generated an “invalid opcode” fault, making it possible for Intel to reuse the opcodes later if those control registers were never used.

i486 – CR4

The i486 added a few more control bits, and some of them went into CR0. But instead of overflowing the new bits into CR1, Intel decided to skip it and open up CR4 instead – for unknown reasons.

Pentium – MSRs

On the Pentium, Intel added for the first time control bits that were a property of the implementation as opposed to the architecture, i.e. bits that are microarchitecture-specific and will therefore only work on certain CPUs and not necessarily be supported on later CPUs – like caching details and debug settings. In order not to waste the valuable CR space with throw-away control bits, Intel introduced the Model-specific Registers (MSRs). The MSR address space is 32 bits, and every MSR is 64 bits wide. The two new instructions “rdmsr” and “wrmsr” copy between an ECX-indexed MSR and the EDX:EAX register pair (EDX holds the upper 32 bits, EAX the lower 32 bits).
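As a small illustration of the register convention: “rdmsr” and “wrmsr” take the MSR index in ECX and move the 64 bit value through EDX:EAX. The sketch below only models the split and join of the value; the helper names are hypothetical and nothing here touches real hardware.

```python
MASK32 = 0xFFFFFFFF


def split_msr_value(value):
    """Split a 64 bit MSR value into (edx, eax), as rdmsr would return it."""
    if not 0 <= value < 1 << 64:
        raise ValueError("MSR values are 64 bits wide")
    return (value >> 32) & MASK32, value & MASK32


def join_msr_value(edx, eax):
    """Combine EDX (upper half) and EAX (lower half), as wrmsr would consume them."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


if __name__ == "__main__":
    edx, eax = split_msr_value(0x0000000100000002)
    print(hex(edx), hex(eax))
```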

Pentium II – SYSENTER MSR

The SYSENTER instruction that was introduced on the Pentium II is a fast way to switch between unprivileged and privileged mode. Instead of looking up the destination segment, instruction pointer and stack pointer in memory, the CPU holds this information in three special-purpose system registers. CR space is valuable, so Intel decided against filling up CR5, CR6 and CR7 and put the three registers into the MSR address space instead – at 0x174 through 0x176. This was practically an abuse of the MSR concept.

AMD K6 – EFER MSR

Who can blame AMD for doing similar things then? With the K6, which was introduced at the same time as the Pentium II, AMD diverged from just copying Intel for the first time and actually added features of their own: They added the SYSCALL instruction, and with it, a control bit that turns it on and off, and an extra control register with the target location. Being afraid of colliding with Intel extensions that they didn’t know about, they put the extra system registers into the MSR space: the control register “EFER” (Extended Feature Enable Register) at 0xC000_0080 and the Syscall Target Register (STAR) at 0xC000_0081. Intel had been nicely lining up MSRs counting up from 0, so AMD decided to start counting at 0xC000_0080. Understandable as this is, it is basically the same abuse of the MSR concept as Intel’s with SYSENTER.

A very similar thing happened in the CPUID space, by the way: While Intel encoded all its feature bits in leaf 0x0000_0001, AMD defined leaf 0x8000_0001 for its features.

x86_64 – Chaos!

So far everything looked like it was getting a little more controlled. Both Intel and AMD were only adding new control registers in the MSR space, and since this is a big address space and AMD and Intel extended it at rather opposite locations, it all looked nicer. But then came x86_64: For the first time, Intel was copying a feature that AMD introduced, and it needed to be compatible with all its details. AMD had encoded the availability of x86_64 in its own CPUID leaf 0x8000_0001, so Intel had to support this leaf as well. And since Long Mode was turned on in the EFER MSR, Intel had to support an MSR in the AMD space of 0xC000_0000. Long mode also required supporting SYSCALL, so Intel also supported the STAR MSR.

Since x86_64 introduced the REX prefix to double the number of available general-purpose registers, AMD decided to allow this prefix also for “mov cr”, doubling the number of control registers and therefore introducing CR8 through CR15 – also doubling their width. And since AMD introduced them, they owned them, and decided to use CR8 for the “Task Priority Register” feature.
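A rough sketch of how this looks at the byte level, based on the commonly documented encoding rather than on anything stated above: “mov crn, reg” is the two-byte opcode 0F 22 followed by a ModR/M byte whose reg field holds the 3 bit CR index, and the REX.R bit supplies the fourth index bit. The function name is hypothetical and only the register-direct form is covered.

```python
def encode_mov_to_cr(cr_index, gpr_index):
    """Encode `mov crN, reg` (opcode 0F 22 /r) for register operands.

    cr_index:  0..15, the control register number
    gpr_index: 0..15, the general-purpose register number (0 = EAX/RAX)
    A REX prefix is emitted only when one of the indexes needs a 4th bit.
    """
    if not 0 <= cr_index <= 15 or not 0 <= gpr_index <= 15:
        raise ValueError("register index out of range")
    rex_r = cr_index >> 3    # 4th bit of the CR index
    rex_b = gpr_index >> 3   # 4th bit of the GPR index
    # mod = 11 (register direct), reg = CR index, rm = GPR index
    modrm = 0xC0 | ((cr_index & 7) << 3) | (gpr_index & 7)
    encoding = bytearray()
    if rex_r or rex_b:
        encoding.append(0x40 | (rex_r << 2) | rex_b)
    encoding += bytes([0x0F, 0x22, modrm])
    return bytes(encoding)


if __name__ == "__main__":
    print(encode_mov_to_cr(3, 0).hex())  # mov cr3, rax
    print(encode_mov_to_cr(8, 0).hex())  # mov cr8, rax
```

With this scheme, CR3 fits in the classic three-byte encoding, while CR8 is the same instruction with a REX prefix in front and index 0 in the ModR/M byte.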

VMX and SVM

The architecture is messy, sure, but does it matter? Maybe not… as long as CPUs didn’t have virtualization extensions! Both Intel VMX and AMD SVM are designed so that they can automatically switch the complete privileged machine state including control registers and certain MSRs. Intel, for example, special-cases CR0, CR3, CR4 and CR8, and leaves CR2 to the user. AMD on the other hand has 16 fields for all CRs in its switcher. And because of the two different starting points of the MSR space, Intel VMX required a whitelist bitmap for 8192 MSRs starting at 0x0000_0000 and for another 8192 MSRs starting at 0xC000_0000 – and of course SYSENTER_CS, EFER, STAR and friends are special-cased. If you want to have a lot of fun, read the VMCS layout reference of Intel’s manual 3B!
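To show what the two MSR ranges mean for a hypervisor, here is a minimal sketch that maps an MSR address to its position in the bitmap. It assumes the layout as I understand it from Intel’s manual: one 4 KiB page holding four 1 KiB bitmaps (read low, read high, write low, write high), each with one bit per MSR. The constant and function names are made up for this example.

```python
# Assumed layout of the 4 KiB VMX MSR bitmap page (four 1 KiB bitmaps).
READ_LOW_OFFSET = 0       # reads of MSRs 0x0000_0000..0x0000_1FFF
READ_HIGH_OFFSET = 1024   # reads of MSRs 0xC000_0000..0xC000_1FFF
WRITE_LOW_OFFSET = 2048   # writes to the low range
WRITE_HIGH_OFFSET = 3072  # writes to the high range

HIGH_BASE = 0xC000_0000
MSRS_PER_RANGE = 8192     # 8192 bits = 1 KiB per bitmap


def msr_bitmap_position(msr, write):
    """Return (byte_offset, bit_number) inside the bitmap page.

    Returns None for MSRs outside both covered ranges.
    """
    if 0 <= msr < MSRS_PER_RANGE:
        index = msr
        base = WRITE_LOW_OFFSET if write else READ_LOW_OFFSET
    elif HIGH_BASE <= msr < HIGH_BASE + MSRS_PER_RANGE:
        index = msr - HIGH_BASE
        base = WRITE_HIGH_OFFSET if write else READ_HIGH_OFFSET
    else:
        return None
    return base + index // 8, index % 8


if __name__ == "__main__":
    print(msr_bitmap_position(0xC000_0080, write=False))  # EFER, read
    print(msr_bitmap_position(0x174, write=True))         # SYSENTER_CS, write
```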

Future?

  • CR1 and CR5 to CR7 are still “owned” by Intel. AMD has shown that they don’t want to use them – and even Intel has not added a control register since 1989.
  • CR9 through CR15 are technically owned by AMD, since they introduced them with x86_64 and decided to use CR8. Intel adopted the reserved ones when adopting x86_64, but it is unlikely that Intel will ever adopt smaller changes to the architecture from AMD, and AMD is unlikely to use them if they won’t be part of the architecture, so these will probably never be used either. On the other hand, AMD added these to the auto-switcher list of their SVM Virtual Machine Control Block (VMCB), showing that they haven’t given up on them yet.
  • The MSR space is de facto properly partitioned. Intel continues adding MSRs at 0 and AMD at 0xC000_0000 – but MSRs already lost their model-specificness in 1997. MSRs are the new CRs.

Dear Intel, dear AMD: I like the control registers, and I hate to see them wasted. Why don’t you finally define CR1 and give it a few control bits in the future? If you’re scared about collisions, I will be happy to be the arbiter. Ah, whatever: Intel, you get to define all even bits in CR1, and AMD, you get to define all odd bits. Okay? Cool.

High-Res Pictures of a MOS KIM-1

Monday, June 28th, 2010

The MOS KIM-1 is a quite rare collector’s item today. So if you hold one in your hands, you better take some high resolution pictures of the board. Here they are:

Note that this is the original revision of the board (pre-Rev A), and the 6502 CPU is from week 51 of the year 1975 – so it has the ROR bug!

Does anyone know what the three digit numbers 002 and 003 on the 6530 RIOTs mean? Are these the indexes of the ROM images? If so, what is ROM #001 and was there a #000? Also, the back has the number “0372” on it – is this a serial number? Looking at the dates of the chips, this seems to be the oldest KIM-1 of all those I could find on the internet.

Who invented the computer?

Tuesday, April 20th, 2010
  • In 1837, Charles Babbage designed a general purpose computer, the Analytical Engine, but never built it.
  • Between 1934 and 1937, Church, Turing et al. defined the general purpose computer, but didn’t design one.
  • In 1941, Konrad Zuse built the first general purpose computer, the Z3, but didn’t know it was general purpose and didn’t use it that way.
  • From 1943 to 1946, Mauchly and Eckert finally built a computer, ENIAC, that was designed to be general-purpose.

Pictures of Apple Lisa 2 Boards

Tuesday, September 1st, 2009

The Apple Lisa is a quite rare collector’s item today. So if you hold one in your hands, you better take some high resolution pictures of the boards. Here they are:

CPU Board

I/O Board

Memory Board

Amiga intern (PDF)

Tuesday, August 18th, 2009

(German) The quality of this scan is terrible, but at least the PDF is searchable.

Dittrich, S., Gelfand, R., & Schemmel, J. (1988). Amiga intern. Düsseldorf: Data Becker. (PDF, 718 pp., 11 MB)

WHAT’S INSIDE:

In this third, revised edition you will find everything from the hardware through the operating system kernel EXEC to DOS: all the essential information about the Amiga. And it is presented so clearly that even the non-professionals among you will quickly understand how the Amiga operating system works.

From the contents:

  • The Amiga’s chips (68000, CIA, Gary, Agnus, Denise, Paula)
  • The interfaces (video, audio, RGB, Centronics, serial, floppy, game port, expansion port, Zorro bus, keyboard)
  • Programming the hardware (memory map, interrupts, Copper, Blitter, disk controller)
  • EXEC structures (Node, List, Libraries, Tasks)
  • How multitasking works (task switching, communication between tasks, exceptions, traps, semaphores, memory management)
  • I/O handling on the Amiga through devices and I/O requests; interrupt handling and resource management
  • Reset-proof programs and structures, documentation of the reset routine; programming your own handlers, devices and libraries
  • EXEC base (documentation and use of the system variables); DOS library (functions, parameters, error messages)
  • Disks (boot process, data structures, program layout); Fast File System on floppy and hard disk
  • Program startup, parameters, invocation from CLI and Workbench, detailed description of the internal structure of the CLI commands (internal DOS library); input/output (keyboard, screen, floppy disk, parallel and serial interface)

AND THE AUTHORS OF THIS BOOK:

Johannes Schemmel is a hardware specialist with a talent for explaining the big picture clearly. Stefan Dittrich, a 68000 specialist, already showed interested Amiga users what the machine is capable of with his book “Amiga Maschinensprache”. Ralf Gelfand is a seasoned Amiga programmer who has been a household name at least since the big floppy book for the Amiga.

ISBN 3-89011-104-1

LOAD"$",8

Tuesday, July 28th, 2009

Commodore computers up to BASIC 2.0 (like the Commodore 64, the VIC-20 and the PET 2001) only had a very basic understanding of mass storage: There were physical device numbers that were mapped to the different busses, and the “KERNAL” library had “open”, “read”, “write” and “close” functions that worked on these devices. There were also higher-level “load” and “save” functions that could load and save arbitrary regions of memory: The first two bytes of the file would be the (little endian) start address of the memory block.
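The file format that “load” and “save” rely on is simple enough to sketch in a few lines: two bytes of little-endian start address, then the raw memory contents. The function name below is hypothetical.

```python
import struct


def split_prg(data):
    """Split a Commodore program file into (load_address, payload).

    The first two bytes are the little-endian start address of the
    memory block; everything after them is the block itself.
    """
    if len(data) < 2:
        raise ValueError("file too short to contain a load address")
    (load_address,) = struct.unpack_from("<H", data, 0)
    return load_address, data[2:]


if __name__ == "__main__":
    address, payload = split_prg(bytes([0x01, 0x08, 0xAA, 0xBB]))
    print(hex(address), payload.hex())
```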

With no special knowledge of “block storage” devices like disk drives, BASIC 2.0, which was not only a programming language but basically the shell of Commodore computers, could not express commands like “format a disk”, “delete a file” or “show the directory”. All this functionality, as well as the file system implementation, was part of the firmware of the disk drives.

Sending a Command

Sending commands to the drive was done by using the “open” call with a “secondary address” of 15: The computer’s KERNAL just sent the file name and the secondary address over the IEC bus as if it were to open a file, but the floppy drive understood secondary address 15 as the command channel. So for example, deleting a file from BASIC looked like this:

OPEN 1,8,15,"S:FOO": CLOSE 1

“1” is the KERNAL’s file descriptor, “8” the device number and “15” the secondary address. Experts omitted the close, because it blocked on the completion of the operation.

Getting Data Back

While the “OPEN” line for disk commands was pretty verbose, it was still doable. Getting the error message of the last operation back was more tricky: It required a loop in BASIC that read bytes from channel 15 until EOF was reached.

Getting a directory listing would be in the same class of problem, since it requires the computer to send a command (and a file name mask) to the floppy and receive the data. Neither BASIC nor KERNAL knew how to do this, and since this was such a common operation, it would not have been reasonable to have the user type in a 4 line BASIC program just to dump the directory contents.

The BASIC Program Hack

Here comes the trick: If the name of the program to load started with a “$” (followed by an optional mask), the floppy drive just returned the directory listing – formatted as a BASIC program. The user could then just “LOAD” the directory and “LIST” it as if it were a BASIC program:

LOAD"$",8

SEARCHING FOR $
LOADING
READY.
LIST

0 "TEST DISK       " 23 2A
20   "FOO"               PRG
3    "BAR"               PRG
641 BLOCKS FREE.

In this example, “TEST DISK” is the disk name, “23” the disk ID and “2A” the filesystem format/version (always 2A on 1540/1541/1551/1570/1571 – but this was only a redundant copy of the version information which was never read and could be changed). There are two files, 20 and 3 blocks in size respectively (a block is a 256 byte allocation unit on disk – since blocks are stored as linked lists there are only 254 bytes of payload), and both are of the “PRG” type.

Encoding of Commodore BASIC Programs

The floppy was aware of the encoding that Commodore BASIC (a derivative of Microsoft BASIC for 6502) used and prepared the directory listing in that way. A BASIC program in memory is a linked list of lines. Every line starts with a 2-byte pointer to the next line. A 0-pointer marks the end of the program. The next two bytes are the line number, followed by the zero-terminated encoded line contents.

The LIST command decodes a BASIC program in memory by following the linked list from the start of BASIC RAM. It prints the line number, a space, and the line contents. These contents have BASIC keywords encoded as 1-byte tokens starting at 0x80. Characters below 0x80 are printed verbatim. Here is what 10 PRINT"HELLO" would look like:

0801  0E 08    - next line starts at 0x080E
0803  0A 00    - line number 10
0805  99       - token for PRINT
0806  "HELLO"  - ASCII text of line (including the quotes)
080D  00       - end of line
080E  00 00    - end of program
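The layout can be reproduced with a small encoder. This is only a sketch: it knows a single token (0x99 for PRINT, as in the dump), takes the keyword and the rest of the line separately instead of tokenizing real BASIC source, and all names are made up for the illustration.

```python
import struct

TOKENS = {"PRINT": 0x99}  # only the one token that appears in the dump


def encode_line(address, line_number, keyword, rest):
    """Encode one BASIC line stored at `address`.

    Returns (encoded_bytes, address_of_next_line).
    """
    body = bytes([TOKENS[keyword]]) + rest.encode("ascii") + b"\x00"
    next_address = address + 2 + 2 + len(body)  # link + line number + body
    return struct.pack("<HH", next_address, line_number) + body, next_address


def encode_program(base, lines):
    """Encode (line_number, keyword, rest) tuples starting at `base`."""
    image = bytearray()
    address = base
    for line_number, keyword, rest in lines:
        encoded, address = encode_line(address, line_number, keyword, rest)
        image += encoded
    image += b"\x00\x00"  # a zero link pointer marks the end of the program
    return bytes(image)


if __name__ == "__main__":
    program = encode_program(0x0801, [(10, "PRINT", '"HELLO"')])
    print(program.hex(" "))
```

For a one-line program 10 PRINT"HELLO" at 0x0801, the link pointer comes out as 0x080E, since the line occupies 2 + 2 + 1 + 7 + 1 = 13 bytes.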

The example directory listing from above would be encoded by the floppy like this:

0801  21 08    - next line starts at 0x0821
0803  00 00    - line number 0
0805  '"TEST DISK       " 23 2A '
0820  00       - end of line
0821  41 08    - next line starts at 0x0841
0823  14 00    - line number 20
0825  '  "FOO"               PRG '
0840  00       - end of line
[...]

A couple of things are interesting here:

  • The line with the disk name and the ID is actually printed in inverted letters, which is done by having the “reverse on” character code as the first character of the first line, i.e. the floppy makes the assumption that the computer understands this convention.
  • BASIC will print the file sizes as variable-width line numbers, so the floppy adds extra spaces to the beginning of the line contents to have all file names aligned.
  • The floppy needs to populate the next line pointers for the linked list.

The Link Pointer

The obvious question here is: How can the floppy know where in the computer’s memory the BASIC program will live? The answer is: It doesn’t. The BASIC interpreter supports having its program anywhere in memory, and loading programs that were saved from other locations in memory – or possibly from other Microsoft BASIC compatible computers with a different memory layout. The VIC-20 had BASIC RAM at 0x0401, the C64 at 0x0801 and the C128 at 0x1C01. Therefore, BASIC “rebinds” a program on load, searching for the zero-terminator of the lines and filling the (redundant) link pointers.
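The rebinding step can be sketched as follows. This is an illustration of the idea, not a transcription of the ROM routine: it treats a link pointer of two zero bytes as the end marker, finds each line’s zero terminator, and rewrites the link pointer for the new base address. The function name is hypothetical.

```python
import struct


def relink(program, base):
    """Recompute the link pointers of a BASIC program image.

    `program` is the in-memory image (without the 2-byte load address),
    `base` is the address at which it now lives.
    """
    image = bytearray(program)
    pos = 0
    while pos + 1 < len(image):
        if image[pos] == 0 and image[pos + 1] == 0:
            break  # zero link pointer: end of program
        # Skip link pointer (2) and line number (2), then find the
        # zero byte that terminates the line contents.
        end = image.index(0, pos + 4)
        next_pos = end + 1
        struct.pack_into("<H", image, pos, base + next_pos)
        pos = next_pos
    return bytes(image)


if __name__ == "__main__":
    # Two lines with bogus non-zero link pointers (0x0101).
    image = bytes.fromhex("0101000041000101140042000000")
    print(relink(image, 0x0801).hex(" "))
```

Because only the terminators are consulted, any non-zero value in the incoming link pointers is good enough for the program to be accepted.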

The floppy therefore only has to send non-zero values as the link pointers for BASIC to accept the directory listing as a program. In fact, a 1541 sends the directory with a 0x0401-base, which would be valid on a VIC-20. The reason for this is that the 1541 is only a 1540 with minor timing fixes for C64 support, and the 1540 is the floppy drive that was designed for the VIC-20.

Therefore, if you do LOAD"$",8,1 on a C64, the extra “,1” will be interpreted by the KERNAL LOAD code to load the file at its original address (as opposed to the beginning of BASIC RAM), and since there is screen RAM at 0x0400 on the C64, garbage will appear on the screen, because the character encoding of screen RAM is incompatible with BASIC character encoding.

Directory Code in 61 Bytes

There are two problems with this “directory listing is a BASIC program” hack: Listing the directory overwrites a BASIC program in RAM, and listing the directory from inside an application is non-trivial.

Therefore, many implementations to show a directory listing exist on the C64 – and I want to present my own one here, which is, to my knowledge, the shortest existing (and maybe shortest possible?) version. It is based on a 70 byte version published in “64′er Magazin” some time in the 80s, and I managed to get it down to 61 bytes.

,C000:  A9 01     LDA #$01     ; filename length
,C002:  AA        TAX
,C003:  A0 E8     LDY #$E8     ; there is a "$" at $E801 in ROM
,C005:  20 BD FF  JSR $FFBD    ; set filename
,C008:  A9 60     LDA #$60
,C00A:  85 B9     STA $B9      ; set secondary address
,C00C:  20 D5 F3  JSR $F3D5    ; OPEN (IEC bus version)
,C00F:  20 19 F2  JSR $F219    ; set default input device
,C012:  A0 04     LDY #$04     ; skip 4 bytes (load address and link pointer)
,C014:  20 13 EE  JSR $EE13    ; read byte
,C017:  88        DEY
,C018:  D0 FA     BNE $C014    ; loop
,C01A:  A5 90     LDA $90
,C01C:  D0 19     BNE $C037    ; check end of file
,C01E:  20 13 EE  JSR $EE13    ; read byte (block count low)
,C021:  AA        TAX
,C022:  20 13 EE  JSR $EE13    ; read byte (block count high)
,C025:  20 CD BD  JSR $BDCD    ; print 16 bit integer
,C028:  20 13 EE  JSR $EE13    ; read character
,C02B:  20 D2 FF  JSR $FFD2    ; print character to stdout
,C02E:  D0 F8     BNE $C028    ; loop until zero
,C030:  20 D7 AA  JSR $AAD7    ; print carriage return character
,C033:  A0 02     LDY #$02
,C035:  D0 DD     BNE $C014    ; skip 2 bytes next time (link pointer)
,C037:  20 42 F6  JSR $F642    ; CLOSE
,C03A:  4C F3 F6  JMP $F6F3    ; reset default input device

(There is a similar implementation here.)

There are two limitations of this code though: It omits the extra space between the block number and the filename, leading to a slightly different output, and it cannot be interrupted.
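For comparison, the same walk over the directory stream can be sketched in Python, operating on the bytes as they would arrive from the drive. This is a simplified model with a made-up function name: it stops at a zero link pointer instead of checking the bus status for EOF, and it decodes the text as plain ASCII.

```python
import struct


def parse_directory(stream):
    """Parse a "$" directory stream into (block_count, text) tuples.

    Layout: 2-byte load address, then for each line a 2-byte link
    pointer, a 2-byte little-endian number and zero-terminated text.
    """
    entries = []
    pos = 2  # skip the load address
    while pos + 4 <= len(stream):
        link, blocks = struct.unpack_from("<HH", stream, pos)
        if link == 0:
            break  # end of the "program"
        pos += 4
        end = stream.index(0, pos)  # zero terminator of this line
        text = stream[pos:end].decode("ascii", errors="replace")
        entries.append((blocks, text))
        pos = end + 1
    return entries


if __name__ == "__main__":
    stream = (
        bytes([0x01, 0x04])
        + bytes([0x01, 0x01, 0x00, 0x00]) + b'"TEST DISK"' + b"\x00"
        + bytes([0x01, 0x01, 0x14, 0x00]) + b'  "FOO" PRG' + b"\x00"
        + bytes([0x01, 0x01, 0x81, 0x02]) + b"BLOCKS FREE." + b"\x00"
        + bytes([0x00, 0x00])
    )
    for blocks, text in parse_directory(stream):
        print(blocks, text)
```

Like the assembly version, this ignores the link pointers apart from the end check and simply treats every line as a number followed by text.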