Reverse-Engineered GEOS 2.0 for C64 Source Code

The GEOS operating system managed to clone the Macintosh GUI on the Commodore 64, a computer with an 8 bit CPU and 64 KB of RAM. Based on Maciej Witkowiak's work, I created a reverse-engineered source version of the C64 GEOS 2.0 KERNAL for the cc65 compiler suite:

https://github.com/mist64/geos

  • The source compiles into the exact same binary as shipped with GEOS 2.0.
  • The source is well-structured and split up into 31 source files.
  • Machine-specific code is marked up.
  • Copy protection/trap mechanisms can be disabled.
  • The build system makes sure binary layout requirements are met.

This makes the source a great starting point for

  • adding (optional) optimized code paths or features
  • integrating existing patches from various sources
  • integrating versions for other computers
  • porting it to different 6502-based computers

Just fork the project and send pull requests!

Copy Protection Traps in GEOS for C64

Major GEOS applications on the Commodore 64 protect themselves from unauthorized duplication by keying themselves to the operating system's serial number. To avoid tampering with this mechanism, the system contains some elaborate traps, which will be discussed in this article.

GEOS Copy Protection

The GEOS boot disk protects itself with a complex copy protection scheme, which uses code uploaded to the disk drive to verify the authenticity of the boot disk. Berkeley Softworks, the creators of GEOS, found it necessary to also protect their applications like geoCalc and geoPublish from unauthorized duplication. Since these applications were running inside the GEOS "KERNAL" environment, which abstracted most hardware details away, these applications could not use the same kind of low-level tricks that the system was using to protect itself.

Serial Numbers for Protection

The solution was to use serial numbers. On the very first boot, the GEOS system created a 16 bit random number, the "serial number", and stored it in the KERNAL binary. (Since the system came with a "backup" boot disk, the system asked for that disk to be inserted, and stored the same serial in the backup's KERNAL.) Now whenever an application was run for the first time, it read the system's serial number and stored it in the application's binary. On subsequent runs, it read the system's serial number and compared it with the stored version. If the serial numbers didn't match, the application knew it was running on a different GEOS system than the first time – presumably as a copy on someone else's system: Since the boot disk could not be copied, two different people had to buy their own copies of GEOS, and different copies of GEOS had different serial numbers.

Serial Numbers in Practice

The code to verify the serial number usually looked something like this:

.,D5EF  20 D8 C1    JSR $C1D8 ; GetSerialNumber
.,D5F2  A5 03       LDA $03   ; read the hi byte
.,D5F4  CD 2F D8    CMP $D82F ; compare with stored version
.,D5F7  F0 03       BEQ $D5FC ; branch if equal
.,D5F9  EE 18 C2    INC $C218 ; sabotage LdDeskAcc syscall: increment vector
.,D5FC  A0 00       LDY #$00  ; ...

If the highest 8 bits of the serial don't match the value stored in the application's binary, it increments the pointer of the LdDeskAcc vector. This code was taken from the "DeskTop" file manager, which uses this subtle sabotage to make loading a "desk accessory" (a small helper program that can be run from within an application) unstable. Every time DeskTop gets loaded, the pointer gets incremented, and while LdDeskAcc might still work by coincidence the first few times (because it only skips a few instructions), it will break eventually. Other applications used different checks and sabotaged the system in different ways, but they all had in common that they called GetSerialNumber.

(DeskTop came with every GEOS system and didn't need any extra copy protection, but it checked the serial anyway to prevent users from permanantly changing their GEOS serial to match one specific pirated application.)

A Potential Generic Hack

The downside of this scheme is that all applications are protected the same way, and a single hack could potentially circumvent the protection of all applications.

A generic hack would change the system's GetSerialNumber implementation to return exactly the serial number expected by the application by reading the saved value from the application's binary. The address where the saved valus is stored is different for every application, so the hack could either analyze the instructions after the GetSerialNumber call to detect the address, or come with a small table that knows these addresses for all major applications.

GEOS supports auto-execute applications (file type $0E) that will be executed right after boot – this would be the perfect way to make this hack available at startup without patching the (encrypted) system files.

Trap 1: Preventing Changing the Vector

Such a hack would change the GetSerialNumber vector in the system call jump table to point to new code in some previously unused memory. But the GEOS KERNAL has some code to counter this:

                                ; (Y = $FF from the code before)
.,EE59  B9 98 C0    LDA $C098,Y ; read lo byte of GetSerialNumber vector
.,EE5C  18          CLC
.,EE5D  69 5A       ADC #$5A    ; add $5A
.,EE5F  99 38 C0    STA $C038,Y ; overwrite low byte GraphicsString vector

In the middle of code that deals with the menu bar and menus, it uses this obfuscated code to sabotage the GraphicsString system call if the GetSerialNumber vector was changed. If the GetSerialNumber vector is unchanged, these instructions are effectively a no-op: The lo byte of the system's GetSerialNumber vector ($F3) plus $5A equals the lo byte of the GraphicsString vector ($4D). But if the GetSerialNumber vector was changed, then GraphicsString will point to a random location and probably crash.

Berkely Softworks was cross-developing GEOS on UNIX machines with a toolchain that supported complex expressions, so they probably used code like this to express this:

    ; Y = $FF
    lda GetSerialNumber + 1 - $FF,y
    clc
    adc #<(_GraphicsString - _GetSerialNumber)
    sta GraphicsString + 1 - $FF,y

In fact, different variations of GEOS (like the GeoRAM version) were separate builds with different build time arguments, so because of different memory layouts, they were using different ADC values here.

Note that the pointers to the GetSerialNumber and GraphicsString have been obfuscated, so that an attacker that has detected the trashed GraphicsString vector won't be able to find the sabotage code by looking for the address.

Trap 2: Preventing Changing the Implementation

If the hack can't change the GetSerialNumber vector, it could put a JMP instruction at the beginning of the implementation to the new code. But the GEOS KERNAL counters this as well. The GetSerialNumber implementation looks like this:

.,CFF3  AD A7 9E    LDA $9EA7 ; load lo byte of serial
.,CFF6  85 02       STA $02   ; into return value (lo)
.,CFF8  AD A8 9E    LDA $9EA8 ; load hi byte of serial
.,CFFB  85 03       STA $03   ; into return value (hi)
.,CFFD  60          RTS       ; return

At the end of the system call function UseSystemFont, it does this:

.,E6C9  AD 2F D8    LDA $D82F ; read copy of hi byte of serial
.,E6CC  D0 06       BNE $E6D4 ; non-zero? done this before already
.,E6CE  20 F8 CF    JSR $CFF8 ; call second half of GetSerialNumber
.,E6D1  8D 2F D8    STA $D82F ; and store the hi byte in our copy
.,E6D4  60          RTS       ; ...

And in the middle of the system call function FindFTypes, it does this:

.,D5EB  A2 C1       LDX #$C1
.,D5ED  A9 96       LDA #$96  ; public GetSerialNumber vector ($C196)
.,D5EF  20 D8 C1    JSR $C1D8 ; "CallRoutine": call indirectly (obfuscation)
.,D5F2  A5 03       LDA $03   ; read hi byte of serial
.,D5F4  CD 2F D8    CMP $D82F ; compare with copy
.,D5F7  F0 03       BEQ $D5FC ; if identical, skip next instruction
.,D5F9  EE 18 C2    INC $C218 ; sabotage LdDeskAcc by incrementing its vector
.,D5FC  A0 00       LDY #$00  ; ...

So UseSystemFont makes a copy of the hi byte of the serial, and FindFTypes compares the copy with the serial – so what's the protection? The trick is that one path goes through the proper GetSerialNumber vector, while the other one calls into the bottom half of the original implementation. If the hack overwrites the first instruction of the implementation (or managed to disable the first trap and changed the system call vector directly), calling though the vector will reach the hack, while calling into the middle of the original implementation will still reach the original code. If the hack returns a different value than the original code, this will sabotage the system in a subtle way, by incrementing the LdDeskAcc system call vector.

Note that this code calls a KERNAL system function that will call GetSerialNumber indirectly, so the function pointer is split into two 8 bit loads and can't be found by just searching for the constant. Since the code in UseSystemFont doesn't call GetSerialNumber either, an attacker won't find a call to that function anywhere inside KERNAL.

Summary

I don't know whether anyone has ever created a generic serial number hack for GEOS, but it would have been a major effort – in a time before emulators allowed for memory watch points. An attacker would have needed to suspect the memory corruption, compared memory dumps before and after, and then found the two places that change code. The LdDeskAcc sabotage would have been easy to find, because it encodes the address as a constant, but the GraphicsString sabotage would have been nearly impossible to find, because the trap neither uses the verbatim GraphicsString address nor GetSerialNumber function.

Usual effective hacks were much more low-tech and instead "un-keyed" applications, i.e. removed the cached serial number from their binaries to revert them into their original, out-of-the-box state.

How Amica Paint protected tampering with its credits

In mid-1990, the floppy disk of special issue 55 of the German Commodore 64 magazine "64'er" contained the "Amica Paint" graphics program – which was broken beyond usefulness. I'll describe what went wrong.

"Amica Paint" was devloped by Oliver Stiller and first published in 64'er special issue 27 in 1988, as a type-in program that filled 25 pages of the magazine.

Two years later, Amica Paint was published again in special issue 55, which this time came with a floppy disk. But this version was completely broken: Just drawing a simple line would cause severe glitches.

Alt text

64'er issue 9/1990 published an erratum to fix Amica Paint, which described three ways (BASIC script, asm monitor and disk monitor) to patch 7 bytes in one of the executable files:

--- a/a.paint c000.txt
+++ b/a.paint c000.txt
@@ -67,8 +67,8 @@
 00000420  a5 19 85 ef a5 1a 85 f0  4c 29 c4 20 11 c8 20 6c  |........L). .. l|
 00000430  c8 08 20 ed c7 28 90 f6  60 01 38 00 20 20 41 4d  |.. ..(..`.8.  AM|
 00000440  49 43 41 20 50 41 49 4e  54 20 56 31 2e 34 20 20  |ICA PAINT V1.4  |
-00000450  01 38 03 20 20 20 4f 2e  53 54 49 4c 4c 45 52 20  |.8.   O.STILLER |
-00000460  31 39 39 30 20 20 20 01  31 00 58 3d 30 30 30 20  |1990   .1.X=000 |
+00000450  01 38 03 42 59 20 4f 2e  53 54 49 4c 4c 45 52 20  |.8.BY O.STILLER |
+00000460  31 39 38 36 2f 38 37 01  31 00 58 3d 30 30 30 20  |1986/87.1.X=000 |
 00000470  59 3d 30 30 30 20 20 20  20 20 20 20 20 20 00 01  |Y=000         ..|
 00000480  31 00 42 49 54 54 45 20  57 41 52 54 45 4e 20 2e  |1.BITTE WARTEN .|
 00000490  2e 2e 20 20 20 20 00 64  0a 01 00 53 43 48 57 41  |..    .d...SCHWA|

This changes the credits message from "O.STILLER 1990" to "BY O.STILLER 1986/87" – which is the original message from the previous publication.

64'er magazine had published the exact same application without any updates, but binary patched the credits message from "1986/87" to "1990", and unfortunately for them, Amica Paint contained code to detect exactly this kind of tampering:

.,C5F5  A0 14       LDY #$14     ; check 20 bytes
.,C5F7  A9 00       LDA #$00     ; init checksum with 0
.,C5F9  18          CLC
.,C5FA  88          DEY
.,C5FB  79 51 C4    ADC $C451,Y  ; add character from message
.,C5FE  88          DEY
.,C5FF  18          CLC
.,C600  10 F9       BPL $C5FB    ; loop
.,C602  EE FD C5    INC $C5FD
.,C605  C9 ED       CMP #$ED     ; checksum should be $ED
.,C607  F0 05       BEQ $C60E
.,C609  A9 A9       LDA #$A9
.,C60B  8D E4 C7    STA $C7E4    ; otherwise sabotage line drawing
.,C60E  60          RTS

The code checksums the message "BY O.STILLER 1986/87". If the checksum does not match, the code will overwrite an instruction in the following code:

.,C7DC  65 EC       ADC $EC
.,C7DE  85 EC       STA $EC
.,C7E0  90 02       BCC $C7E4
.,C7E2  E6 ED       INC $ED
.,C7E4  A4 DD       LDY $DD
.,C7E6  60          RTS

The "LDY $DD" instruction at $C7E4 will be overwritten with "LDA #$DD", which will cause the glitches in line drawing.

The proper fix would have been to change the comparison with $ED into a comparison with $4F, the checksum of the updated message – a single byte fix. But instead of properly debugging the issue, 64'er magazine published a patch to restore the original message, practically admitting that they had cheated by implying the re-release was not the exact same software.

Why does PETSCII have upper case and lower case reversed?

The PETSCII character encoding that is used on the Commodore 64 (and all other Commodore 8 bit computers) is similar to ASCII, but different: Uppercase and lowercase are swapped! Why is this?

The "PET 2001" from 1977 had a built-in character set of 128 characters. This would have been enough for the 96 printable ASCII characters, and an extra 32 graphical characters. But Commodore decided to replace the 26 lowercase characters with even more graphical characters. After all, the PET did not support a bitmapped display, so the only way to display graphics was using the graphical characters built into the character set. These consisted of symbols useful for box drawing as well as miscellaneous symbols (including the french deck suits).

So the first PET was basically using ASCII, but with missing lower case. This had an influence on the character codes produced by the keyboard:

Key ASCII keyboard PET keyboard
A a ($61) A ($41)
Shift + A A ($41) ♠ ($C1)

On a standard ASCII system, pressing "A" unshifted produces a lower case "a", and together with shift, it produces an uppercase "A". On a PET, because there is no lower case, pressing "A" unshifted produces an uppercase "A", and together with shift produces the "spade" symbol ("♠") from the PET-specific graphical characters.

Later Commodore 8 bit computers added a second character set that supported uppercase and lowercase. Commodore decided to allow the user to switch between the two character sets at any time (by pressing the "Commodore" and "Shift" keys together), so applications generally didn't know which character set was active. Therefore, the keyboard had to produce the same character code independent of the current character set:

key ASCII keyboard PET keyboard (upper/graph) PET keyboard (upper/lower)
A a ($61) A ($41) a ($41)
Shift + A A ($41) ♠ ($C1) A ($C1)

An unshifted "A" still produces a code of $41, but it has to be displayed as a lower case "a" if the upper/lower character set is enabled, so Commodore had to put lower case characters at the $40-$5F area – which in ASCII are occupied by the uppercase characters. A shifted "A" still produces a code of $C1, so Commodore put the uppercase characters into the $C0-$DF area.

Now that both upper case and lower case were supported, Commodore decided to map the previously undefined $60-$7F area to upper case as well:

range ASCII upper case PETSCII lower case PETSCII lower case PETSCII with fallback
$40-$5F upper case upper case lower case lower case
$60-$7F lower case undefined undefined upper case
$C0-$DF undefined graphical upper case upper case

If you only look at the area between $00 and $7F, you can see that PETSCII reverses upper and lower case compared to ASCII, but the $60-$7F area is only a compatibility fallback; upper case is actually in the $C0-$DF area – it's what the keyboard driver produces.

Macross 6502, an assembler for people who hate assembly language

There are many MOS 6502 cross-assemblers available. Here’s a new one. Or actually a very old one. “Macross”, a very powerful 6502 macro assembler, which was used to create Habitat, Maniac Mansion and Zak McKracken, was developed between 1984 and 1987 at Lucasfilm Ltd. and is now Open Source (MIT license):

https://github.com/Museum-of-Art-and-Digital-Entertainment/macross

Some History

Starting in 1984, a team at Lucasfilm Ltd. was developing one of the first online role-playing games, originally called “Microcosm”, which was released as “Habitat” in 1986 and later renamed to “Club Caribe”. The client ran on a Commodore 64, which conntected to the central server through the Quantum Link network.

The client software was developed on a 68K-based Sun workstation running the SunOS variant of Unix using cross-development tools developed by Chip Morningstar (who was also the Habitat lead): The “Macross” assembler and the “Slinky” linker. They were used on every 6502 (Atari 400/800, Commodore 64, and Apple II) game produced at Lucasfilm Games, from 1984 up until those machines ceased to be relevant to the games market*.

In 2014, The Museum of Art and Digital Entertainment got a hold of

  • the source of the original development tools (Macross/Slinky)
  • the source of the C64 client
  • the source of the server (written in PL/I)
  • lots of documentation and development logs

which originated from an archive created in 1987 in the context of the technology transfer to Fujitsu, which bought all Habitat assets.

Since Macross and Slinky were written for Unix, it was easy to get them compiling with modern compilers (K&R syntax notwithstanding) and running on a modern Unix system. At the time of writing, the code in the official repository has been fixed to compile with clang on OS X. Further fixes and cleanups are very welcome.

Compiling Macross

Enter “make” both in the toplevel directory and in the “slinky” directory, then copy “macross” and “slinky” into your path. There are man files in the “doc” directory that you may want to install as well.

Writing Code

The syntax of Macross source files is very different from modern 6502 cross assembler, and more similar to Commodore’s own “A65″ assembler. Here is a small “Hello World” for the C64:

define strout = 0xab1e

hello:
    lda #/text
    ldy #?text
    jmp strout

text:
    byte "HELLO WORLD!", 0

As you can see, hex values have to be specified in C notation (binary is prefixed with “0b”), and the operators to extract the low and high bytes of a 16 bit value are “/” and “?”, respectively.

Compile and link the source file like this:

macross -c -o hello.o hello.m
slinky -e -o hello.bin -n -m hello.sym -l 0xc000 hello.o
dd if=hello.bin bs=1 skip=2 count=2 of=hello.prg
dd if=hello.bin bs=1 skip=6 >> hello.prg

The “dd” lines convert Slinky’s output, which is a “standard a65-style object file” (which has a header of FF FF, followed by the start address, followed by the end address) into a C64 .PRG file that is only prefixed by the start address.

Here is a slightly more complex example:

define bsout = 0xffd2

hello:
    ldx #0
    do {
        lda x[text]
        cmp #'A'
        if (geq) {
            tay
            iny
            tya
        }
        inx
        jsr bsout
    } while (!zero)
    
    rts

text:
    byte "HELLO WORLD!", 0

Macross supports C-style if/else, while and do/while, as well as do/until, where the condition can be one of:

  • zero/equal
  • carry
  • lt/leq/gt/geq
  • slt/sleq/sgt/sgeq
  • positive/plus/negative/minus
  • overflow

…as well as their negated versions.

Also note that the “absolute, x-indexed” addressing mode has a different syntax than commonly used.

Macros

Macross has a very powerfull macro language. Here is an example:

org 0xc000

function makeFirstByte(operand) {
    mif (isImmediateMode(operand)) {
        freturn(/operand)
    } melse {
        freturn(operand)
    }
}

function makeSecondByte(operand) {
    mif (isImmediateMode(operand)) {
        freturn(?operand)
    } melse {
        freturn(operand + 1)
    }
}

macro movew src, dst {
    lda makeFirstByte(src) 
    sta makeFirstByte(dst)
    lda makeSecondByte(src)
    sta makeSecondByte(dst)
}

macro hook_vector index, new, dst {
    ldx #index * 2
    movew x[0x0300], dst
    movew #new, x[0x0300]
}

define VECTOR_INDEX_IRQ = 10

    hook_vector VECTOR_INDEX_IRQ, irq, return + 1
    rts

irq:
    inc 0xd020
return:
    jmp 0xffff

The “hook_vector” line will emit the following assembly code:

    ldx #$14
    lda $0300,x
    sta $C01D
    lda $0301,x
    sta $C01E
    lda #$19
    sta $0300,x
    lda #$C0
    sta $0301,x

(The example is a little contrived, since adding the index could have been done at assembly time, but the example nicely demonstrates that macros can preserve addressing modes.)

The file doc/macros.itr contains many useful macros. they are documented in doc/genmacros.itr.

Full Documentation

The complete documentation of Macross is available in the file doc/writeup_6502.itr in the repository. It is in troff format and can viewed like this:

nroff -ms doc/writeup_6502.itr

Future

Macross is a very different approach to 6502 development, and with the source available, I think it’s a viable project that should be continued.

I will happily accept pull requests for compile fixes (GCC, VS, …), cleanups (C99, converting docs from troff to markdown, …) and features (BIN and PRG output, support for more a modern notation, PETSCII, …).

xhyve – Lightweight Virtualization on OS X Based on bhyve

The Hypervisor.framework user mode virtualization API introduced in Mac OS X 10.10 (Yosemite) cannot only be used for toy projects like the hvdos DOS Emulator, but is full-featured enough to support a full virtualization solution that can for example run Linux.

xhyve is a lightweight virtualization solution for OS X that is capable of running Linux. It is a port of FreeBSD’s bhyve, a KVM+QEMU alternative written by Peter Grehan and Neel Natu.

  • super lightweight, only 230 KB in size
  • completely standalone, no dependencies
  • the only BSD-licensed virtualizer on OS X
  • does not require a kernel extension (bhyve’s kernel code was ported to user mode code calling into Hypervisor.framework)
  • multi-CPU support
  • networking support
  • can run off-the-shelf Linux distributions (and could be extended to run other operating systems)

xhyve may make a good solution for running Docker on your Mac, for instance.

Running Tiny Core Linux on xhyve

The xhyve repository already contains a small Linux system for testing, so you can try out xhyve by just typing these few lines:

$ git clone https://github.com/mist64/xhyve
$ cd xhyve
$ make
$ ./xhyverun.sh

And you will see Tiny Core Linux booting in your terminal window:

Initializing cgroup subsys cpuset
Initializing cgroup subsys cpu
Initializing cgroup subsys cpuacct
Linux version 3.16.6-tinycore64 (tc@box) (gcc version 4.9.1 (GCC) ) #777 SMP Thu Oct 16 10:21:00 UTC 2014
Command line: earlyprintk=serial console=ttyS0 acpi=off
e820: BIOS-provided physical RAM map:
BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
BIOS-e820: [mem 0x0000000000100000-0x000000003fffffff] usable
NX (Execute Disable) protection: active
SMBIOS 2.6 present.
AGP: No AGP bridge found
e820: last_pfn = 0x40000 max_arch_pfn = 0x400000000
x86 PAT enabled: cpu 0, old 0x7040600070406, new 0x7010600070106
CPU MTRRs all blank - virtualized system.

[...]

 (?- 
 //\   Core is distributed with ABSOLUTELY NO WARRANTY.
 v_/_           www.tinycorelinux.com

tc@box:~$

To shut down the VM and exit to the Mac’s command line, enter:

$ sudo halt

Running Ubuntu on xhyve

You can also install a more complete Linux distribution on xhyve. The tricky bit is that xhyve doesn’t come with a BIOS or EFI booter, so it is necessary to extract the kernel and initrd from the Linux image and pass them to xhyve manually.

First download Ubuntu Server (the desktop version doesn’t support the text mode installer) into the directory “ubuntu” inside the “xhyve” directory:

$ ls -l
total 1218560
-rw-r--r--@ 1 mist  staff  623902720  6 Jun 22:14 ubuntu-14.04.2-server-amd64.iso

We need to extract the kernel and initrd, which is a little tricky, because OS X doesn’t recognize the hybrid file system on the image without a little hack:

$ dd if=/dev/zero bs=2k count=1 of=/tmp/tmp.iso
$ dd if=ubuntu-14.04.2-server-amd64.iso bs=2k skip=1 >> /tmp/tmp.iso
$ hdiutil attach /tmp/tmp.iso
$ cp /Volumes/Ubuntu-Server\ 14/install/vmlinuz .
$ cp /Volumes/Ubuntu-Server\ 14/install/initrd.gz .

Create a virtual hard disk image (8 GB in the example):

$ dd if=/dev/zero of=hdd.img bs=1g count=8

Then create a script to run xhyve with the correct arguments for the installer:

#!/bin/sh

KERNEL="ubuntu/vmlinuz"
INITRD="ubuntu/initrd.gz"
CMDLINE="earlyprintk=serial console=ttyS0 acpi=off"

MEM="-m 1G"
#SMP="-c 2"
NET="-s 2:0,virtio-net"
IMG_CD="-s 3,ahci-cd,ubuntu/ubuntu-14.04.2-server-amd64.iso"
IMG_HDD="-s 4,virtio-blk,ubuntu/hdd.img"
PCI_DEV="-s 0:0,hostbridge -s 31,lpc"
LPC_DEV="-l com1,stdio"

build/xhyve $MEM $SMP $PCI_DEV $LPC_DEV $NET $IMG_CD $IMG_HDD -f kexec,$KERNEL,$INITRD,"$CMDLINE"

You will want networking enabled, so it’s easiest to run the script as root (this requirement is lifted if you codesign the binary):

$ sudo ./xhyverun_ubuntu_install.sh

You will see the Ubuntu text mode installer:

  ┌───────────────────────┤ [!!] Select a language ├────────────────────────┐
  │                                                                         │
  │ Choose the language to be used for the installation process. The        │
  │ selected language will also be the default language for the installed   │
  │ system.                                                                 │
  │                                                                         │
  │ Language:                                                               │
  │                                                                         │
  │                               C                                         │
  │                               English                                   │
  │                                                                         │
  │     <Go Back>                                                           │
  │                                                                         │
  └─────────────────────────────────────────────────────────────────────────┘

<Tab> moves; <Space> selects; <Enter> activates buttons

All answers should be straightforward, and the defaults are usually fine. Make sure to select “Yes” when asked “Install the GRUB boot loader to the master boot record”.

At the very end, on the “Installation complete” screen, select “Go back” and “Execute a shell”, so you can copy the installed kernel and initrd to the Mac side. In the VM, type this:

# cd /target
# sbin/ifconfig
# tar c boot | nc -l -p 1234

On the Mac, type this, replacing the IP with the output from ifconfig before:

$ cd ubuntu
$ nc 192.168.64.7 1234 | tar x

In the VM, exit the shell:

# exit

Then select “Finish the installation”.

To run the Ubuntu installation from the virtual hard disk, create the following script, fixing up the kernel and initrd version numbers:

#!/bin/sh

KERNEL="ubuntu/boot/vmlinuz-3.16.0-30-generic"
INITRD="ubuntu/boot/initrd.img-3.16.0-30-generic"
CMDLINE="earlyprintk=serial console=ttyS0 acpi=off root=/dev/vda1 ro"

MEM="-m 1G"
#SMP="-c 2"
NET="-s 2:0,virtio-net"
IMG_HDD="-s 4,virtio-blk,ubuntu/hdd.img"
PCI_DEV="-s 0:0,hostbridge -s 31,lpc"
LPC_DEV="-l com1,stdio"

build/xhyve $MEM $SMP $PCI_DEV $LPC_DEV $NET $IMG_CD $IMG_HDD -f kexec,$KERNEL,$INITRD,"$CMDLINE"

Then run the script:

$ sudo ./xhyverun_ubuntu.sh

To make your Linux installation useful, you may want to install an SSH server:

$ sudo apt-get install openssh-server

Or install a full UI that you can access using VNC:

$ sudo apt-get install xubuntu-desktop vnc4server

Then run the VNC server:

$ vnc4server :0 -geometry 1024x768

And conntect to it by pasting this into Finder’s Cmd+K “Connect to Server” dialog:

vnc://ubuntu.local

If you also follow skerit’s Ubuntu VNC tutorial, you’ll get to a desktop like this.

Next Steps

xhyve is very basic and lightweight, but it has a lot of potential. If you are a developer, you are welcome to contribute to it. A list of current TODOs and ideas is part of the README file in the repository.

Reconstructing Some Source of Microsoft BASIC for 8080

Microsoft BASIC for 6502 exists digitally in source form – the older version of the Intel 8080 CPU only exists on paper though: as a printout in the archives of Harvard University. Some snippets of the code are public though:

  • Ian Griffiths held the printout in his hands and took notes. He copied down several lines from the first page.
  • In Harry Lewis’s blog post, he shows a picture of some lines of the reproduction that is display on the the wall of the ground floor lounge of the Maxwell Dworkin building at Harvard.
  • David J. Malan has a collection of photos of the Maxwell Dworkin reproductions online.

The Computer History Museum in Mountain View, California has a video on display in the software section that tells the story of the company Microsoft. In this video, they show the first page of 8080 BASIC:


Together with the other two sources, we can reconstruct the start of the first page:

00100	MCSSIM(STARI)
00120
00140	TITLE	BASIC MCS 8080	GATES/ALLEN/DAVIDOFF
00160	IFNDEF	LENGTH,<PRINTX !!! MUST HAVE COM !!
00180		END>
00200	IF1;<
00220	IFE	LENGTH,<PRINTX /SMALL/ >
00240	IFE	LENGTH-1;<PRINTX /MEDIUM/ >
00260	IFE	LENGTH-2;<PRINTX /BIG/ >
00280	IFE	STRING,<PRINTX /NO $$/ >
00300	IFN	STRING,<PRINTX /$$ $$/ >
00320	>
00340	SUBTTL	VERSION 1.1 -- MORE FEATURES TO COME
00360	COMMENT *
00380
00400	-------------------------------------------
00420	COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
00440	-------------------------------------------
00460
00480
00500	WRITTEN ORIGINALLY ON THE PDP-10 AT HARVARD FROM
00520	FEBRUARY 9 TO APRIL 27
00540
00560	PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
00580	BILL GATES WROTE THE RUNTIME STUFF.
00600	MONTE DAVIDOFF WROTE THE MATH PACKAGE.
00620
00640	THINGS TO DO:
00641	SYNTAX PROBLEMS (OR)
00642	NICE ERRORS
00643	ALLOW ^W AND ^C IN LIST COMMAND
00646	TAPE I/O
00648	BUFFER I/O
00650	USR ??
00652	ELSE
00660	USER-DEFINED FUNCTIONS(MULTI-ARG,MULTI-LINE,STRINGS)
00680	MAKE STACK BOUNDARY STUFF EXACT
	(FOUT 24 FIN 14)
	PUNCH, DELETE;,.
	INLINE CONSTANT CONVERSION -- MAKE IT WORK
	SIMPLE STRINGS

While this is nice, it would be much nicer to have more, maybe all of the original source.

Can someone take high resolution photos of the first eight pages on display on the the wall of the ground floor lounge of the Maxwell Dworkin building at Harvard?

Can someone make a copy of the printout at the Harvard University archives? Applications like Microsoft Office Lens can make high quality copies of printouts with a phone.

Making Obsolete Code Run Again: The mxass 6502 Cross Assembler

Here’s the challenge: Take code that you wrote some 20 years ago in an obsolete programming language for an obsolete platform, make it run on a modern system (without emulation!)… and actually make it useful!

In 1995, I started developing a 6502 Cross Assembler for MS-DOS, in my then favorite languages: PowerBASIC for the bulk of it, and lots of 8086 inline assembly to speed up string operations. I mostly used it for my own C64 projects, and I was very proud of its speed: A fraction of a second to produce several KB of binary code on a 386.

In September 1996, I decided that Turbo Pascal was actually the better language, and converted the source line by line, but keeping all the inline assembly. Development continued until June 1998.

In January 2008, I rediscovered the source and wanted to see whether it could be ported to run on modern computers. I used p2c to convert the Pascal source into C, and spend two days cleaning up the C and rewriting the assembly in C until it correctly compiled my regression test – but the code was still using Pascal strings, for example.

Recently, I dug into my floppy disk collection to recover as many revisions of the source as possible, converted the whole (surviving) history into a git repository, and put it on

github.com/mist64/mxass.

The C version of mxass should run on any modern operating system, and it’s actually a useful piece of software, with some unique features. It supports 6502 with illegal opcodes, 65816 and Z80 assembly, and it tries to be backwards compatible with the “64′er” set of assemblers (Hypra-Ass, Giga-Ass, Vis-Ass, Assblaster, F8-Assblaster), in fact, F8-Assblaster source printed to a file in an emulator should assemble with very few changes.

That said, please do not use it for new projects. Use cc65 for that.

What is the oldest source you have written for an obsolete platform that have ported forward to modern systems?

Emulating the Intel 8080 on a MOS 6502

Emulating older computers on modern, much faster systems, is very common nowadays – but how about emulating the Intel 8080 (1974) on a MOS 6502 system like the KIM-1 (1975)? The “8080 Simulator for the 6502″ by Dann McCreary from 1978 does exactly that.

Why imitate one microprocessor with another? You probably purchased this 8080 simulator package to do one or more of the following:

  • Run existing 8080 software on your 6502
  • Write, test and debug your own 8080 software without having to purchase a complete 8080 based system
  • Learn something about the architecture and instruction set of the 8080 via hands-on experience

The emulator is extremely size-optimized and fits in less than 1 KB of RAM. This was done by compressing the 256-entry opcode space into 25 sections of similar instructions that could be handled by one generic function.

The four-page article “8080 Simulation with a 6502″ (MICRO – The 6502 Journal, issue 16, September 1979) explains the motivation and design of the software in detail:


Dann McCreary: 8080 Simulation with a 6502 [1979]

And here is the original commented source code with usage instructions:


Dann McCreary: An 8080 Simulator for the 6502, KIM-1 Version [1978]

Thanks a lot to Dann McCreary, who provided scans of his original work, as well as additional insights:

I wrote this by hand, pencil and paper assembly (BTW, did you ever read Carl Helmer’s article about pencil and paper in one of the very early issues of BYTE magazine? ;) and much of the simulator was written as I rode the bus to and from work… ;)

About the tools used to create this program:

Honestly, I don’t remember for certain… BUT… MORE THAN LIKELY, it is
a FAKE – i.e., I probably just text-edited a listing in “assembler
listing” format for the purpose of publishing the code and “looking
professional”… ;)

CONSEQUENTLY, be on the lookout for typographical (and thus operational)
errors that could be “in there” …

The all-upper-case makes me think I printed this stuff out on an old
drum printer that I resurrected from the American Surplus Computer
company in Boston back in the ’70s… And there’s yet another story! :)

The text editor may likely have been running on my Apple ][…. or maybe
“borrowed” from some company where I was working at the time?

Apple-80

Dann later ported the simulator to the Apple-][. The tape dump of the resulting product “Apple-80″ can be found at brutaldeluxe.fr.

Converting VisAss and F8 AssBlaster Source

If you have developed applications for the Commodore 64 in the 80s or 90s, chances are you still have your old floppy disk with the original assembly sources. If you have used the VisAss or
F8 AssBlaster assemblers, you can use a new command line tool I wrote to convert the encoded binary files into ASCII, so they can be published or you can continue development using modern tools like cc65.

Get it at github.com/mist64/vis2ascii.

If you need any help converting old C64 source in other formats, contact me, I’ll be happy to help!