Category Archives: archeology

62 Reverse-Engineered C64 Assembly Listings

Between 1992 and 1995, I reverse engineered Commodore 64 applications by printing their disassemblies on paper and adding handwritten comments (in German). These are the PDF scans of the 62 applications, which are 552 pages total.

File | Author | Date | Description
Adv. Squeezer |  | 1995-08-19 | Decompression code
Amica Paint Schnellader |  |  | Fast loader Amica Paint extension
Bef.-Erw. | Diethelm Berens |  | BASIC extension
Bitmap Manager | Hannes Sommer | 1994-10-26 | AGSP map editor
Bitstreamer |  | 1994-11-06 | Decompression code
Blanking |  |  | Screen blanker
Creatures 2 Autostart | John Rowlands | 1993-05-03 |
Creatures 2 Loader | John Rowlands | 1993-05-04 |
Delta Coder | Nikolaus Heusler | 1994-08-25 |
Drive Composer | Chester B. Kollschen | 1994-05-25 | 1541 drive head sample player, Magic Disk 07/1993
Drive Digi Player | Michael Steil | 1994-05-25 | 1541 drive head sample player
Drive ROMs |  | 1995-04-19 | Differences of the 1541/-II/-C/1571/1581 ROMs
EmBa | Nikolaus Heusler | 1992-08-08 | Emergency BASIC
Errorline-Lister |  | 1992-08-18 | BASIC extension
Fast Disk Set |  |  |
Fast Load | Frank Deizner | 1994-09-29 |
Fast Save | Explorer/Agony | 1995-08-24 |
File Copier |  | 1993-07-22 |
Final Cartridge III Freezer Loader |  | 1994-05-01 | Fast load/decompression code of the FC3 unfreeze binary
Fred’s Back II | Hannes Sommer | 1994-12 | Parts of the AGSP game
GEO-RAM-Test | Mario Büchle | 1996-05-21 | GeoRAM detection code
GEOS Select Printer |  | 1993-01-25 | GEOS tool
GoDot | A. Dettke, W. Kling | 1994-05-17 | Core code, ldr.koala, svr.doodle
Graphtool |  | 1992 |
Heureka-Sprint |  | 1994-01 | Fast loader
IRQ Loader 2 Bit | Explorer/Agony | 1995-08-24 | Fast loader
Kunst aus China Autostart |  | 1993-03-31 |
MIST Entpacker | Michael Steil | 1995-08-21 | Decompression code
MIST Load V2 | Michael Steil |  | Fast load
MSE V1.1 |  | 1992-08-08 | Program type-in helper from 64’er magazine
MSE V2.1 |  | 1993-03-28 | Program type-in helper from 64’er magazine
MTOOL | Michail Popov | 1992-08-14 |
Mad Code VSP |  | 1994-10-21 | VSP + DYSP code from the demo “Mad Code”
Mad Format |  | 1995-04-20 | Fast format
Magic Basic |  |  | BASIC extension
Magic Disk Autostart |  | 1992-07 |
Master Copy |  | 1995-09-21 | Disk copy
Master Cruncher |  | 1994-05-25 | Decompression code
Mayhem in Monsterland VSP | John Rowlands | 1994-12 | VSP code from the game “Mayhem in Monsterland”
Mini-Erweiterung | Marc Freese | 1992-08-09 |
Mini-Scan | The Sir | 1993-07-22 | Disk error scanner
Mini-Tris |  |  | Tetris
Movie-Scroller | Ivo Herzeg | 1995-01 |
RAM-Check | T. Pohl | 1995-09-08 | REU detection
SUCK Copy |  | 1995-04-30 | File copy
SUPRA 64 Loader |  |  |
Shapes 64 | R. Löwenstein | 1994-09-05 | Text mode windowing
Small Ass | S. Berghofer | 1993-08-07 | Assembler
Spriterob | Andreas Breuer | 1992 |
Startprg.obj | M. Hecht | 1992-06 |
SuperinfoIRQ | Nikolaus Heusler | 1992-06 |
Swappers Best Copy |  |  | Disk copy
TPP’s Screensplitter | Armin Beck | 1995-03 |
The Sampler |  | 1994-04-30 | $D418 sample player (high volume on 8580!)
Tiny Compiler | Nikolaus Heusler | 1994-09-06 | BASIC compiler
Turrican 1 Autostart | Manfred Trenz | 1994-11-02 |
Turrican 1 Loader | Manfred Trenz | 1994 |
Turrican 2 Autostart | Manfred Trenz | 1994-10-03 |
Vectorset | M. Strelecki | 1995-02-05 |
Vis-Ass | Mazim Szenessy | 1992-08 | Assembler and screen editor
Vocabulary | Schindowski | 1993-01-05 | FORTH library
Warp-Load |  | 1994-10-02 | Fast loader

80 Columns Text on the Commodore 64

The text screen of the Commodore 64 has a resolution of 40 by 25 characters, based on the hardware text mode of the VIC-II video chip. This is a step up from the VIC-20’s 22 characters per line, but since computers in the professional segment (Commodore PET 8000 series, CP/M, MS-DOS) usually had 80 columns, several solutions – both hardware and software – exist to allow 80 columns on a C64 as well. Let’s look at how this is done in software! At the end of this article, I present a fast and full-featured open source implementation with several different character sets.

Regular 40×25 Mode

First, we need to understand how 40 columns are done. The VIC-II video chip has dedicated support for a text mode. There is a 1000 (= 40 * 25) byte “Screen RAM”, each byte of which contains the character to be displayed at that location in the form of an index into the 2 KB character set, which contains 8 bytes (for 8×8 pixels) for each of the 256 characters. In addition, there is a “Color RAM”, which contains 1000 (again 40 * 25) 4-bit values, each of which represents the color of one character on the screen.

Putting a character onto the screen is quite trivial: Just write the index of it to offset column + 40 * line into Screen RAM, and its color to the same offset in Color RAM. An application can load its own character set, but the Commodore 64 already comes with two character sets in ROM: One with uppercase characters and lots of graphical symbols (“GRAPHICS”), and one with upper and lower case (“TEXT”). You can switch these by pressing the Commodore and the Shift key at the same time.
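The offset calculation above can be sketched in a few lines of Python. This is only a model of the two memory areas, not code for the real machine; the names `screen_ram`, `color_ram` and `put_char` are made up for this illustration.

```python
COLUMNS, LINES = 40, 25

# Stand-ins for the two 1000-entry memory areas described above.
screen_ram = bytearray(COLUMNS * LINES)   # one character index per cell
color_ram = bytearray(COLUMNS * LINES)    # one 4-bit color per cell


def put_char(column, line, screen_code, color):
    """Store a character index and its color at a text position."""
    offset = column + COLUMNS * line
    screen_ram[offset] = screen_code
    color_ram[offset] = color & 0x0F      # Color RAM only holds 4 bits
    return offset


# Column 5 of line 2 lands at offset 5 + 40 * 2.
print(put_char(5, 2, 0x01, 14))
```

Both areas share the same offset, which is why a single calculation is enough to place a character and its color.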

Bitmap Mode

There is no hardware support for an 80 column mode, but such a mode can be implemented in software by using bitmap mode. In bitmap mode, all 320 by 200 pixels on the screen can be freely addressed through an 8000 byte bitmap, which contains one bit for every pixel on the screen. Luckily for us, the layout of the bitmap in memory is not linear, but resembles the layout of text mode: The first 8 bytes in Bitmap RAM don’t describe, as you might expect, the leftmost 64 pixels on the first line. Instead, they describe the top left 8×8 block. The next 8 bytes describe the 8×8 block to the right of it, and so on.

This is the same layout as the character set’s: The first 8 bytes correspond to the first character, the next 8 bytes to the second character and so on. Drawing an 8×8 character onto a bitmap (aligned to the 8×8 grid) is as easy as copying 8 consecutive bytes.

This is what an 8×8 font looks like in memory:

0000 ··████··  0008 ···██···  0010 ·█████··
0001 ·██··██·  0009 ··████··  0011 ·██··██·
0002 ·██·███·  000a ·██··██·  0012 ·██··██·
0003 ·██·███·  000b ·██████·  0013 ·█████··
0004 ·██·····  000c ·██··██·  0014 ·██··██·
0005 ·██···█·  000d ·██··██·  0015 ·██··██·
0006 ··████··  000e ·██··██·  0016 ·█████··
0007 ········  000f ········  0017 ········

For an 80 column screen, every character is 4×8 pixels. So we could describe the character set like this:

0000 ········  0008 ········  0010 ········
0001 ··█·····  0009 ··█·····  0011 ·██·····
0002 ·█·█····  000a ·█·█····  0012 ·█·█····
0003 ·███····  000b ·███····  0013 ·██·····
0004 ·███····  000c ·█·█····  0014 ·█·█····
0005 ·█······  000d ·█·█····  0015 ·█·█····
0006 ··██····  000e ·█·█····  0016 ·██·····
0007 ········  000f ········  0017 ········

Every 4×8 character on the screen is either in the left half or the right half of an 8×8 block, so drawing a 4×8 character is as easy as copying the bit pattern into the 8×8 block – and shifting it 4 bits to the right for characters at odd positions.
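As a rough sketch of the copy-and-shift idea, the following Python model draws a 4×8 glyph into a simulated 8000 byte bitmap. The glyph bytes are read off the “A” in the 4×8 listing above; `bitmap` and `draw_char_80` are names invented for this example, and the code is not taken from any of the implementations discussed here.

```python
bitmap = bytearray(8000)   # 40 x 25 blocks of 8 bytes each

# The "A" from the 4x8 listing above, stored in the left half of each byte.
GLYPH_A = bytes([0x00, 0x20, 0x50, 0x70, 0x50, 0x50, 0x50, 0x00])


def draw_char_80(column, line, glyph):
    """Draw a 4x8 glyph at position (column, line) of an 80x25 screen."""
    block = (line * 40 + column // 2) * 8   # first byte of the 8x8 block
    odd = column % 2 == 1
    for row in range(8):
        bits = glyph[row] & 0xF0
        if odd:
            bits >>= 4                      # move into the right half
            keep = 0xF0                     # preserve the left neighbor
        else:
            keep = 0x0F                     # preserve the right neighbor
        bitmap[block + row] = (bitmap[block + row] & keep) | bits


draw_char_80(0, 0, GLYPH_A)
draw_char_80(1, 0, GLYPH_A)
print(" ".join(f"{b:02x}" for b in bitmap[:8]))
```

Masking with `keep` is what lets two neighboring characters share one block without erasing each other.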

Color

In bitmap mode, it is only possible to use two out of the 16 colors per 8×8 block, because there are only 1000 (40 * 25) entries for the color matrix. This is a problem, since we need three colors per 8×8 block: Two for the two characters and one for the background. We will have to compromise: Whenever a character gets drawn into an 8×8 block and the other character in the block has a different color, that other character will be changed to the same color as the new character.
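The compromise follows directly from how two 80-column cells map onto one color matrix entry. A minimal sketch, assuming the usual hires bitmap encoding in which the upper nibble of a matrix byte holds the color of set pixels and the lower nibble the color of clear pixels; `color_matrix` and `set_cell_color` are hypothetical names.

```python
color_matrix = bytearray(40 * 25)   # one byte per 8x8 block


def set_cell_color(column, line, foreground, background):
    """Set the colors of the 8x8 block that holds an 80-column cell."""
    block = line * 40 + column // 2
    color_matrix[block] = ((foreground & 0x0F) << 4) | (background & 0x0F)
    return block


left = set_cell_color(10, 0, 1, 6)    # white on blue
right = set_cell_color(11, 0, 7, 6)   # yellow on blue, same block
print(left, right, hex(color_matrix[left]))
```

Columns 10 and 11 resolve to the same matrix entry, so the second call silently recolors the character that was drawn first.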

Scrolling

Scrolling on a text screen is easy: 960 bytes of memory have to be copied to move the character indexes to their new location. In bitmap mode, 7680 bytes have to be copied – 8 times more. Even with the most optimized implementation (73ms, about 3.5 frames), scrolling will be slower, and tearing artifacts are unavoidable.

Character Set

Creating a 4×8 character set that is both readable and looks good is not easy. There has to be a one-pixel gap between characters, so characters can effectively only be 3 pixels wide. For characters like “M” and “N”, this is a challenge.

These are the character sets of four different software solutions for 80 columns:

COLOR 80 by Richvale Telecommunications
  

80COLUMNS
  

SCREEN-80 by Compute’s Gazette
  

Highspeed80 by CKtwo
  

Some observations:

  • Highspeed80 and SCREEN-80 have capitals of a height of 7 pixels (more detail, but very tight vertical spacing, making it hard to read), while COLOR 80 uses only 5 pixels (more square like the original 8×8 font, but less detail). 6 pixels, as used by 80COLUMNS, seems like a good compromise.
  • 80COLUMNS has the empty column at the left, which makes single characters in reverse mode more readable, since most characters have their stems at the left.
  • Except for Highspeed80, the graphical symbols are very similar between the different character sets.
  • All four character sets use the same strategy to distinguish between “M” and “N”.

The Editor

The “EDITOR” is the part of C64’s ROM operating system (“KERNAL”) that handles printing characters to the screen (and interpreting control characters), as well as converting on-screen contents back into a PETSCII string – yes, text input on CBM computers is done by feeding the keyboard input directly into character output, and reading back the screen contents when the user presses the return key. This way, the user can use the cursor keys to navigate to existing text anywhere on the screen (even to the output of previous commands), edit it, and have the system interpret it as input when pressing return.

The C64’s EDITOR can only deal with 40 columns (but has a very nice feature that allows using two 40 character lines as one virtual 80 character input line), and has no idea how to draw into a bitmap, so a software 80-column solution basically has to provide a complete reimplementation of this KERNAL component.

The KERNAL provides user vectors for lots of its functionality, so both character output and reading back characters from the screen can be hooked (vectors IBSOUT at $0326 and IBASIN at $0324). In addition to drawing characters into the bitmap, the character codes have to be cached in an 80×25 virtual Screen RAM, so the input routine can read them back.

The PETSCII code contains control codes for changing the current color, moving the cursor, clearing the screen, turning reverse on and off, and switching between the “GRAPHICS” and “TEXT” character sets. The new editor has to provide code to interpret these. There are two special cases though: When in quote mode (the user is typing text between quotes) or insert mode (the user has typed shift+delete), most special characters show up as inverted graphical characters instead of being interpreted. This way, control characters can be included e.g. in strings in BASIC programs.

There are two functions though that cannot be intercepted through vectors: Applications (and BASIC programs) change the screen background color by writing the color’s value to $d021, since there is no KERNAL function or BASIC command for it, and the KERNAL itself switches between the two character sets (when the user presses the Commodore and the Shift key at the same time) by directly writing to $d018. The only way to intercept these is by hooking the timer interrupt vector and detecting a change in these VIC-II registers. If the background color has changed, the whole 1000 byte color matrix for bitmap mode has to be updated, and if the character set has changed, the whole screen has to be redrawn.

The Implementation

I looked at all existing software implementations I could find and concluded that “80COLUMNS” (by an unknown author) had the best design and was the only one to implement the full feature set of the original EDITOR. I reverse-engineered it into structured, easy to read code, added Ilker Ficicilar’s fast scrolling patch as well as my own minor cleanups, fixes and optimizations.

https://www.github.com/mist64/80columns

The project requires cc65 and exomizer to build. Running make will produce 80columns-compressed.prg, which is about 2.2 KB in size and can be started using LOAD/RUN.

The source contains several character sets (charset.s, charset2.s etc.) from different 80 column software solutions, which can be selected by changing the reference to the filename in the Makefile.

The object code resides at $c800-$cfff. The two character sets are located at $d000-$d7ff. The virtual 80×25 Screen RAM (in order to read back screen contents) is at $c000-$c7ff. The bitmap is at $e000-$ff40, and the color matrix for bitmap mode is at $d800-$dbe8. All this lies beyond the top of BASIC RAM, so BASIC continues to have 38911 bytes free.

In order to speed up drawing, the character set contains all characters duplicated like this:

0000 ········  0008 ········  0010 ········
0001 ··█···█·  0009 ··█···█·  0011 ·██··██·
0002 ·█·█·█·█  000a ·█·█·█·█  0012 ·█·█·█·█
0003 ·███·███  000b ·███·███  0013 ·██··██·
0004 ·███·███  000c ·█·█·█·█  0014 ·█·█·█·█
0005 ·█···█··  000d ·█·█·█·█  0015 ·█·█·█·█
0006 ··██··██  000e ·█·█·█·█  0016 ·██··██·
0007 ········  000f ········  0017 ········

This way, the drawing code only has to mask the value instead of shifting it. In addition, parts of character drawing and all of screen scrolling use unrolled loops for performance.
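To illustrate why the duplicated layout helps, here is a small Python model of mask-only drawing. It is a sketch of the idea, not a transcription of the 80columns source; `duplicate`, `draw_char_masked` and the glyph data (the “A” from the listing) are assumptions made for this example.

```python
bitmap = bytearray(8000)

# 4x8 "A" in the left half of each byte, as in the earlier listing.
GLYPH_A = bytes([0x00, 0x20, 0x50, 0x70, 0x50, 0x50, 0x50, 0x00])


def duplicate(glyph):
    """Copy the left nibble of every row into the right nibble as well."""
    return bytes((row & 0xF0) | (row >> 4) for row in glyph)


def draw_char_masked(column, line, dup_glyph):
    """Draw a duplicated glyph: pick one half with a mask, no shifting."""
    block = (line * 40 + column // 2) * 8
    mask = 0x0F if column % 2 else 0xF0
    for row in range(8):
        old = bitmap[block + row] & (mask ^ 0xFF)   # keep the other half
        bitmap[block + row] = old | (dup_glyph[row] & mask)


DUP_A = duplicate(GLYPH_A)
draw_char_masked(1, 0, DUP_A)
print(f"{DUP_A[1]:02x} {bitmap[2]:02x}")
```

On a 6502, shifting by four bits costs four shift instructions per byte, so trading 2 KB of character data for a single AND per byte is an attractive exchange.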

Contributions to the project are very welcome. It would be especially interesting to add new character sets, both existing 4×8 fonts from other projects (including hinted TrueType fonts!), and new ones that combine the respective strengths of the existing ones.

80×33 Mode?

Reducing the size of characters to 4×6 would allow a text mode resolution of 80×33 characters. Novaterm 10 has an implementation. At this resolution, logical characters don’t end at vertical 8×8 boundaries any more, making color impossible, and the drawing routine a little slower. It would be interesting to add an 80×33 mode as a compile time option to “80columns”.

Building the Original Commodore 64 KERNAL Source

Russian Translation by HTR

Many reverse-engineered versions of “KERNAL”, the C64’s ROM operating system, exist, and some of them even in a form that can be built into the original binary. But how about building the original C64 KERNAL source with the original tools?

The Commodore engineer Dennis Jarvis rescued many disks with Commodore source, which he gave to Steve Gray for preservation. One of these disks, c64kernal.d64, contains the complete source of the original KERNAL version (901227-01), including all build tools. Let’s build it!

Building KERNAL

First, we need a Commodore PET. Any PET will do, as long as it has enough RAM. 32 KB should be fine. For a disk drive, a 2040 or 4040 is recommended if you want to create the cross-reference database (more RAM needed for many open files), otherwise a 2031 will do just fine. The VICE emulator will allow you to configure such a system.

First you should delete the existing cross-reference files on disk. With the c1541 command line tool that comes with VICE, this can be done like this:

c1541 c64kernal.d64 -delete 'xr*' -validate

Now attach the disk image and either have the emulator autostart it, or

LOAD"ASX4.0",8
RUN

This will run the “PET RESIDENT ASSEMBLER”. Now answer the questions like this:

PET RESIDENT ASSEMBLER V102780
(C) 1980 BY COMMODORE BUSINESS MACHINES

OBJECT FILE (CR OR D:NAME): KERNAL.HEX
HARD COPY (CR/Y OR N)? N
CROSS REFERENCE (CR/NO OR Y)? N
SOURCE FILE NAME? KERNAL

The assembler will now do its work and create the file “KERNAL.HEX” on disk. This will take about 16 minutes, so you probably want to switch your emulator into “Warp Mode”. When it is done, quit the emulator to make sure the contents of the disk image are flushed.

Using c1541, we can extract the output of the assembler:

c1541 c64kernal.d64 -read kernal.hex

The file is in MOS Technology Hex format and can be converted using srecord:

tr '\r' '\n' < kernal.hex > kernal_lf.hex
srec_cat kernal_lf.hex -MOS_Technologies \
-offset -0xe000 \
-fill 0xaa 0x0000 0x1fff \
-o kernal.bin -Binary

This command makes sure to fill the gaps with a value of 0xAA like the original KERNAL ROM.

The resulting file is identical to 901227-01 except for the following:

  • $E000-$E4AB is empty instead of containing the overflow BASIC code
  • $E4AC does not contain the checksum (the BASIC program “chksum” on the disk is used to calculate it)
  • $FFF6 does not contain the “RRBY” signature

Also note that $FF80, the KERNAL version byte, is $AA – the first version of KERNAL did not yet use this location for the version byte, so the effective version byte is the fill byte.
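For readers without srecord at hand, the hex-to-binary conversion step can also be sketched in Python. This is a minimal sketch under assumptions: it relies on the commonly documented MOS Technology record layout (a semicolon, a byte count, a 16-bit address, the data bytes and a 16-bit checksum over all preceding bytes of the record), and it has not been run against the actual KERNAL.HEX from the disk. `parse_mos_hex` is a name invented here.

```python
def parse_mos_hex(text, base=0xE000, size=0x2000, fill=0xAA):
    """Convert MOS Technology hex records into a filled binary image."""
    image = bytearray([fill]) * size
    for line in text.replace("\r", "\n").split("\n"):
        line = line.strip()
        if not line.startswith(";"):
            continue                          # skip blank or foreign lines
        raw = bytes.fromhex(line[1:])
        count = raw[0]
        if count == 0:
            break                             # terminating record
        address = (raw[1] << 8) | raw[2]
        data = raw[3:3 + count]
        checksum = (raw[3 + count] << 8) | raw[4 + count]
        if sum(raw[:3 + count]) & 0xFFFF != checksum:
            raise ValueError(f"bad checksum in record at ${address:04X}")
        image[address - base:address - base + count] = data
    return bytes(image)


# Hand-made sample: three bytes at $E000, then the end record.
sample = ";03E00001020300E9\r;0000010001\r"
rom = parse_mos_hex(sample)
print(len(rom), rom[:4].hex())
```

The `fill` default mirrors the 0xAA gap filling of the srec_cat invocation, and `base` plays the role of its negative offset.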

Browsing the Source

An ASCII/LF version of the source code is available at https://github.com/mist64/cbmsrc.

;****************************************
;*                                      *
;* KK  K EEEEE RRRR  NN  N  AAA  LL     *
;* KK KK EE    RR  R NNN N AA  A LL     *
;* KKK   EE    RR  R NNN N AA  A LL     *
;* KKK   EEEE  RRRR  NNNNN AAAAA LL     *
;* KK K  EE    RR  R NN NN AA  A LL     *
;* KK KK EE    RR  R NN NN AA  A LL     *
;* KK KK EEEEE RR  R NN NN AA  A LLLLL  *
;*                                      *
;***************************************
;
;***************************************
;* PET KERNAL                          *
;*   MEMORY AND I/O DEPENDENT ROUTINES *
;* DRIVING THE HARDWARE OF THE         *
;* FOLLOWING CBM MODELS:               *
;*   COMMODORE 64 OR MODIFED VIC-40    *
;* COPYRIGHT (C) 1982 BY               *
;* COMMODORE BUSINESS MACHINES (CBM)   *
;***************************************
.SKI 3
;****LISTING DATE --1200 14 MAY 1982****
.SKI 3
;***************************************
;* THIS SOFTWARE IS FURNISHED FOR USE  *
;* USE IN THE VIC OR COMMODORE COMPUTER*
;* SERIES ONLY.                        *
;*                                     *
;* COPIES THEREOF MAY NOT BE PROVIDED  *
;* OR MADE AVAILABLE FOR USE ON ANY    *
;* OTHER SYSTEM.                       *
;*                                     *
;* THE INFORMATION IN THIS DOCUMENT IS *
;* SUBJECT TO CHANGE WITHOUT NOTICE.   *
;*                                     *
;* NO RESPONSIBILITY IS ASSUMED FOR    *
;* RELIABILITY OF THIS SOFTWARE. RSR   *
;*                                     *
;***************************************
.END

Announcement: “The Ultimate Game Boy Talk” at 33C3

I will present “The Ultimate Game Boy Talk” at the 33rd Chaos Communication Congress in Hamburg later in December.

If you are interested in attending the talk, please go to https://halfnarp.events.ccc.de/, select it and press submit, so the organizers can reserve a big enough room.

The talk continues the spirit of The Ultimate Commodore 64 Talk, which I presented at the same conference eight years ago, as well as several other talks in the series done by others: Atari 2600 (Svolli), Galaksija (Tomaž Šolc), Amiga 500 (rahra).

Here’s the abstract:

The 8-bit Game Boy was sold between 1989 and 2003, but its architecture more closely resembles machines from the early 1980s, like the Commodore 64 or the NES. This talk attempts to communicate “everything about the Game Boy” to the listener, including its internals and quirks, as well as the tricks that have been used by games and modern demos, reviving once more the spirit of times when programmers counted clock cycles and hardware limitations were seen as a challenge.

Reverse-Engineered GEOS 2.0 for C64 Source Code

The GEOS operating system managed to clone the Macintosh GUI on the Commodore 64, a computer with an 8 bit CPU and 64 KB of RAM. Based on Maciej Witkowiak's work, I created a reverse-engineered source version of the C64 GEOS 2.0 KERNAL for the cc65 compiler suite:

https://github.com/mist64/geos

  • The source compiles into the exact same binary as shipped with GEOS 2.0.
  • The source is well-structured and split up into 31 source files.
  • Machine-specific code is marked up.
  • Copy protection/trap mechanisms can be disabled.
  • The build system makes sure binary layout requirements are met.

This makes the source a great starting point for

  • adding (optional) optimized code paths or features
  • integrating existing patches from various sources
  • integrating versions for other computers
  • porting it to different 6502-based computers

Just fork the project and send pull requests!

Copy Protection Traps in GEOS for C64

Major GEOS applications on the Commodore 64 protect themselves from unauthorized duplication by keying themselves to the operating system's serial number. To avoid tampering with this mechanism, the system contains some elaborate traps, which will be discussed in this article.

GEOS Copy Protection

The GEOS boot disk protects itself with a complex copy protection scheme, which uses code uploaded to the disk drive to verify the authenticity of the boot disk. Berkeley Softworks, the creators of GEOS, found it necessary to also protect their applications like geoCalc and geoPublish from unauthorized duplication. Since these applications were running inside the GEOS "KERNAL" environment, which abstracted most hardware details away, these applications could not use the same kind of low-level tricks that the system was using to protect itself.

Serial Numbers for Protection

The solution was to use serial numbers. On the very first boot, the GEOS system created a 16 bit random number, the "serial number", and stored it in the KERNAL binary. (Since the system came with a "backup" boot disk, the system asked for that disk to be inserted, and stored the same serial in the backup's KERNAL.) Now whenever an application was run for the first time, it read the system's serial number and stored it in the application's binary. On subsequent runs, it read the system's serial number and compared it with the stored version. If the serial numbers didn't match, the application knew it was running on a different GEOS system than the first time – presumably as a copy on someone else's system: Since the boot disk could not be copied, two different people had to buy their own copies of GEOS, and different copies of GEOS had different serial numbers.
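The keying scheme can be modeled in a few lines of Python. This is only a sketch of the logic described above; the class names are invented, and the use of `None` for “not keyed yet” is an assumption about how the unkeyed state might be represented.

```python
import random


class GeosSystem:
    """A boot disk with the 16 bit serial created on first boot."""

    def __init__(self, serial=None):
        self.serial = random.randrange(0x10000) if serial is None else serial


class Application:
    """An application that keys itself to the first system it runs on."""

    def __init__(self):
        self.stored_serial = None     # out-of-the-box state: not keyed

    def run(self, system):
        if self.stored_serial is None:
            self.stored_serial = system.serial    # first run: key the app
            return True
        return self.stored_serial == system.serial


mine, yours = GeosSystem(0x1234), GeosSystem(0x4321)
app = Application()
print(app.run(mine), app.run(mine), app.run(yours))
```

A copy of the keyed application carries the stored serial with it, which is why it fails the comparison on a second system.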

Serial Numbers in Practice

The code to verify the serial number usually looked something like this:

.,D5EF  20 D8 C1    JSR $C1D8 ; GetSerialNumber
.,D5F2  A5 03       LDA $03   ; read the hi byte
.,D5F4  CD 2F D8    CMP $D82F ; compare with stored version
.,D5F7  F0 03       BEQ $D5FC ; branch if equal
.,D5F9  EE 18 C2    INC $C218 ; sabotage LdDeskAcc syscall: increment vector
.,D5FC  A0 00       LDY #$00  ; ...

If the highest 8 bits of the serial don't match the value stored in the application's binary, it increments the pointer of the LdDeskAcc vector. This code was taken from the "DeskTop" file manager, which uses this subtle sabotage to make loading a "desk accessory" (a small helper program that can be run from within an application) unstable. Every time DeskTop gets loaded, the pointer gets incremented, and while LdDeskAcc might still work by coincidence the first few times (because it only skips a few instructions), it will break eventually. Other applications used different checks and sabotaged the system in different ways, but they all had in common that they called GetSerialNumber.

(DeskTop came with every GEOS system and didn't need any extra copy protection, but it checked the serial anyway to prevent users from permanently changing their GEOS serial to match one specific pirated application.)

A Potential Generic Hack

The downside of this scheme is that all applications are protected the same way, and a single hack could potentially circumvent the protection of all applications.

A generic hack would change the system's GetSerialNumber implementation to return exactly the serial number expected by the application by reading the saved value from the application's binary. The address where the saved value is stored is different for every application, so the hack could either analyze the instructions after the GetSerialNumber call to detect the address, or come with a small table that knows these addresses for all major applications.

GEOS supports auto-execute applications (file type $0E) that will be executed right after boot – this would be the perfect way to make this hack available at startup without patching the (encrypted) system files.

Trap 1: Preventing Changing the Vector

Such a hack would change the GetSerialNumber vector in the system call jump table to point to new code in some previously unused memory. But the GEOS KERNAL has some code to counter this:

                                ; (Y = $FF from the code before)
.,EE59  B9 98 C0    LDA $C098,Y ; read lo byte of GetSerialNumber vector
.,EE5C  18          CLC
.,EE5D  69 5A       ADC #$5A    ; add $5A
.,EE5F  99 38 C0    STA $C038,Y ; overwrite low byte GraphicsString vector

In the middle of code that deals with the menu bar and menus, it uses this obfuscated code to sabotage the GraphicsString system call if the GetSerialNumber vector was changed. If the GetSerialNumber vector is unchanged, these instructions are effectively a no-op: The lo byte of the system's GetSerialNumber vector ($F3) plus $5A equals the lo byte of the GraphicsString vector ($4D). But if the GetSerialNumber vector was changed, then GraphicsString will point to a random location and probably crash.

Berkeley Softworks was cross-developing GEOS on UNIX machines with a toolchain that supported complex expressions, so they probably used code like this to express this:

    ; Y = $FF
    lda GetSerialNumber + 1 - $FF,y
    clc
    adc #<(_GraphicsString - _GetSerialNumber)
    sta GraphicsString + 1 - $FF,y

In fact, different variations of GEOS (like the GeoRAM version) were separate builds with different build time arguments, so because of different memory layouts, they were using different ADC values here.

Note that the pointers to the GetSerialNumber and GraphicsString vectors have been obfuscated, so that an attacker who has detected the trashed GraphicsString vector won't be able to find the sabotage code by looking for the address.
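The arithmetic behind the first trap is easy to check. The sketch below models only the 8 bit addition of the ADC instruction with the carry cleared; the function name `trap1_low_byte` is made up, and the byte values are the ones quoted above.

```python
GET_SERIAL_NUMBER_LO = 0xF3   # lo byte of the unmodified vector
GRAPHICS_STRING_LO = 0x4D     # lo byte of the GraphicsString vector


def trap1_low_byte(get_serial_lo):
    """Value the trap writes into the GraphicsString vector's lo byte."""
    return (get_serial_lo + 0x5A) & 0xFF   # CLC + ADC #$5A, 8 bit result


# Unmodified vector: the store rewrites the value that is already there.
print(hex(trap1_low_byte(GET_SERIAL_NUMBER_LO)))

# A redirected vector (here with lo byte $00) corrupts GraphicsString.
print(hex(trap1_low_byte(0x00)))
```

$F3 + $5A is $14D, and dropping the carry leaves $4D, so the trap is invisible as long as nobody touches the vector.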

Trap 2: Preventing Changing the Implementation

If the hack can't change the GetSerialNumber vector, it could put a JMP instruction at the beginning of the implementation to the new code. But the GEOS KERNAL counters this as well. The GetSerialNumber implementation looks like this:

.,CFF3  AD A7 9E    LDA $9EA7 ; load lo byte of serial
.,CFF6  85 02       STA $02   ; into return value (lo)
.,CFF8  AD A8 9E    LDA $9EA8 ; load hi byte of serial
.,CFFB  85 03       STA $03   ; into return value (hi)
.,CFFD  60          RTS       ; return

At the end of the system call function UseSystemFont, it does this:

.,E6C9  AD 2F D8    LDA $D82F ; read copy of hi byte of serial
.,E6CC  D0 06       BNE $E6D4 ; non-zero? done this before already
.,E6CE  20 F8 CF    JSR $CFF8 ; call second half of GetSerialNumber
.,E6D1  8D 2F D8    STA $D82F ; and store the hi byte in our copy
.,E6D4  60          RTS       ; ...

And in the middle of the system call function FindFTypes, it does this:

.,D5EB  A2 C1       LDX #$C1
.,D5ED  A9 96       LDA #$96  ; public GetSerialNumber vector ($C196)
.,D5EF  20 D8 C1    JSR $C1D8 ; "CallRoutine": call indirectly (obfuscation)
.,D5F2  A5 03       LDA $03   ; read hi byte of serial
.,D5F4  CD 2F D8    CMP $D82F ; compare with copy
.,D5F7  F0 03       BEQ $D5FC ; if identical, skip next instruction
.,D5F9  EE 18 C2    INC $C218 ; sabotage LdDeskAcc by incrementing its vector
.,D5FC  A0 00       LDY #$00  ; ...

So UseSystemFont makes a copy of the hi byte of the serial, and FindFTypes compares the copy with the serial – so what's the protection? The trick is that one path goes through the proper GetSerialNumber vector, while the other one calls into the bottom half of the original implementation. If the hack overwrites the first instruction of the implementation (or manages to disable the first trap and changes the system call vector directly), calling through the vector will reach the hack, while calling into the middle of the original implementation will still reach the original code. If the hack returns a different value than the original code, this will sabotage the system in a subtle way, by incrementing the LdDeskAcc system call vector.

Note that this code calls a KERNAL system function that will call GetSerialNumber indirectly, so the function pointer is split into two 8 bit loads and can't be found by just searching for the constant. Since the code in UseSystemFont doesn't call GetSerialNumber either, an attacker won't find a call to that function anywhere inside KERNAL.
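The interplay of the two code paths can be simulated in Python. This is a loose model of the behavior described above, not a translation of the GEOS code: the class, its method names and the counter that stands in for the LdDeskAcc vector are all invented for this sketch.

```python
class KernalModel:
    """Toy model of the second trap."""

    def __init__(self, serial):
        self.serial = serial
        self.get_serial_number = self._original   # the patchable vector
        self.hi_copy = 0
        self.ld_desk_acc_offset = 0   # how far LdDeskAcc has been bumped

    def _original(self):
        return self.serial

    def _original_second_half(self):
        return self.serial >> 8       # direct call, bypasses any patch

    def use_system_font(self):
        if self.hi_copy == 0:         # only done once
            self.hi_copy = self._original_second_half()

    def find_ftypes(self):
        if (self.get_serial_number() >> 8) != self.hi_copy:
            self.ld_desk_acc_offset += 1          # the sabotage


kernal = KernalModel(0x1234)
kernal.use_system_font()
kernal.find_ftypes()
print(kernal.ld_desk_acc_offset)              # untouched system

kernal.get_serial_number = lambda: 0xBEEF     # the generic hack
kernal.find_ftypes()
print(kernal.ld_desk_acc_offset)              # vector was incremented
```

Because the copy is taken through a path the hack cannot see, any patched return value with a different hi byte is detected on the next FindFTypes call.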

Summary

I don't know whether anyone has ever created a generic serial number hack for GEOS, but it would have been a major effort – in a time before emulators allowed for memory watch points. An attacker would have needed to suspect the memory corruption, compare memory dumps before and after, and then find the two places that change code. The LdDeskAcc sabotage would have been easy to find, because it encodes the address as a constant, but the GraphicsString sabotage would have been nearly impossible to find, because the trap uses neither the verbatim GraphicsString address nor the GetSerialNumber address.

The hacks that were commonly used in practice were much more low-tech: they "un-keyed" applications, i.e. removed the cached serial number from their binaries to revert them into their original, out-of-the-box state.

How Amica Paint protected against tampering with its credits

In mid-1990, the floppy disk of special issue 55 of the German Commodore 64 magazine "64'er" contained the "Amica Paint" graphics program – which was broken beyond usefulness. I'll describe what went wrong.

"Amica Paint" was developed by Oliver Stiller and first published in 64'er special issue 27 in 1988, as a type-in program that filled 25 pages of the magazine.

Two years later, Amica Paint was published again in special issue 55, which this time came with a floppy disk. But this version was completely broken: Just drawing a simple line would cause severe glitches.


64'er issue 9/1990 published an erratum to fix Amica Paint, which described three ways (BASIC script, asm monitor and disk monitor) to patch 7 bytes in one of the executable files:

--- a/a.paint c000.txt
+++ b/a.paint c000.txt
@@ -67,8 +67,8 @@
 00000420  a5 19 85 ef a5 1a 85 f0  4c 29 c4 20 11 c8 20 6c  ........L). .. l
 00000430  c8 08 20 ed c7 28 90 f6  60 01 38 00 20 20 41 4d  .. ..(..`.8.  AM
 00000440  49 43 41 20 50 41 49 4e  54 20 56 31 2e 34 20 20  ICA PAINT V1.4  
-00000450  01 38 03 20 20 20 4f 2e  53 54 49 4c 4c 45 52 20  .8.   O.STILLER 
-00000460  31 39 39 30 20 20 20 01  31 00 58 3d 30 30 30 20  1990   .1.X=000 
+00000450  01 38 03 42 59 20 4f 2e  53 54 49 4c 4c 45 52 20  .8.BY O.STILLER 
+00000460  31 39 38 36 2f 38 37 01  31 00 58 3d 30 30 30 20  1986/87.1.X=000 
 00000470  59 3d 30 30 30 20 20 20  20 20 20 20 20 20 00 01  Y=000         ..
 00000480  31 00 42 49 54 54 45 20  57 41 52 54 45 4e 20 2e  1.BITTE WARTEN .
 00000490  2e 2e 20 20 20 20 00 64  0a 01 00 53 43 48 57 41  ..    .d...SCHWA

This changes the credits message from "O.STILLER 1990" to "BY O.STILLER 1986/87" – which is the original message from the previous publication.

64'er magazine had published the exact same application without any updates, but binary patched the credits message from "1986/87" to "1990", and unfortunately for them, Amica Paint contained code to detect exactly this kind of tampering:

.,C5F5  A0 14       LDY #$14     ; check 20 bytes
.,C5F7  A9 00       LDA #$00     ; init checksum with 0
.,C5F9  18          CLC
.,C5FA  88          DEY
.,C5FB  79 51 C4    ADC $C451,Y  ; add character from message
.,C5FE  88          DEY
.,C5FF  18          CLC
.,C600  10 F9       BPL $C5FB    ; loop
.,C602  EE FD C5    INC $C5FD
.,C605  C9 ED       CMP #$ED     ; checksum should be $ED
.,C607  F0 05       BEQ $C60E
.,C609  A9 A9       LDA #$A9
.,C60B  8D E4 C7    STA $C7E4    ; otherwise sabotage line drawing
.,C60E  60          RTS

The code checksums the message "BY O.STILLER 1986/87". If the checksum does not match, the code will overwrite an instruction in the following code:

.,C7DC  65 EC       ADC $EC
.,C7DE  85 EC       STA $EC
.,C7E0  90 02       BCC $C7E4
.,C7E2  E6 ED       INC $ED
.,C7E4  A4 DD       LDY $DD
.,C7E6  60          RTS

The "LDY $DD" instruction at $C7E4 will be overwritten with "LDA #$DD", which will cause the glitches in line drawing.

The proper fix would have been to change the comparison with $ED into a comparison with $4F, the checksum of the updated message – a single byte fix. But instead of properly debugging the issue, 64'er magazine published a patch to restore the original message, practically admitting that they had cheated by implying the re-release was not the exact same software.
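
Both checksum values can be double-checked with a few lines of Python. This is only a sketch that mirrors the loop at $C5F5, assuming it amounts to an 8-bit sum over 20 bytes (the carry is cleared before every ADC). The function name "credits_checksum" is made up for illustration, and the message bytes are copied from the hex dump above.

```python
def credits_checksum(message: bytes) -> int:
    """8-bit sum of the 20 message bytes, like the loop at $C5F5.

    The 6502 code executes CLC before every ADC, so the result is
    simply the sum of all bytes modulo 256.
    """
    assert len(message) == 20          # LDY #$14
    total = 0                          # LDA #$00
    for byte in reversed(message):     # Y counts down from 19 to 0
        total = (total + byte) & 0xFF  # ADC $C451,Y without carry-in
    return total


original = b"BY O.STILLER 1986/87"
patched = b"   O.STILLER 1990   "     # padded with spaces to 20 bytes

print(hex(credits_checksum(original)))  # 0xed
print(hex(credits_checksum(patched)))   # 0x4f
```

The order in which the bytes are added does not matter for a plain sum; the reversed iteration is only there to match the disassembly.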

Macross 6502, an assembler for people who hate assembly language

There are many MOS 6502 cross-assemblers available. Here’s a new one. Or actually a very old one. “Macross”, a very powerful 6502 macro assembler, which was used to create Habitat, Maniac Mansion and Zak McKracken, was developed between 1984 and 1987 at Lucasfilm Ltd. and is now Open Source (MIT license):

https://github.com/Museum-of-Art-and-Digital-Entertainment/macross

Some History

Starting in 1984, a team at Lucasfilm Ltd. was developing one of the first online role-playing games, originally called “Microcosm”, which was released as “Habitat” in 1986 and later renamed to “Club Caribe”. The client ran on a Commodore 64, which connected to the central server through the Quantum Link network.

The client software was developed on a 68K-based Sun workstation running the SunOS variant of Unix using cross-development tools developed by Chip Morningstar (who was also the Habitat lead): The “Macross” assembler and the “Slinky” linker. They were used on every 6502 (Atari 400/800, Commodore 64, and Apple II) game produced at Lucasfilm Games, from 1984 up until those machines ceased to be relevant to the games market.

In 2014, The Museum of Art and Digital Entertainment got a hold of

  • the source of the original development tools (Macross/Slinky)
  • the source of the C64 client
  • the source of the server (written in PL/I)
  • lots of documentation and development logs

which originated from an archive created in 1987 in the context of the technology transfer to Fujitsu, which bought all Habitat assets.

Since Macross and Slinky were written for Unix, it was easy to get them compiling with modern compilers (K&R syntax notwithstanding) and running on a modern Unix system. At the time of writing, the code in the official repository has been fixed to compile with clang on OS X. Further fixes and cleanups are very welcome.

Compiling Macross

Enter “make” both in the top-level directory and in the “slinky” directory, then copy “macross” and “slinky” into your path. There are man pages in the “doc” directory that you may want to install as well.

Writing Code

The syntax of Macross source files is very different from that of modern 6502 cross-assemblers, and more similar to Commodore’s own “A65” assembler. Here is a small “Hello World” for the C64:

define strout = 0xab1e

hello:
    lda #/text
    ldy #?text
    jmp strout

text:
    byte "HELLO WORLD!", 0

As you can see, hex values have to be specified in C notation (binary is prefixed with “0b”), and the operators to extract the low and high bytes of a 16-bit value are “/” and “?”, respectively.
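
To make the two operators concrete, here is a small Python sketch of the values they would produce for the “text” label. It assumes the code is placed at 0xC000 and that the three instructions take 2, 2 and 3 bytes (the standard 6502 encodings); the helper names “lo” and “hi” are made up for illustration.

```python
def lo(value: int) -> int:
    """Low byte of a 16-bit value (what "/" extracts)."""
    return value & 0xFF


def hi(value: int) -> int:
    """High byte of a 16-bit value (what "?" extracts)."""
    return (value >> 8) & 0xFF


BASE = 0xC000            # assumed load address
text = BASE + 2 + 2 + 3  # lda #imm, ldy #imm, jmp abs come before "text"

print(f"lda #${lo(text):02X}")  # lda #$07
print(f"ldy #${hi(text):02X}")  # ldy #$C0
```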

Compile and link the source file like this:

macross -c -o hello.o hello.m
slinky -e -o hello.bin -n -m hello.sym -l 0xc000 hello.o
dd if=hello.bin bs=1 skip=2 count=2 of=hello.prg
dd if=hello.bin bs=1 skip=6 >> hello.prg

The “dd” lines convert Slinky’s output, a “standard a65-style object file” (a header of FF FF, followed by the start address and then the end address), into a C64 .PRG file that is prefixed only by the start address.
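
If you prefer not to use “dd”, the same byte shuffling can be sketched in Python. This is a minimal sketch, assuming the object file contains a single segment laid out exactly as described above; the function name “a65_to_prg” and the sample bytes are made up for illustration.

```python
def a65_to_prg(obj: bytes) -> bytes:
    """Turn a single-segment a65-style object file into .PRG data.

    Layout assumed: FF FF, start address (2 bytes), end address
    (2 bytes), then the code. A .PRG keeps only the start address
    and the code.
    """
    if len(obj) < 6 or obj[:2] != b"\xff\xff":
        raise ValueError("not an a65-style object file (no FF FF header)")
    start_address = obj[2:4]  # dd ... skip=2 count=2
    code = obj[6:]            # dd ... skip=6
    return start_address + code


# Hypothetical 3-byte program (lda #0 / rts) at $C000:
sample = bytes.fromhex("ffff00c002c0a90060")
print(a65_to_prg(sample).hex())  # 00c0a90060
```

The bytes are copied verbatim, so the sketch makes no assumption about how the end address is encoded.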

Here is a slightly more complex example:

define bsout = 0xffd2

hello:
    ldx #0
    do {
        lda x[text]
        cmp #'A'
        if (geq) {
            tay
            iny
            tya
        }
        inx
        jsr bsout
    } while (!zero)
    
    rts

text:
    byte "HELLO WORLD!", 0

Macross supports C-style if/else, while and do/while, as well as do/until, where the condition can be one of:

  • zero/equal
  • carry
  • lt/leq/gt/geq
  • slt/sleq/sgt/sgeq
  • positive/plus/negative/minus
  • overflow

…as well as their negated versions.

Also note that the “absolute, x-indexed” addressing mode is written as “x[text]” rather than the more common “text,x”.

Macros

Macross has a very powerful macro language. Here is an example:

org 0xc000

function makeFirstByte(operand) {
    mif (isImmediateMode(operand)) {
        freturn(/operand)
    } melse {
        freturn(operand)
    }
}

function makeSecondByte(operand) {
    mif (isImmediateMode(operand)) {
        freturn(?operand)
    } melse {
        freturn(operand + 1)
    }
}

macro movew src, dst {
    lda makeFirstByte(src) 
    sta makeFirstByte(dst)
    lda makeSecondByte(src)
    sta makeSecondByte(dst)
}

macro hook_vector index, new, dst {
    ldx #index * 2
    movew x[0x0300], dst
    movew #new, x[0x0300]
}

define VECTOR_INDEX_IRQ = 10

    hook_vector VECTOR_INDEX_IRQ, irq, return + 1
    rts

irq:
    inc 0xd020
return:
    jmp 0xffff

The “hook_vector” line will emit the following assembly code:

    ldx #$14
    lda $0300,x
    sta $C01D
    lda $0301,x
    sta $C01E
    lda #$19
    sta $0300,x
    lda #$C0
    sta $0301,x

(The example is a little contrived, since adding the index could have been done at assembly time, but the example nicely demonstrates that macros can preserve addressing modes.)
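
The constants in the emitted code can be reproduced by hand. The following Python sketch adds up the instruction sizes of the expansion, assuming the standard 6502 encodings (2 bytes for immediate, 3 bytes for absolute and absolute,x, 1 byte for rts); the variable names are made up for illustration.

```python
ORG = 0xC000

# Sizes in bytes of the instructions that precede the "irq" label.
sizes = [
    2,  # ldx #index * 2
    3,  # lda 0x0300,x
    3,  # sta return + 1
    3,  # lda 0x0301,x
    3,  # sta return + 2
    2,  # lda #low byte of irq
    3,  # sta 0x0300,x
    2,  # lda #high byte of irq
    3,  # sta 0x0301,x
    1,  # rts
]

irq = ORG + sum(sizes)  # the "irq" label directly follows the rts
ret = irq + 3           # "inc 0xd020" takes 3 bytes

print(hex(10 * 2))                     # 0x14
print(hex(irq & 0xFF), hex(irq >> 8))  # 0x19 0xc0
print(hex(ret + 1), hex(ret + 2))      # 0xc01d 0xc01e
```

So the two “sta” instructions patch the operand of the “jmp” at the “return” label with the old vector, and the two immediate loads are the low and high bytes of the “irq” label.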

The file doc/macros.itr contains many useful macros. They are documented in doc/genmacros.itr.

Full Documentation

The complete documentation of Macross is available in the file doc/writeup_6502.itr in the repository. It is in troff format and can be viewed like this:

nroff -ms doc/writeup_6502.itr

Future

Macross is a very different approach to 6502 development, and with the source available, I think it’s a viable project that should be continued.

I will happily accept pull requests for compile fixes (GCC, VS, …), cleanups (C99, converting docs from troff to markdown, …) and features (BIN and PRG output, support for a more modern notation, PETSCII, …).

Reconstructing Some Source of Microsoft BASIC for 8080

Microsoft BASIC for 6502 exists digitally in source form – the older version for the Intel 8080 CPU only exists on paper though: as a printout in the archives of Harvard University. Some snippets of the code are public though:

  • Ian Griffiths held the printout in his hands and took notes. He copied down several lines from the first page.
  • In Harry Lewis’s blog post, he shows a picture of some lines of the reproduction that is on display on the wall of the ground floor lounge of the Maxwell Dworkin building at Harvard.
  • David J. Malan has a collection of photos of the Maxwell Dworkin reproductions online.

The Computer History Museum in Mountain View, California has a video on display in the software section that tells the story of the company Microsoft. In this video, they show the first page of 8080 BASIC:


Together with the other two sources, we can reconstruct the start of the first page:

00100	MCSSIM(STARI)
00120
00140	TITLE	BASIC MCS 8080	GATES/ALLEN/DAVIDOFF
00160	IFNDEF	LENGTH,<PRINTX !!! MUST HAVE COM !!
00180		END>
00200	IF1;<
00220	IFE	LENGTH,<PRINTX /SMALL/ >
00240	IFE	LENGTH-1;<PRINTX /MEDIUM/ >
00260	IFE	LENGTH-2;<PRINTX /BIG/ >
00280	IFE	STRING,<PRINTX /NO $$/ >
00300	IFN	STRING,<PRINTX /$$ $$/ >
00320	>
00340	SUBTTL	VERSION 1.1 -- MORE FEATURES TO COME
00360	COMMENT *
00380
00400	-------------------------------------------
00420	COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
00440	-------------------------------------------
00460
00480
00500	WRITTEN ORIGINALLY ON THE PDP-10 AT HARVARD FROM
00520	FEBRUARY 9 TO APRIL 27
00540
00560	PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
00580	BILL GATES WROTE THE RUNTIME STUFF.
00600	MONTE DAVIDOFF WROTE THE MATH PACKAGE.
00620
00640	THINGS TO DO:
00641	SYNTAX PROBLEMS (OR)
00642	NICE ERRORS
00643	ALLOW ^W AND ^C IN LIST COMMAND
00646	TAPE I/O
00648	BUFFER I/O
00650	USR ??
00652	ELSE
00660	USER-DEFINED FUNCTIONS(MULTI-ARG,MULTI-LINE,STRINGS)
00680	MAKE STACK BOUNDARY STUFF EXACT
	(FOUT 24 FIN 14)
	PUNCH, DELETE;,.
	INLINE CONSTANT CONVERSION -- MAKE IT WORK
	SIMPLE STRINGS

While this is nice, it would be much nicer to have more, maybe all of the original source.

Can someone take high-resolution photos of the first eight pages on display on the wall of the ground floor lounge of the Maxwell Dworkin building at Harvard?

Can someone make a copy of the printout at the Harvard University archives? Applications like Microsoft Office Lens can make high-quality copies of printouts with a phone.