Category Archives: trivia

The story of 15 Second Copy for the C-64

by Mike Pall, published with permission.

[This is a follow-up to Thomas Tempelmann’s Story of FCopy for the C-64.]

Ok, I have to make a confession … more than 25 years late:

I’ve reverse-engineered Thomas Tempelmann’s code, added various improvements and spread them around. I guess I’m at least partially responsible for the slew of fast-loaders, fast-copys etc. that circulated in the German C64 scene and beyond. Uh, oh …

I’ve only published AFLG (auto-fast-loader-generator) under my real name in the German “RUN” magazine. It owes quite a bit to TT’s original ideas. I guess I have to apologize to Thomas for not giving proper credit. But back then in the 80′s, intellectual property matters wasn’t exactly something a kid like me was overly concerned with.

Later on, everyone was soldering parallel-transfer cables to the VIA #1 of the 1541 and plugging them into the C64 userport. This provided extra bandwidth compared to the standard serial cable. It allowed much faster loading of programs with a tiny parallel loader (a file named “!”, that was prepended on all disks). Note that the commercial kits with cables, custom EPROMs and silly dongles followed only much later.

So I wrote “15 second copy”, which worked with a plain parallel cable. Yes, it copied a full 35 track disk in 15 seconds! There was only one down-side: this was only the time for reading/writing from and to disk — you had to swap the floppies seven times (!) and that usually took quite a bit more extra time! ;-)

It worked by transferring the “live” GCR-encoded data from the 1541′s disk head to the C64 and simultaneously doing a fast checksum. Part of the checksumming was done on the 1541, part was done on the C64. There simply weren’t enough cycles left on either side! Most of the transfer happened asynchronously by adjusting for the slightly different CPU frequencies and with only a minimum number of handshakes. This meant meticulous cycle counting and use of some odd tricks.

The raw GCR took up more space (684*324 bytes) in the C64 RAM, so that’s why it required 4 passes. Other copy programs fully decoded the GCR and required only 3 passes. But GCR decoding was rather time-consuming, so they had to skip some sectors and read every track multiple times. OTOH my program was able to read/write at the full 300rpm, i.e. 5 tracks per second plus stepper time, which boils down to 2x ~7.5 seconds for read and write. Yep, you had to swap the floppies every 2 seconds …

Ok, so I spread the program. For free. I even made a 40 track version, which took 17 seconds. Only to see these coming back in various mutations, with the original credits ripped out, decorated with multiple intros, different groups pretending they wrote it or cracked it (it was free, there was nothing to crack!). The only thing they left alone were the copy routines, probably because they were extremely fragile and hard to understand. So it was really easy to recognize my own code. Some of the commercial parallel-cable + ROM kits even bragged with “Backups in 15 seconds!”. These were blatant rip-offs: they basically changed the screen colors and added a check for their dongles. Duh.

Let’s just say this rather frustrating experience taught me a lot and that’s why I’m doing open source today.

So I shelved my plans to write an enhanced version which would try to compress the memory to reduce the number of passes. Ah, yes … I wrote quite a few packers, too … but I’ll save that story for another time.

I still have the disks with the source code somewhere in my basement. But I’m not so sure I’ll be able to read them anymore. They weren’t of high quality to begin with … and I’d have to find my homegrown toolchain, too. ;-)

But I took the time to reverse-engineer my own code from one of the copies that are floating around on the net. For better understanding on the C64/1541 handshake issues, refer to this article. If you’re wondering about the weird bvc * loops: the 6502 CPU of the 1541 has an SO pin, which is triggered by a full shift register for the data from the disk head. This directly sets the overflow flag in the CPU and allows reading the contents from the shift register with very low latency.

Yes, there’s a lot more weird code in there. For the sake of brevity, here are only the inner loops of the I/O routines for the read, write and verify pass for the C64 and the 1541 side. Enjoy!

  ;--- 1541: Read ---
  ldy #$20
f_read:
  bvc *        ; Wait for disk shift register to fill
  clv
  lda $1c01    ; Load data from disk
  sta $1801    ; Send byte to C64 via parallel cable
  inc $1800    ; Toggle serial pin
  eor $80      ; Compute checksum for 1st GCR byte in $80
  sta $80
  bvc *
  clv
  lda $1c01    ; Load data from disk
  sta $1801    ; Send byte to C64 via parallel cable
  dec $1800    ; Toggle serial pin
  eor $81      ; Compute checksum for 2nd GCR byte in $81
  sta $81
  ; ...
  ; Copy and checksum to $82 $83 $84
  ; And another time for $80 $81 $82 $83 $84 with inverted toggles
  ; ...
  dey
  beq f_read_end
  jmp f_read
f_read_end:
  ; Copy the remaining 4 bytes and checksum to $80 $81 $82
  ; Lots of bit-shifting and xoring to indirectly verify
  ; the sector checksum from the 5 byte xor of the raw GCR data

  ;--- C64: Read ---
  ; Setup ($5d) and ($5f) to point to GCR buffer
  ldy #$00
c_read:
  bit $dd00    ; Wait for serial pin to toggle
  bpl *-3
  lda $dd01    ; Read incoming data (from 1541)
  sta ($5d),y  ; Store to buffer
  iny
  bit $dd00    ; Wait for serial pin to toggle
  bmi *-3
  lda $dd01    ; Read incoming data (from 1541)
  sta ($5d),y  ; Store to buffer
  iny
  bne c_read
c_read2:
  bit $dd00    ; Wait for serial pin to toggle
  bpl *-3
  lda $dd01    ; Read incoming data (from 1541)
  sta ($5d),y  ; Store to buffer
  iny
  bit $dd00    ; Wait for serial pin to toggle
  bmi *-3
  lda $dd01    ; Read incoming data (from 1541)
  sta ($5d),y  ; Store to buffer
  iny
  cpy #$44
  bne c_read2

  ;--- C64: Write ---
  ; Setup ($5d) and ($5f) to point to GCR buffer
  ldy #$00
  tya
c_write:
  eor ($5d),y  ; Load from buffer and compute checksum
  bit $dd00    ; Wait for serial pin to toggle
  bpl *-3
  sta $dd01    ; Store xor'ed outgoing data (to 1541)
  iny
  eor ($5d),y  ; Load from buffer and compute checksum
  bit $dd00    ; Wait for serial pin to toggle
  bmi *-3
  sta $dd01    ; Store xor'ed outgoing data (to 1541)
  iny
  bne c_write
c_write2:
  eor ($5f),y  ; Load from buffer and compute checksum
  bit $dd00    ; Wait for serial pin to toggle
  bpl *-3
  sta $dd01    ; Store xor'ed outgoing data (to 1541)
  iny
  eor ($5f),y  ; Load from buffer and compute checksum
  bit $dd00    ; Wait for serial pin to toggle
  bmi *-3
  sta $dd01    ; Store xor'ed outgoing data (to 1541)
  iny
  cpy #$44
  bne c_write2
  ldx $5b
  sta $0200,x  ; Store checksum for verify pass
  inx
  stx $5b

  ;--- 1541: Write ---
  ldy #$a2
  lda #$00
f_write:
  bvc *        ; Wait for disk shift register to clear
  clv
  eor $1801    ; Xor with incoming data (from C64)
  sta $1c01    ; Write data to disk shift register
  dec $1800    ; Toggle serial pin
  lda $1801    ; Reload data to undo xor for next byte
  bvc *        ; Wait for disk shift register to clear
  clv
  eor $1801    ; Xor with incoming data (from C64)
  sta $1c01    ; Write data to disk shift register
  inc $1800    ; Toggle serial pin
  lda $1801    ; Reload data to undo xor for next byte
  dey
  bne f_write

  ;--- 1541: Verify ---
  ; Get checksum computed by c_write on the C64 side
  ldy #$a2
f_verify:
  bvc *        ; Wait for disk shift register to fill
  clv
  eor $1c01    ; Xor with data from disk
  bvc *        ; Wait for disk shift register to fill
  clv
  eor $1c01    ; Xor with data from disk
  dey
  bne f_verify
  ; Verify is ok if checksum is zero

The story of FCopy for the C-64

by Thomas Tempelmann, reprinted with permission.

Back in the 80s, the Commodore C-64 had an intelligent floppy drive, the 1541, i.e. an external unit that had its own CPU and everything.

The C-64 would send commands to the drive which in turn would then execute them on its own, reading files, and such, then send the data to the C-64, all over a propriatory serial cable.

The manual for the 1541 mentioned, besides the commands for reading and writing files, that one would read and write to its internal memory space. Even more exciting was that one could download 6502 code into the drive’s memory and have it executed there.

This got me hooked and I wanted to play with that – execute code on the drive. Of course, there was no documention on what code could be executed there, and which functions it could use.

A friend of mine had written a disassembler in BASIC, and so I read out all its ROM contents, which was 16KB of 6502 CPU code, and tried to understand what it does. The OS on the drive was quite amazing and advanced IMO – it had a kind of task management, with commands being sent from the communication unit to the disk I/O task handler.

I learned enough to understand how to use the disk I/O commands to read/write sectors of the disk. Actually, having read the Apple ][‘s DOS 3.3 book which explained all of the workings of its disk format and algos in much detail, was a big help in understanding it all.

(I later learned that I could have also found reverse-eng’d info on the more 4032/4016 disk drives for the “business” Commodore models which worked quite much the same as the 1541, but that was not available to me as a rather disconnected hobby programmer at that time.)

Most importantly, I also learnt how the serial comms worked. I realized that the serial comms, using 4 lines, two for data, two for handshake, was programmed very inefficiently, all in software (though done properly, using classic serial handshaking).

Thus I managed to write a much faster comms routine, where I made fixed timing assumptions, using both the data and the handshake line for data transmission.

Now I was able to read and write sectors, and also transmit data faster than ever before.

Of course, it would have been great if one could simply load some code into the drive which speeds up the comms, and then use the normal commands to read a file, which in turn would use the faster comms. This was not possible, though, as the OS on the drive did not provide any hooks for that (mind that all of the OS was in ROM, unmodifiable).

Hence I was wondering how I could turn my exciting findings into a useful application.

Having been a programmer for a while already, dealing with data loss all the time (music tapes and floppy disks were not very realiable back then), I thought: Backup!

So I wrote a backup program which could duplicate a floppy disk in never-before seen speed: The first version did copy an entire 170 KB disk in only 8 minutes (yes, minutes), the second version did it even in about 4.5 minutes. Whereas the apps before mine took over 25 minutes. (Mind you, the Apple ][, which had its disk OS running on the Apple directly, with fast parallel data access, did this all in a minute or so).

And so FCopy for the C-64 was born.

It became soon extremely popular. Not as a backup program as I had intended it, but as the primary choice for anyone wanting to copy games and other software for their friends.

Turned out that a simplification in my code, which would simply skip unreadable sectors, writing a sector with a bad CRC to the copy, did circumvent most of the then-used copy protection schemes, making it possible to copy most formerly uncopyable disks.

I had tried to sell my app and sold it actually 70 times. When it got advertised in the magazines, claiming it would copy a disk in less than 5 minutes, customers would call and not believe it, “knowing better” that it can’t be done, yet giving it a try.

Not much later, others started to reverse engineer my app, and optimize it, making the comms even faster, leading to copy apps that did it even in 1.5 minutes. Faster was hardly possible, because, due to the limited amount of memory available on the 1541 and the C-64, you had to swap disks several times in the single disk drive to copy all 170 KB of its contents.

In the end, FCopy and its optimized successors were probably the most-popular software ever on the C-64 in the 80s. And even though it didn’t pay off financially for me, it still made me proud, and I learned a lot about reverse-engineering, futility of copy protection and how stardom feels. (Actually, Jim Butterfield, an editor for a C-64 magazine in Canada, told its readers my story, and soon he had a cheque for about 1000 CA$ for me – collected by the magazine from many grateful users sending 5$-cheques, which was a big bunch of money back then for me.)

Chaosradio Express #177: Commodore 64

(This article is about a German-language podcast episode on the C64.)

Im Februar hat mich Tim Pritlove auf der Durchreise in Frankfurt abgefangen, wo ich mit einem Koffer voll mit zwölf Commodore 64 Motherboards in einem Hotelzimmer saß, und mit mir eine 2 Stunden und 42 Minuten lange Episode für Chaosradio Express aufgenommen.

Chaosradio Express #177: Commodore 64

Hier also nochmal der Hinweis auf die Folge, die jetzt schon ein paar Monate zurückliegt, für all diejenigen, die sie nicht schon anderweitig entdeckt haben. :-)

Racism in Monstropolis

Sometimes, freezeframe fun does not provide fun, but sadness.

In the Pixar movie Monsters Inc., you can see the following file of a child at 12 min 40 sec:

Monsters scare children at night, and this is how they keep track of them. The file comes with a blueprint of the room, a list of date stamps, business-critical notes like “scared of snakes”, and the standard data like name, gender, age, and… uh, race??

Albert Lozano, age 8, seems to be “hispanic”, and for monsters, this is apparently a feature that is important to them.

Oh well, that’s Monstropolis, a world inhabited by monsters that scare little children. Modern societies, on the other hand, have understood that “race” is a detail that is just as useful to track as shoe size. Oh wait.


SCUMM Script

Ron Gilbert posted this as a comment on The Mansion – Technical Aspects, I am re-posting it here as an article.

SCUMM script was a little odd. When I first started designing the language, I was going to base it on Lisp. I was use to using it to customize emacs that we used to do all our coding. This was not on PCs, but on large multiuser UNIX machines. The Lucasfilm games group was part of the Lucasfilm computer division (which later became Pixar) and we had some very smart people that connected the C64 to the UNIX machine. From my UNIX terminal, I had complete control over the C64 and wrote a source level debugger for SCUMM. The 6502 assembler was custom written by Chip Morningstar and ran on the UNIX machine.

But I digress. So, SCUMM started out as Lisp based, but I quickly abandon that approach in favor of something that looked a little more C-like, but some aspects of Lisp remained, mostly in the naming conventions. Commands and variables used – (dashes) as separators.

sandy-rescued = 1

This was a single variable, not sandy minus rescued. So how did SCUMM do subtraction? You didn’t. Until Monkey Island, there was no way to do complex expressions. if you wanted to subtract a value you used:

count -= 5

If you had a complex expression, you had to chain them using +=, -=, /= and *= using temp variables. Ugly, but it made the interpreter much simpler.

(Side note, the C64 interpreter was not named SPUTM, that name didn’t come about until the PC version).

One of the goals I had for the SCUMM system was that non-programers could use it. I wanted SCUMM scripts to look more like movies scripts, so the language got a little too wordy. This goal was never really reached, you always needed to be a programmer. :-(

Some examples:

actor sandy walk-to 67,8

This is the command that walked an actor to a spot.

actor sandy face-right
actor sandy do-animation reach
walk-actor razor to-object microwave-oven
start-script watch-edna
stop-script
stop-script watch-edna
say-line dave "Don't be a tuna head."
say-line selected-kid "I don't want to use that right now."
if (melt-down) {
  say-line selected-kid "I don't think this game is very fun."
}

There were no functions, everything was a ‘script’, a small piece of code that looked a lot like a function, but it ran in it’s own virtual process.

script watch-clock {
  do {
    object clock state ON
    break-time 60
    object clock state OFF
    break-time 60
  }
}

Note the lack of a until/while on the do loop. The script would just continue until someone killed it.

Time was always measured in jiffies (1/60 second). Later on I added the ability to say ‘break-time 1 minute’ and the compiler just multiplied by 60 before emitting the opcode.

If you did…

start-script watch-edna
start-script clock-tick

…you now had 2 additional scripts running and they all ran at the same time using preemptive multitasking. It was a heck of a lot better than doing state engines. If you wanted to keep an eye on something, you’d just say…

start-script watch-edna

…and then forget about it. That one script could do whatever it needed and start a cut-scene if the need arose.

This may seem pretty tame today, but back then, it was like magic.

Bonus information: The SCUMM compiler was written in Yacc and Lex.

The Mansion – Technical Aspects

by enthusi

(This is a reprint from “Vandalism News” issue 52)

Maniac Mansion is a true classic – in many ways.

It’s a classic example of smart game play versus photorealistic graphics that is lacking in so many modern productions and it’s the first LucasArts Adventure that made use of the point and click verb parser. And it’s of course eponymous for the famous SCUMM – the scripting utility for Maniac Mansion. This is reason enough to talk about it in some more detail and as a side effect advertise the NEOram version for C64 released just now. :)

MM has a nice Wikipedia page (especially in Finnish – check it out :) so I think I shall concentrate on some technical specialties:

The original Scumm (version 0) was designed for C64 in ’87 and its idea was to be able to develop a game independently of the target machine and only port the SPUTM (script presentation utility TM (seriously)) to several architectures. However it started with C64 and for several reasons the C64 version was kind of special in comparison to all later ports. Hence Scumm version 0 is only used here. Even Zak McKracken already had important changes and fixes. The original version came on disk with some copy protection (not too bad, not many cracks are around) and hence there is no in-game copy protection unlike all other versions (except for the Apple). You may have seen that code door at 2nd floor on other computers and guess what: the door code is made of PETSCII graphics chars! Officially the protection was left out due to disk space. In fact the 2 disk sides are pretty crammed in the original. So the game design was done on PC (UNIX system) and assembled there. This explains the spaghetti like distribution of the data and some of the code. In general the data is split into rooms. A room contains all the data for a specific area in the game. I.e. the beginning with the beautiful moon in the background or the kitchen is one ‘room’ each. The data consists of one graphics charset in multicolour mode and its screen map (from 40×17 up to 160×17 chars big), one full charset for the clipping mask with its map (same size as screen map of course), a colourmap for the fourth colour per char (ranging only from 0-7) and then the scripts themselves. There is one room data set for each location and numerous scripts, one for each room and several additional ones.

The scripts are more or less freely distributed among the rooms by the original SCUMM. They differ a LOT between English and German versions, for example to have minimal waste of space at the end of a sector on disk. Beside the room graphics and script data there is also data for sound effects, the characters in the game and the objects. The people in the game are drawn as sprites, but not constructed from ordinary sprites but plotted into a semi static multiplexed sprite matrix in real time each frame (17 raster-splits per frame). The data for the characters is also attached to several rooms. Often the one they appear in first. The text charset is not the ROM font btw. Since usually $01 is set to $14, hence RAM, RAM, RAM…

Each actor has a vertical stripe of sprites covering the whole height and this allows for quite some actors on screen simultaneously. The sprite priorities are set according to the actors positions, so you can actually walk in front or behind other kids and even orbit them for hours… MM is so much fun. :)

The graphics are stored in sprite like format (3 bytes per line) but with a successive height of 16 pixels normally. To save space each person is only stored from front, back and its right side. Whenever someone is facing left, the graphics data is mirrored via table and then plotted. Oh, and all the facial expressions, well the mouth and the chin (being one pixel) are not part of the data but added during plotting. Btw, the explosion in the intro, chucky the plant and the mummy are stored and handled as actors as well. You can see that when you leave the bathroom: the mummy disappears just before the rest does. To prevent the secret phone number from being revealed the background graphics is distorted of course. :) The graphics replacements are stored among the objects. So an open radio is a replacement of the object graphics radio. Stored as x, y position in char steps inside the room area and its width. Each object has an owner attribute and a position x, y in chars where to walk to (which is not necessarily identical to its position). Also there is a byte containing the height of the object in pixels (using 6 bits) while the upper 2 bits give the direction the actor is facing when reaching the object. So when Dave opens the fridge we see his back but when reaching a door, he’s facing right for example. The object’s name is also given along with a pointer to the script dealing with it. Those pointers are 24bit wide. 1 byte for the room it is attached to (again, these data change between different language versions) and another 16bit offset inside that data chunk.

The scripting for MM is rather complex. The memory resident interpreter understands 82 opcodes and handles 256 global variables! There is lots of things are stored and swapped around. Like whose friend with Ed or the Tentacle or whether the highscore at the arcade is already set… It even has fancy stuff like a random number generator (yes, the numbers in the game like phone numbers, codes) are not always the same! (Look for the cheat in the NEOram version that displays Edna’s phone number, the safe combination, the radio frequency and the highscore once it’s played).

A typical script contains commands like these:

(0035) (42)   startScript(103)
(0037) (02)   startMusic(185)
(0039) (CE)   putActorAtObject(Var(225),162)
(003C) (82)   stopCurrentScript()
(003D) (04)   unless (Var(61) <= 73) goto 5A96
(0042) (66)   Var(66) = getClosestObjActor25(113)
(0045) (02)   startMusic(186)
(0047) (CF)   setState02()
(0048) (E2)   stopScript(Var(162))
(004A) (82)   stopCurrentScript()
(004B) (03)   doSentence(62,74,55)
(004F) (5B)   Var(69) = getActorBitVar(103,Var(64))

Unlike in later Scumm revisions, version 0 for C64 has quite some C64 specific hard coded stuff inside the data. A nice example: ALL actor graphics are stored the same way (you can simply try a PC based multicolour sprite ripper on the disk images) except for the breasts of Sandy =D They are stored completely separate. Maybe to prevent early nude-hacks? Back to the data formats. There is quite some graphics in this game - there is a total of 52 rooms, including the initial kid selection screen, save game screen or cut scenes which are all handled by the same interpreter and format. To make all this fit onto two disk sides, there is a specific RLE encoding. The four most common bytes of the data to follow are given for each data chunk. These were derived by the SCUMM on the PC during compilation already. The encoding is arranged as following:

  • A data byte of < $40 gives the number of different single unpacked bytes to follow.
  • $3f to < $80 represents the length of the run of the byte to follow (+$3f of course).
  • $7f to < $a0 is the number of runs of most common byte 1,
  • $9f to < $c0 and $bf to <$e0 and $df-$ff for common byte 2, 3 and 4 respectively.

This encoding is rather effective for the MM graphics. Walkable areas for a room are composed of a list of multiple boxes with varying size and positions and stored after the graphics data. You probably noticed the slowdown during scroll in the game. This is due to the double buffered scroll which waits for all the clipping and room data to be moved into the correct RAM areas. The colour scroll of $d800 is done with some sort of line based speed code. Rastertime and memory are both used almost to the max. At start-up some own drive code is loaded into the floppy (not in my NEOram version of course). Later on a room is loaded by checking which room we want to see next, then getting its corresponding disk side from a table, checking for correct disk, getting the track and sector at which the room data starts from another table, calling a function to set this t/s offset, then calling another function with the target address in memory and another one with the length to be transferred. Only then will the room data be actually loaded. Even worse, usually only the first few bytes are read since they contain the offset to the table of other offsets (actors, objects, and scripts) which are then loaded to the corresponding memory areas. As soon as the drive code is initialised that RAM area on C64 side is overwritten by data AND code again. So the only memory available for extra code in this version was the drive interface and some other routines handling disk sides etc. A major gain was for example the now obsolete "wrong disk" message. This game really uses all the RAM there is. Most of the stack is occupied by code as well. Some of the JMPs instead of JSRs in the interpreter I didn't get at first were due to that fact - to keep the stack pointer high (that was kind of sucky to debug).

I was able to rewrite some interpreter routines as well to gain more space for the mouse driver and save game selection.

And, err... I removed the checksum check for each room data. That was also quite some speedup btw and it probably prevented lots of people from making trivial changes to the game. Yet it is hardly possible to apply big changes in the game logic. All offsets are hard-coded to the data. Even changing the length of the name for i.e. the 'fridge' by 1 char, causes dozens of offsets to change (for costumes, scripts, sounds, other rooms and so on). Even the interpreter looks rather different from language to language. And the RLE nature of the data, makes it pretty hard to change graphics as well since they have to keep the same size.

For the NEOram version I only redistributed the room data and completely replaced its track/sector/disk side structure.

So unlike all other patches before this one no longer 'emulates' disk access anymore. MM handles the pointer on screen very arborescent and there is no such thing like an "open" function call. Instead pointer positions are stored as x, y chars on screen, x, y relative to room origin in chars, x, y as pixels on screen, x, y as pixel coordinates (half the x resolution 0-159) and as a delta to last frame. All of these are set by different functions. So when a verb is highlighted and you press fire, multiple addresses are set already in the background. The call itself is a self modified JMP in the end. To allow the OPEN and PICKUP command on the right mouse button I rewrote the button routine instead. The input from the mouse is faked as if it would point to the OPEN verb, the pointer is hidden (more or less), the old position stored and the mouse driver is skipped for the next 3 frames while the joystick readout is rerouted to an own routine that returns button pressed for the next frame, then 1 frame of button release and still no mouse action, one frame to return old pointer coordinates and run the routine to convert the mouse coordinates to all those mentioned before and finally enable pointer sprite and toggle mouse control back in. Sounds a bit weird but otherwise there isnt enough rastertime left. This is also why you shouldn't use the mouse on an NTSC system. The mouse is read every 2nd frame only for the very same reason. The evaluation of the new coordinates for the game engine (x, y in absolute and relative chars and check for verb activation) are only carried out when the mouse driver itself is skipped. This was the only way to get things done before the next rastersplit is due. Also the mouse is disabled as long as a button is pressed or actions with objects or scrolling the inventory would take too much rastertime. And sometimes you see a little jerky screen during loading when the I/O area has to be switched in to set NEOram registers and the rasterline hits the IRQ to switch back to text mode. I chose this part of the screen since its the least disturbing and usually works quite ok. MM is and was great fun - as soon as I've had some rest I will look into Zak McKracken deeper. Almost can't wait to do so (imagine both games on one cart). Have a lot of fun and try all the different solutions to the game.

Yes, Jeff HAS some more or less special ability and you can use the integrated room viewer to check for things yet unseen ;-)

Did you every finish the game without shooting that poor meteor (why is no one calling it accurately 'meteorite'?) into space? Go ahead and enjoy another game of Maniac Mansion. The NEOram image will work well under vice by the way but if you consider assembling your own NEOram card just look for schematics and part-lists on the web or contact maniac`enthusi.de - Quite likely there are some PCB boards still around.

Ok, let's go rescue Sandy!

/enthusi

Limitations in Maniac Mansion

In (any version of) Maniac Mansion, if one kid is in the hallway with Green Tentacle, any other kid (even Syd and Razor) will refuse to go up the stairs to that hallway: “I’m not going up there… that monster’s got my friend!”

Why is that?

C64:



Nintendo Entertainment System:

MS-DOS Enhanced:

(Read Maniac Mansion creator Ron Gilbert’s authoritative answer in the comments.)

For Lisa, the World Ended in 1995

If you try to set the clock in Lisa OS 3.1 to 2010, you’re out of luck:

You can only enter years from 1981 to 1995. That’s a span of 15 years – why? And what happens if the clock runs past the end of 1995?

Well, it wraps around to 1 Jan 1980.

But why does it not allow entering 1980 then? That’s why:

Whenever the clock is set to 1980, it thinks the clock is not set up properly. So it is a 4 bit counter. Too bad, a 5 bit counter could have made it into 2011, and we all know that’s way more than ever needed.

Internals of BRK/IRQ/NMI/RESET on a MOS 6502

After 35 years of measuring the behaviour of the MOS 6502 CPU to better understand what is going on, the Visual6502 simulator finally allows us insight into the chip, so we can understand what the CPU does internally. One interesting thing here is the question how the 6502 handles BRK, IRQ, NMI and RESET.

The Specification

Let’s revisit the documented part first. The 6502 knows three vectors at the top of its address space:

Signal Vector
NMI $FFFA/$FFFB
RESET $FFFC/$FFFD
IRQ/BRK $FFFE/$FFFF
  • On a RESET, the CPU loads the vector from $FFFC/$FFFD into the program counter and continues fetching instructions from there.
  • On an NMI, the CPU pushes the low byte and the high byte of the program counter as well as the processor status onto the stack, disables interrupts and loads the vector from $FFFA/$FFFB into the program counter and continues fetching instructions from there.
  • On an IRQ, the CPU does the same as in the NMI case, but uses the vector at $FFFE/$FFFF.
  • On a BRK instruction, the CPU does the same as in the IRQ case, but sets bit #4 (B flag) in the copy of the status register that is saved on the stack.

The four operations are very similar, they only differ in the location of the vector, whether they actually push data onto the stack, and whether they set the B flag.

Signal Vector Push PC and P Set B Flag
NMI $FFFA/$FFFB yes no
RESET $FFFC/$FFFD no no
IRQ $FFFE/$FFFF yes no
BRK $FFFE/$FFFF yes yes

BRK

Ignoring opcode fetches, the PLA ROM defines the following cycles of the BRK instruction (6502 Programming Manual, page 131):

  • store PC(hi)
  • store PC(lo)
  • store P
  • fetch PC(lo) from $FFFE
  • fetch PC(hi) from $FFFF

IRQ

An IRQ does basically the same thing as a BRK, but it clears the B flag in the pushed status byte. The CPU goes through the same sequence of cycles as in the BRK case, which is done like this:

If there is an IRQ pending and the current instruction has just finished, the interrupt logic in the 6502 forces the instruction register (“IR”) to “0″, so instead of executing the next instruction, the PLA will decode the instruction with the opcode 0×00 – which is BRK! Of course it has to kick in a few cycles later again to make sure a B value of 0 is pushed, but otherwise, it’s just the BRK instruction executing.

NMI

Not surprisingly, NMI is done the same way: “0″ is injected into the instruction stream, but this time, some extra logic makes sure that the addresses $FFFA/$FFFB are put onto the address bus when fetching the vector.

RESET

RESET also runs through the same sequence, but it is the most different of the four cases, since it does not write the current PC and status onto the stack – but this was hacked trivially: The bus cycles exist, but the read/write line is not set to “write”, but “read” instead. The following trace was created with the transistor data from the Visual 6502 project and shows the first nine cycles after letting go of RESET:

#0 AB:00FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $00FF = $00

Cycle 0: When a 6502 is turned on, the stack pointer is initialized with zero. The BRK/IRQ/NMI/RESET sequence pulls the instruction register (IR) to 0.

#1 AB:00FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $00FF = $00
#2 AB:00FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $00FF = $00
#3 AB:0100 D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $0100 = $00

Cycle 3: The first stack access happens at address $0100 – a push first stores the value at $0100 + SP, then decrements SP. In the BRK/IRQ/NMI case, this would have stored the high-byte of the PC. But for RESET, it is a read cycle, not a write cycle, and the result is discarded.

#4 AB:01FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $01FF = $00

Cycle 4: SP is now 0xFF (even if the internal state does not reflect that), so the second stack access (which would have been the low-byte of PC) targets 0x01FF. Again, the result is discarded, and SP decremented.

#5 AB:01FE D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $01FE = $00

Cycle 5: SP is now 0xFE, and the third stack access, (the status register) happens at 0x01FE. SP is decremented again.

#6 AB:FFFC D:E2 R/W:1 PC:00FF A:AA X:00 Y:00 SP:FD P:06 IR:00  READ $FFFC = $E2

Cycle 6: The internal state of the CPU now shows that SP is 0xFD, because it got decremented 3 times for the three fake push operations. The low-byte of the vector is read.

#7 AB:FFFD D:FC R/W:1 PC:00FF A:AA X:00 Y:00 SP:FD P:16 IR:00  READ $FFFD = $FC

Cycle 7: The high-byte of the vector is read.

#8 AB:FCE2 D:A2 R/W:1 PC:FCE2 A:AA X:00 Y:00 SP:FD P:16 IR:00  READ $FCE2 = $A2

Cycle 8: The first actual instruction is fetched.

Since RESET is not timing critical, it doesn’t matter whether a few cycles are wasted by doing the fake stack cycles.

Measuring the Entropy of the MOS 6502 CPU

Everything can be expressed in bits. It takes 4 kilobits to decompress ZIP data, 25 kilobits to kill a human, 43 megabits for a working Mac OS X kernel, and 10^120 bits to describe our universe. What is the entropy/complexity of the 6502 CPU, you might wonder?

You have probably already seen Visual 6502, a simulator of the 6502 that operates at the transistor level and visualizes what’s going on inside the CPU. The data the program operates on was derived by converting a high-resolution photograph of the 6502 die into polygons, and converting these polygons into the node and transistor configuration.

If you ignore the polygon data, which is only needed for visualization, the data to fully describe the 6502 consists of a set of nodes and a set of transistors.

There are 1725 nodes in the 6502, and 3510 transistors. Every transistor has one node as an input that turns the transistor on or off. If the transistor is on, it connects two other nodes, and if it is off, it separates them. Every node can either be neutral, or be a pullup – the latter means that the node will behave as if it was connected to VCC whenever the path through this node is otherwise floating, i.e. not connected to VCC or GND (or an external pulldown).

So the raw information that describes the 6502 is a set of nodes, for which there is one bit of information, whether it is a pullup node or a neutral node, as well as a set of transistors, which is a three-tuple consisting of gate, c1 and c2.

The file segdefs.js in visual6502 contains (next to the polygon information) the node state: “+” means pullup, “-” (or a missing node description) means neutral. The file transdefs.js contains the transistor tuples.

Stripped from irrelevant data, the 6502 description would look like this: The node states:

BOOL nodes[] = {
    /*  0 */ 1,
    /*  1 */ 0,
    /*  2 */ 0,
    /*  3 */ 1,
    /*  4 */ 1,
    /*  5 */ 1,
    /*  6 */ 1,
    /*  7 */ 0,
    /*  8 */ 1,
    /*  9 */ 0,
    /* 10 */ 1,
    [...]
}

…and the transistors:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {357, 558, 217},
    {1608, 657, 349},
    {412, 558, 1146},
    {558, 558, 943},
    {826, 230, 657},
    [...]
}

A quick estimate would be 1 bit per node (pullup or neutral), and 3 times 11 bits (3 node numbers from 1 to 1725) per transistor, resulting in about 117 kilobits; but there is still a lot of redundancy in this representation.

The transistor tuples can be represented in a much more compact way. First, there is no ordering for the transistors, so if we sort them by gate, we can encode just the difference between the current gate and the previous gate:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {1, 890, 558},
    {4, 558, 11},
    {5, 558, 146},
    {6, 558, 282},
    {6, 874, 657},
    {7, 657, 1591},
    {7, 657, 1591},
    {7, 657, 1591},
    {8, 150, 558},
    [...]
}

In practice, the difference between two gate nodes in the transistor list is a number between 0 and 4, so we could use 3 bits (instead of 11) for the gate. But in fact, it is enough to store a single bit: The nodes have no order either, so we can renumber them so that all nodes that are connected to a gate of a transistor are numbered from 0, and all nodes that are not connected to gates will be numbered above the ones with gates:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {0, 890, 558},
    {1, 558, 11},
    {2, 558, 146},
    {3, 558, 282},
    {3, 874, 657},
    {4, 657, 1591},
    {4, 657, 1591},
    {4, 657, 1591},
    {5, 150, 558},
    [...]
}

This way, there are no holes in the list of gates – a new transistor has either the next node number as its gate, or the same one again (i.e. a node is gate of several transistors). It is enough to store a single bit for the gate.

The example above already shows that the nodes 657 and 558 show up a lot: These are VCC and GND, respectively. In fact, 234 transistors connect to VCC, and 2493 connect to GND. We could Huffman-encode the c1 and c2 nodes, but in practice, all other nodes except these two are relatively uniformly distributed, so let’s just special case them.

There is also no ordering between the two nodes the transistor connects, and no transistor will ever connect VCC/GND to VCC/GND, so we only need to do the special VCC/GND encoding for c1, and leave c2 as it is. Let’s use a single “1″ bit to represent GND, “01″ as VCC, and all other 11 bit node numbers will be prefixed with “00″, making them occupy 13 bits.

This way, the 3510 values of c1 can be described in 2493 * 1 (the GNDs) + 234 * 2 (the VCCs) + 783 * (2 + 11) (the others) bits = 13140 bits. The c2 values occupy 3510 * 11 bits = 38610 bits. If you add the bit per transistor for the gate (i.e. 3510 bits), we’re at 55260 bits for the transistor description.

Using arithmetic coding, we can describe a node number in about 10.75 bits instead of 11. This way, we save (783 + 3510) * 0.25 bits = 1073 bits. The transistor description thus fits into 54187 bits.

As stated previously, we also need one bit per node for the pullup information. About half the nodes are pullup, and we don’t have the freedom of sorting and renaming the nodes any more (we’ve done that already for the transistor gates), so we’ll have to go with the 1725 raw bits.

So the final number is 54187 bits + 1725 bits = 55912 bits. That’s 6989 bytes, or roughly 56 kilobits, and about twice as big as H1N1.

With this number, we can estimate the complexity of a Commodore 64 to be about 200 kilobits (CPU, VIC, SID, 2xCIA, PLA) – plus another 160 kilobits of ROM.

But wait: Did you see the duplicate transistors in the sorted list? Yes, there are some duplicates, in fact, there are only 3288 unique transistors. And when looking at it on a higher level, the 130 entry instruction decode ROM has duplicates too: Only 115 are unique, leading to redundant paths in the node/transistor graph. The reason for this redundancy is routing: There are only two dimensions in a computer chip (plus layers), so you cannot just connect anything to anything.

The homework for today, dear reader, is to develop an algorithm that minimizes the description of the 6502 by removing duplicate transistors as well as equivalent paths.