Category Archives: 6502

The Mansion – Technical Aspects

by enthusi

(This is a reprint from “Vandalism News” issue 52)

Maniac Mansion is a true classic – in many ways.

It’s a classic example of smart game play versus photorealistic graphics that is lacking in so many modern productions and it’s the first LucasArts Adventure that made use of the point and click verb parser. And it’s of course eponymous for the famous SCUMM – the scripting utility for Maniac Mansion. This is reason enough to talk about it in some more detail and as a side effect advertise the NEOram version for C64 released just now. :)

MM has a nice Wikipedia page (especially in Finnish – check it out :) so I think I shall concentrate on some technical specialties:

The original Scumm (version 0) was designed for C64 in ’87 and its idea was to be able to develop a game independently of the target machine and only port the SPUTM (script presentation utility TM (seriously)) to several architectures. However it started with C64 and for several reasons the C64 version was kind of special in comparison to all later ports. Hence Scumm version 0 is only used here. Even Zak McKracken already had important changes and fixes. The original version came on disk with some copy protection (not too bad, not many cracks are around) and hence there is no in-game copy protection unlike all other versions (except for the Apple). You may have seen that code door at 2nd floor on other computers and guess what: the door code is made of PETSCII graphics chars! Officially the protection was left out due to disk space. In fact the 2 disk sides are pretty crammed in the original. So the game design was done on PC (UNIX system) and assembled there. This explains the spaghetti like distribution of the data and some of the code. In general the data is split into rooms. A room contains all the data for a specific area in the game. I.e. the beginning with the beautiful moon in the background or the kitchen is one ‘room’ each. The data consists of one graphics charset in multicolour mode and its screen map (from 40×17 up to 160×17 chars big), one full charset for the clipping mask with its map (same size as screen map of course), a colourmap for the fourth colour per char (ranging only from 0-7) and then the scripts themselves. There is one room data set for each location and numerous scripts, one for each room and several additional ones.

The scripts are more or less freely distributed among the rooms by the original SCUMM. They differ a LOT between English and German versions, for example to have minimal waste of space at the end of a sector on disk. Beside the room graphics and script data there is also data for sound effects, the characters in the game and the objects. The people in the game are drawn as sprites, but not constructed from ordinary sprites but plotted into a semi static multiplexed sprite matrix in real time each frame (17 raster-splits per frame). The data for the characters is also attached to several rooms. Often the one they appear in first. The text charset is not the ROM font btw. Since usually $01 is set to $14, hence RAM, RAM, RAM…

Each actor has a vertical stripe of sprites covering the whole height and this allows for quite some actors on screen simultaneously. The sprite priorities are set according to the actors positions, so you can actually walk in front or behind other kids and even orbit them for hours… MM is so much fun. :)

The graphics are stored in sprite like format (3 bytes per line) but with a successive height of 16 pixels normally. To save space each person is only stored from front, back and its right side. Whenever someone is facing left, the graphics data is mirrored via table and then plotted. Oh, and all the facial expressions, well the mouth and the chin (being one pixel) are not part of the data but added during plotting. Btw, the explosion in the intro, chucky the plant and the mummy are stored and handled as actors as well. You can see that when you leave the bathroom: the mummy disappears just before the rest does. To prevent the secret phone number from being revealed the background graphics is distorted of course. :) The graphics replacements are stored among the objects. So an open radio is a replacement of the object graphics radio. Stored as x, y position in char steps inside the room area and its width. Each object has an owner attribute and a position x, y in chars where to walk to (which is not necessarily identical to its position). Also there is a byte containing the height of the object in pixels (using 6 bits) while the upper 2 bits give the direction the actor is facing when reaching the object. So when Dave opens the fridge we see his back but when reaching a door, he’s facing right for example. The object’s name is also given along with a pointer to the script dealing with it. Those pointers are 24bit wide. 1 byte for the room it is attached to (again, these data change between different language versions) and another 16bit offset inside that data chunk.

The scripting for MM is rather complex. The memory resident interpreter understands 82 opcodes and handles 256 global variables! There is lots of things are stored and swapped around. Like whose friend with Ed or the Tentacle or whether the highscore at the arcade is already set… It even has fancy stuff like a random number generator (yes, the numbers in the game like phone numbers, codes) are not always the same! (Look for the cheat in the NEOram version that displays Edna’s phone number, the safe combination, the radio frequency and the highscore once it’s played).

A typical script contains commands like these:

(0035) (42)   startScript(103)
(0037) (02)   startMusic(185)
(0039) (CE)   putActorAtObject(Var(225),162)
(003C) (82)   stopCurrentScript()
(003D) (04)   unless (Var(61) <= 73) goto 5A96
(0042) (66)   Var(66) = getClosestObjActor25(113)
(0045) (02)   startMusic(186)
(0047) (CF)   setState02()
(0048) (E2)   stopScript(Var(162))
(004A) (82)   stopCurrentScript()
(004B) (03)   doSentence(62,74,55)
(004F) (5B)   Var(69) = getActorBitVar(103,Var(64))

Unlike in later Scumm revisions, version 0 for C64 has quite some C64 specific hard coded stuff inside the data. A nice example: ALL actor graphics are stored the same way (you can simply try a PC based multicolour sprite ripper on the disk images) except for the breasts of Sandy =D They are stored completely separate. Maybe to prevent early nude-hacks? Back to the data formats. There is quite some graphics in this game - there is a total of 52 rooms, including the initial kid selection screen, save game screen or cut scenes which are all handled by the same interpreter and format. To make all this fit onto two disk sides, there is a specific RLE encoding. The four most common bytes of the data to follow are given for each data chunk. These were derived by the SCUMM on the PC during compilation already. The encoding is arranged as following:

  • A data byte of < $40 gives the number of different single unpacked bytes to follow.
  • $3f to < $80 represents the length of the run of the byte to follow (+$3f of course).
  • $7f to < $a0 is the number of runs of most common byte 1,
  • $9f to < $c0 and $bf to <$e0 and $df-$ff for common byte 2, 3 and 4 respectively.

This encoding is rather effective for the MM graphics. Walkable areas for a room are composed of a list of multiple boxes with varying size and positions and stored after the graphics data. You probably noticed the slowdown during scroll in the game. This is due to the double buffered scroll which waits for all the clipping and room data to be moved into the correct RAM areas. The colour scroll of $d800 is done with some sort of line based speed code. Rastertime and memory are both used almost to the max. At start-up some own drive code is loaded into the floppy (not in my NEOram version of course). Later on a room is loaded by checking which room we want to see next, then getting its corresponding disk side from a table, checking for correct disk, getting the track and sector at which the room data starts from another table, calling a function to set this t/s offset, then calling another function with the target address in memory and another one with the length to be transferred. Only then will the room data be actually loaded. Even worse, usually only the first few bytes are read since they contain the offset to the table of other offsets (actors, objects, and scripts) which are then loaded to the corresponding memory areas. As soon as the drive code is initialised that RAM area on C64 side is overwritten by data AND code again. So the only memory available for extra code in this version was the drive interface and some other routines handling disk sides etc. A major gain was for example the now obsolete "wrong disk" message. This game really uses all the RAM there is. Most of the stack is occupied by code as well. Some of the JMPs instead of JSRs in the interpreter I didn't get at first were due to that fact - to keep the stack pointer high (that was kind of sucky to debug).

I was able to rewrite some interpreter routines as well to gain more space for the mouse driver and save game selection.

And, err... I removed the checksum check for each room data. That was also quite some speedup btw and it probably prevented lots of people from making trivial changes to the game. Yet it is hardly possible to apply big changes in the game logic. All offsets are hard-coded to the data. Even changing the length of the name for i.e. the 'fridge' by 1 char, causes dozens of offsets to change (for costumes, scripts, sounds, other rooms and so on). Even the interpreter looks rather different from language to language. And the RLE nature of the data, makes it pretty hard to change graphics as well since they have to keep the same size.

For the NEOram version I only redistributed the room data and completely replaced its track/sector/disk side structure.

So unlike all other patches before this one no longer 'emulates' disk access anymore. MM handles the pointer on screen very arborescent and there is no such thing like an "open" function call. Instead pointer positions are stored as x, y chars on screen, x, y relative to room origin in chars, x, y as pixels on screen, x, y as pixel coordinates (half the x resolution 0-159) and as a delta to last frame. All of these are set by different functions. So when a verb is highlighted and you press fire, multiple addresses are set already in the background. The call itself is a self modified JMP in the end. To allow the OPEN and PICKUP command on the right mouse button I rewrote the button routine instead. The input from the mouse is faked as if it would point to the OPEN verb, the pointer is hidden (more or less), the old position stored and the mouse driver is skipped for the next 3 frames while the joystick readout is rerouted to an own routine that returns button pressed for the next frame, then 1 frame of button release and still no mouse action, one frame to return old pointer coordinates and run the routine to convert the mouse coordinates to all those mentioned before and finally enable pointer sprite and toggle mouse control back in. Sounds a bit weird but otherwise there isnt enough rastertime left. This is also why you shouldn't use the mouse on an NTSC system. The mouse is read every 2nd frame only for the very same reason. The evaluation of the new coordinates for the game engine (x, y in absolute and relative chars and check for verb activation) are only carried out when the mouse driver itself is skipped. This was the only way to get things done before the next rastersplit is due. Also the mouse is disabled as long as a button is pressed or actions with objects or scrolling the inventory would take too much rastertime. And sometimes you see a little jerky screen during loading when the I/O area has to be switched in to set NEOram registers and the rasterline hits the IRQ to switch back to text mode. I chose this part of the screen since its the least disturbing and usually works quite ok. MM is and was great fun - as soon as I've had some rest I will look into Zak McKracken deeper. Almost can't wait to do so (imagine both games on one cart). Have a lot of fun and try all the different solutions to the game.

Yes, Jeff HAS some more or less special ability and you can use the integrated room viewer to check for things yet unseen ;-)

Did you every finish the game without shooting that poor meteor (why is no one calling it accurately 'meteorite'?) into space? Go ahead and enjoy another game of Maniac Mansion. The NEOram image will work well under vice by the way but if you consider assembling your own NEOram card just look for schematics and part-lists on the web or contact maniac` - Quite likely there are some PCB boards still around.

Ok, let's go rescue Sandy!


How many Commodore 64 computers were really sold?

Nobody doubts that the C64 was the greatest selling single computer model of all time, it even made it into the Guinness Book of World Records, but nobody quite knows how many it really was: Most sources say 17 million, others say 22 or even 30 million. With a high degree of confidence, I can now say that Commodore only sold 12.5 million units – how I would know that, you ask, and how do I dare to contradict well-known facts? By analyzing serial numbers!

But let us first examine the existing claims of 17, 22 and 30 million.

Jack Tramiel’s numbers

The numbers 22 and 30 million actually come from Commodore founder Jack Tramiel himself. At the “Impact of the Commodore 64: A 25th Anniversary Celebration” in December 2007 at the Computer History Museum, Tramiel claimed that Commodore sold nearly half a million C64s a month until he left the company in 1984, and he extrapolated this to “between 22 and 30 million units” during the lifetime of the C64. Tramiel’s assistant Michael Tomczyk, who left the company around the same time, uses the same 22 million figure in his 1984 book “Home Computer Wars – An Insider’s Account of Commodore and Jack Tramiel”.

Commodore’s Numbers

But these numbers contradict Commodore’s official sales numbers as researched by Marc Walters: The 1993 Annual Report supposedly states that 17 million units had been sold (production was stopped in April 1994, so this should be very close to the final number). Commodore employee Dr. Peter Kittel confirms the 17 million.

Walters quotes Commodore’s 1993 financial details for the sales numbers between 1990 and 1993:

The following figures are from Commodore annual reports:
Fiscal 1990: 700K - 800K (decline begins),
fiscal 1991: 800K (non official 1M)
fiscal 1992: 650K
fiscal 1993: 150K - 200K
1993 Annual Report: 17M total C64, 4.5M C128

Marc Walters’ estimates

Walters also gives his own estimates for 1982 through 1989:

1982: 150K - 300K
1983: 2M
1984: 2M-3M
1985: 2M -3M
1986: 2M -3M
1987: 1M - 2M
1988: 1M - 1.5M
1989: 1M - 1.5M

This data was later picked up by Jeremy Reimer for his Ars Technica article “Total share: 30 years of personal computer market share figures“.

With these numbers, Commodore would have sold between 13.5 and 19 million units. The following graph shows this, with the 17 million figure added as a scaled version of the ranges given:

Putting all reliable data together

This data is quite consistent, but generic sources contradict these steep numbers in the first years:

This data, with Commodore’s own 1990-1993 data added and some of the missing numbers extrapolated, shows a rather flat and consistent sales curve with a single spike in 1984, totalling about 12.5 million units:

In 1984, about 2 to 2.5 million units were sold, which is about 200,000 units per month. There is no way Tramiel’s statement holds true, given the many sources of the early numbers and their consistency. This also falsifies claims of Tramiel’s 22 million and 30 million sold units, which could be approximated by extrapolating the 2.5 million units across a decade – but the existing data shows that the C64 clearly didn’t perform equally well afterwards.

Examining serial numbers

There are two projects that collect C64 serial numbers:

While the former collects lots of details including case badges and board types, the latter contains information like the video standard as well as photos of all serial number stickers. For the following statistics, we only need the C64 Serial Registry.

Commodore 64 computers were produced in at least 11 different factories worldwide with different and not immediately obvious conventions for serial numbers. But all PCBs, of which there were 16 versions, were manufactured in Hong Kong, and most versions were numbered sequentially, starting at 0 for each new version. The serial number could be found on a sticker on the shield of the cartridge port.

The fact that serial numbers are indeed per board version and are reset to zero for every new board can be seen in the following graph, which was created by sorting the serial numbers by board first and then by number:

If the assumption is correct, serial numbers have to increase linearly until they approach the highest serial number of the board version. All boards should have about the same slope. This is true for all board versions except “250425/A” and “250425/B”, as well as the last two, “250469/A” and “250469/A”.

This can be explained by the bias towards machines from North America in the C64 Serial Registry: About one third of the entries in the database are from the US and Canadian market, but a much smaller percentage of units has actually been sold there – the C64 was very strong in Europe. The first, second and fourth of these board types with a much higher slope were all sold exclusively, and the third one predominantly to PAL markets, i.e. outside North America.

(If the slope there is roughly twice as steep, we can estimate NTSC units are overrepresented in the database by a factor of two, meaning that about 1/6 of all C64 were sold in North America.)

These are the 16 different versions and revisions, the year they first appeared in, the number of boards of this kind in the C64 Serial Registry (that have the board serial number field filled), and the maximum serial number observed:

year board num_seem max_seen
1982 326298 4
1982 326298/A 21 325,512
1982 326298/B 6
1982 326298/C 6
1982 KU-14194HB 16
1983 250407/A 7 208,282
1983 250407/B 50 1,152,644
1983 250407/C 18 1,218,502
1984 250425/- 48 1,273,699
1984 250425/A 9 500,165
1984 250425/B 4 536,345
1986 250466 18 438,001
1987 250469/3 14 444,384
1988 250469/4 29 1,124,586
1989 250469/A 16 1,994,012
1990 250469/B 17 2,242,493

(Note that four of the early versions do not have any easily discoverable serial numbers on them. Also note that the PET 64 and Educator 64 devices, which were basically “326298/A” C64 boards in an all-in-one computer, are also counted, but they sold in very small numbers. The same is true for the ill-fated “250469/B”-based C64GS game console. The portable SX-64 has a different board and is not counted.)

Now we can use the formula for the German Tank Problem to estimate the total number produced for each type of board for which we have the maximum observed serial number:

k is the sample size and m is the highest serial number observed.

year board num_seem max_seen total
1982 326298 4
1982 326298/A 21 325,512 341,012
1982 326298/B 6
1982 326298/C 6
1982 KU-14194HB 16
1983 250407/A 7 208,282 238,036
1983 250407/B 50 1,152,644 1,175,696
1983 250407/C 18 1,218,502 1,286,196
1984 250425/- 48 1,273,699 1,300,233
1984 250425/A 9 500,165 555,738
1984 250425/B 4 536,345 670,430
1986 250466 18 438,001 462,333
1987 250469/3 14 444,384 476,125
1988 250469/4 29 1,124,586 1,163,364
1989 250469/A 16 1,994,012 2,118,637
1990 250469/B 17 2,242,493 2,374,403

For the board types without serial numbers, we can approximate the number of boards produced by scaling the result of board “326298/A” to the number of observed units of the missing ones. Since all boards without serials are from 1982 just like “326298/A”, it is probably not a very bad estimate.

(On the other hand, board “KU-14194HB” has only ever been put into machines produced in Germany and sold in Europe, so because of the bias of the database towards North America, this board type might be underrepresented.)

year board num_seem max_seen total
1982 326298 4 64,955
1982 326298/A 21 325,512 341,012
1982 326298/B 6 97,432
1982 326298/C 6 97,432
1982 KU-14194HB 16 259,818
1983 250407/A 7 208,282 238,036
1983 250407/B 50 1,152,644 1,175,696
1983 250407/C 18 1,218,502 1,286,196
1984 250425/- 48 1,273,699 1,300,233
1984 250425/A 9 500,165 555,738
1984 250425/B 4 536,345 670,430
1986 250466 18 438,001 462,333
1987 250469/3 14 444,384 476,125
1988 250469/4 29 1,124,586 1,163,364
1989 250469/A 16 1,994,012 2,118,637
1990 250469/B 17 2,242,493 2,374,403

So according to this estimate, about 12.5 million Commodore 64 computers were produced, which matches the number above.

(By the way, about 6 million of these had the new smaller board “250469/X” with the HMOS chipset, and almost all machines since 1989, which is about 4.5 million, were sold in Europe.)

Commodore Plus/4, C116, C16 (TED) Technical Documents

The Commodore Plus/4, the C16 and the C116 from 1984 were members of the 6502-based “TED” series, named after the 7360 TED (“Text Editing Device”) video controller. The TED systems were basically the low-cost cousins of the C64: The overall system architecture and the video chip are very similar to the C64′s, but they lack certain features like hardware sprites. On the other hand, there are some added features like extra colors and more control over the internal timing of the video chip.

In the Commodore archive at, there is a collection of GIF images that are scans of the some very interesting technical documents on the TED series, originally provided by Tibor Biczo and published by William Levak and Marko Mäkelä. I sorted the pages and converted them into searchable PDFs that are much nicer to look at:

“TED System Hardware Manual”

(PDF, 48 pages, 7.6 MB)

“TED 7360R0 Preliminary Data Sheet” (Apr 1983)

(PDF, 23 pages, 5.8 MB)

“TED Extra Pages”

(PDF, 5 pages, 1.4 MB)

The “Extra Pages” contain a map of the circuit board, a Plus/4 memory map in German, a TED register map, and a German version of section 4.5.2 of the TED System Hardware Manual.

“Service Manual Model Plus 4 Computer.pdf” (Oct 1984, PN-314001-04)

(PDF, 25 pages, 4.9 MB)

The service manual is also taken from

Final Cartridge III Undocumented Functions

The “Final Cartridge III” has been among the most popular Commodore 64 extensions, providing a floppy speeder, BASIC extensions, a machine language monior, a freezer and even a (rarely used) graphical desktop. The major advantage compared to other C64 cartridges is the consistent way in which the Final Cartridge III extends the C64 experience.

As it turns out, there are several undocumented instructions implemented in the Final Cartridge III.

Filtered Directory


The DOS”$” command passes all characters following the “$” to the disk drive, allowing the user to specify filters, like this:


This feature is not available for the “@” command in the monitor.

Fast Format


The 26 second fast format known from the “DESKTOP” GUI is also available from the command line. Note that this also works with the “@” command in the monitor. If the ID is omitted, this only overwrites BAM and directory, just like the “N” command.

Disk Rename


This commands renames the disk without erasing it. The ID can be up to 5 characters, so the default “2A” can be overwritten.

FC III ROM Banking


B, followed by a digit between 0 to 3, in the monitor enables the view of the ROMs of the Final Cartridge III. The specified ROM bank will be visible between $8000 and $BFFF. B without a parameter switches the ROM back off.

The following commands in the machine language monitor can be used to dump the complete ROM of an FC3 to disk:

B 0
T 8000 BFFF 8000
S "B0",08,8000,C000
B 1
T 8000 BFFF 8000
S "B1",08,8000,C000
B 2
T 8000 BFFF 8000
S "B2",08,8000,C000
B 3
T 8000 BFFF 8000
S "B3",08,8000,C000

Standalone Commodore BASIC on the iPhone/iPad

You might remember the hassle about the Commodore 64 emulator in the iPhone App Store about a year ago: First it was approved, but then pulled again, because it allowed access to the C64′s BASIC – general-purpose interpreters were not allowed. After Apple relaxed this restriction, BASIC was added again.

So now it fills me with joy that Ahmad Hawwash managed to get standalone Commodore BASIC into the App Store! His “Hand BASIC – CBM Flavor” is free of charge, runs on iPhone/iPod touch and iPad, and is based on’s Open Source cbmbasic project, a recompiled version of the original Commodore 64 binary, so the BASIC interpreter is itself not interpreted, but runs natively and at full speed, which is in the order of 500 times faster than on a C64.

“Hand BASIC” has LOAD/SAVE support and comes with several demo programs – just type LOAD"$",8 and LIST to see them and LOAD"NAME",8 and RUN to run them. Type the (nonstandard/added) HIDE keyword to hide the keyboard.

What would be very interesting now:

  • Some cool BASIC programs that run in here – games, maybe?
  • A compiler backend that produces Commodore BASIC code, so I can run any code on the iPhone through this – with a 38911 byte RAM limitation, of course. (Actually, a program is free to set the TOPMEM pointer higher, allowing up to 62 KB of RAM for code and variables.)

Any other ideas? :-)

Internals of BRK/IRQ/NMI/RESET on a MOS 6502

After 35 years of measuring the behaviour of the MOS 6502 CPU to better understand what is going on, the Visual6502 simulator finally allows us insight into the chip, so we can understand what the CPU does internally. One interesting thing here is the question how the 6502 handles BRK, IRQ, NMI and RESET.

The Specification

Let’s revisit the documented part first. The 6502 knows three vectors at the top of its address space:

Signal Vector
  • On a RESET, the CPU loads the vector from $FFFC/$FFFD into the program counter and continues fetching instructions from there.
  • On an NMI, the CPU pushes the low byte and the high byte of the program counter as well as the processor status onto the stack, disables interrupts and loads the vector from $FFFA/$FFFB into the program counter and continues fetching instructions from there.
  • On an IRQ, the CPU does the same as in the NMI case, but uses the vector at $FFFE/$FFFF.
  • On a BRK instruction, the CPU does the same as in the IRQ case, but sets bit #4 (B flag) in the copy of the status register that is saved on the stack.

The four operations are very similar, they only differ in the location of the vector, whether they actually push data onto the stack, and whether they set the B flag.

Signal Vector Push PC and P Set B Flag
NMI $FFFA/$FFFB yes no
IRQ $FFFE/$FFFF yes no
BRK $FFFE/$FFFF yes yes


Ignoring opcode fetches, the PLA ROM defines the following cycles of the BRK instruction (6502 Programming Manual, page 131):

  • store PC(hi)
  • store PC(lo)
  • store P
  • fetch PC(lo) from $FFFE
  • fetch PC(hi) from $FFFF


An IRQ does basically the same thing as a BRK, but it clears the B flag in the pushed status byte. The CPU goes through the same sequence of cycles as in the BRK case, which is done like this:

If there is an IRQ pending and the current instruction has just finished, the interrupt logic in the 6502 forces the instruction register (“IR”) to “0″, so instead of executing the next instruction, the PLA will decode the instruction with the opcode 0×00 – which is BRK! Of course it has to kick in a few cycles later again to make sure a B value of 0 is pushed, but otherwise, it’s just the BRK instruction executing.


Not surprisingly, NMI is done the same way: “0″ is injected into the instruction stream, but this time, some extra logic makes sure that the addresses $FFFA/$FFFB are put onto the address bus when fetching the vector.


RESET also runs through the same sequence, but it is the most different of the four cases, since it does not write the current PC and status onto the stack – but this was hacked trivially: The bus cycles exist, but the read/write line is not set to “write”, but “read” instead. The following trace was created with the transistor data from the Visual 6502 project and shows the first nine cycles after letting go of RESET:

#0 AB:00FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $00FF = $00

Cycle 0: When a 6502 is turned on, the stack pointer is initialized with zero. The BRK/IRQ/NMI/RESET sequence pulls the instruction register (IR) to 0.

#1 AB:00FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $00FF = $00
#2 AB:00FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $00FF = $00
#3 AB:0100 D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $0100 = $00

Cycle 3: The first stack access happens at address $0100 – a push first stores the value at $0100 + SP, then decrements SP. In the BRK/IRQ/NMI case, this would have stored the high-byte of the PC. But for RESET, it is a read cycle, not a write cycle, and the result is discarded.

#4 AB:01FF D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $01FF = $00

Cycle 4: SP is now 0xFF (even if the internal state does not reflect that), so the second stack access (which would have been the low-byte of PC) targets 0x01FF. Again, the result is discarded, and SP decremented.

#5 AB:01FE D:00 R/W:1 PC:00FF A:AA X:00 Y:00 SP:00 P:02 IR:00  READ $01FE = $00

Cycle 5: SP is now 0xFE, and the third stack access, (the status register) happens at 0x01FE. SP is decremented again.

#6 AB:FFFC D:E2 R/W:1 PC:00FF A:AA X:00 Y:00 SP:FD P:06 IR:00  READ $FFFC = $E2

Cycle 6: The internal state of the CPU now shows that SP is 0xFD, because it got decremented 3 times for the three fake push operations. The low-byte of the vector is read.

#7 AB:FFFD D:FC R/W:1 PC:00FF A:AA X:00 Y:00 SP:FD P:16 IR:00  READ $FFFD = $FC

Cycle 7: The high-byte of the vector is read.

#8 AB:FCE2 D:A2 R/W:1 PC:FCE2 A:AA X:00 Y:00 SP:FD P:16 IR:00  READ $FCE2 = $A2

Cycle 8: The first actual instruction is fetched.

Since RESET is not timing critical, it doesn’t matter whether a few cycles are wasted by doing the fake stack cycles.

Measuring the ROR Bug in the Early MOS 6502

The MOS 6502 CPU was introduced in September of 1975, and while the documentation described the three shift/rotate instructions ASL, LSR and ROL, the ROR instruction was missing – the documentation said that ROR would be available in chips starting in June 1976. In fact, the reason for this omission was that the instruction, while being present, didn’t behave correctly. Only few 6502s with the defect are in existence, and nobody seemed to have checked what was actually going on in these chips.

Simon C got my KIM-1 working again, which has a 6502 from week 51 of 1975. There are 512 possible inputs to ROR (8 bit A plus 1 bit C; assuming it doesn’t have dependencies on other registers), and roughly two bytes of output: the 8 bit result and the processor status (flags) register. We ran the following programs on the KIM-1 – note that we had to split the task into several programs, because the KIM-1 doesn’t have enough RAM to hold all results.

Program 1a: for A=00..FF, with C=0, execute “ROR A”, save result into array

.C:0200   A2 00      LDX #$00
.C:0202   8A         TXA
.C:0203   18         CLC
.C:0204   6A         ROR A
.C:0205   9D 00 03   STA $0300,X
.C:0208   E8         INX
.C:0209   D0 F7      BNE $0202
.C:020B   00         BRK

Program 1b: for A=00..FF, with C=1, execute “ROR A”, save result into array

.C:0200   A2 00      LDX #$00
.C:0202   8A         TXA
.C:0203   38         SEC
.C:0204   6A         ROR A
.C:0205   9D 00 03   STA $0300,X
.C:0208   E8         INX
.C:0209   D0 F7      BNE $0202
.C:020B   00         BRK

Program 2a: for A=00..FF, with C=0, execute “ROR A”, save flags into array

.C:0200   A2 00      LDX #$00
.C:0202   8A         TXA
.C:0203   38         SEC
.C:0204   6A         ROR A
.C:0205   08         PHP
.C:0206   68         PLA
.C:0207   9D 00 03   STA $0300,X
.C:020a   E8         INX
.C:020b   D0 F5      BNE $0202
.C:020d   00         BRK

Program 2b: for A=00..FF, with C=1, execute “ROR A”, save flags into array

.C:0200   A2 00      LDX #$00
.C:0202   8A         TXA
.C:0203   38         SEC
.C:0204   6A         ROR A
.C:0205   08         PHP
.C:0206   68         PLA
.C:0207   9D 00 03   STA $0300,X
.C:020a   E8         INX
.C:020b   D0 F5      BNE $0202
.C:020d   00         BRK

These are the results:

Program 1a and 1b (result with C=0 and C=1) produced the same output!

00 -> 00
01 -> 02
02 -> 04
03 -> 06
04 -> 08
05 -> 0A
FF -> FE

This is a shift left (!!!) of the input value; and it is independent of the carry flag – just like ASL.

Program 2a (status when C=0) produced the following output:

00:     7E (Z=1, N=0, C=0)
01..3F: 7C (Z=0, N=0, C=0)
40..7F: FC (Z=0, N=1, C=0)
80:     7E (Z=1, N=0, C=0)
81..BF: 7C (Z=0, N=0, C=0)
C0..FF: FC (Z=0, N=1, C=0)

Program 2b (status when C=1) produced the following output:

00:     7F (Z=1, N=0, C=1)
01..3F: 7D (Z=0, N=0, C=1)
40..7F: FD (Z=0, N=1, C=1)
80:     7F (Z=1, N=0, C=1)
81..BF: 7D (Z=0, N=0, C=1)
C0..FF: FD (Z=0, N=1, C=1)

These are the correct flags corresponding to the incorrect results in A. The carry flag is the same as the input carry flag, i.e. it is unmodified.

Our preliminary summary is this:

  ROR          Broken ROR on pre-June '76 CPU (Memory or Accumulator)
  Operation:       |7|6|5|4|3|2|1|0| <- 0
                   +-+-+-+-+-+-+-+-+                    N Z C I D V
                                                        / / _ _ _ _

So ROR on early 6502s does three things wrong:

  1. It shifts left, instead of right (behaves like ASL)
  2. It shifts a zero in, instead of C (behaves like ASL)
  3. It doesn't update C (as if it wasn't a rotate instruction)

All three problems are flags that are sent to the ALU: the shift direction, the input bit, and the carry writeback.

Unresolved questions:

  • We only tested ROR A; other addressing modes of ROR might behave differently. Other addressing modes might even be working - but I doubt that, since MOS would certainly have documented the working ones then.
  • ROR might have more dependencies than A and C.
  • What is it in the chip that causes the bug? I'm sure the fine guys at will be able to figure this one out soon. It is unlikely to be a bug in the PLA ROM, because a bug there would not affect different addressing modes of the same instruction with very different timings. It is more likely that it is in the "Random Control Logic" part.

Measuring the Entropy of the MOS 6502 CPU

Everything can be expressed in bits. It takes 4 kilobits to decompress ZIP data, 25 kilobits to kill a human, 43 megabits for a working Mac OS X kernel, and 10^120 bits to describe our universe. What is the entropy/complexity of the 6502 CPU, you might wonder?

You have probably already seen Visual 6502, a simulator of the 6502 that operates at the transistor level and visualizes what’s going on inside the CPU. The data the program operates on was derived by converting a high-resolution photograph of the 6502 die into polygons, and converting these polygons into the node and transistor configuration.

If you ignore the polygon data, which is only needed for visualization, the data to fully describe the 6502 consists of a set of nodes and a set of transistors.

There are 1725 nodes in the 6502, and 3510 transistors. Every transistor has one node as an input that turns the transistor on or off. If the transistor is on, it connects two other nodes, and if it is off, it separates them. Every node can either be neutral, or be a pullup – the latter means that the node will behave as if it was connected to VCC whenever the path through this node is otherwise floating, i.e. not connected to VCC or GND (or an external pulldown).

So the raw information that describes the 6502 is a set of nodes, for which there is one bit of information, whether it is a pullup node or a neutral node, as well as a set of transistors, which is a three-tuple consisting of gate, c1 and c2.

The file segdefs.js in visual6502 contains (next to the polygon information) the node state: “+” means pullup, “-” (or a missing node description) means neutral. The file transdefs.js contains the transistor tuples.

Stripped from irrelevant data, the 6502 description would look like this: The node states:

BOOL nodes[] = {
    /*  0 */ 1,
    /*  1 */ 0,
    /*  2 */ 0,
    /*  3 */ 1,
    /*  4 */ 1,
    /*  5 */ 1,
    /*  6 */ 1,
    /*  7 */ 0,
    /*  8 */ 1,
    /*  9 */ 0,
    /* 10 */ 1,

…and the transistors:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {357, 558, 217},
    {1608, 657, 349},
    {412, 558, 1146},
    {558, 558, 943},
    {826, 230, 657},

A quick estimate would be 1 bit per node (pullup or neutral), and 3 times 11 bits (3 node numbers from 1 to 1725) per transistor, resulting in about 117 kilobits; but there is still a lot of redundancy in this representation.

The transistor tuples can be represented in a much more compact way. First, there is no ordering for the transistors, so if we sort them by gate, we can encode just the difference between the current gate and the previous gate:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {1, 890, 558},
    {4, 558, 11},
    {5, 558, 146},
    {6, 558, 282},
    {6, 874, 657},
    {7, 657, 1591},
    {7, 657, 1591},
    {7, 657, 1591},
    {8, 150, 558},

In practice, the difference between two gate nodes in the transistor list is a number between 0 and 4, so we could use 3 bits (instead of 11) for the gate. But in fact, it is enough to store a single bit: The nodes have no order either, so we can renumber them so that all nodes that are connected to a gate of a transistor are numbered from 0, and all nodes that are not connected to gates will be numbered above the ones with gates:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {0, 890, 558},
    {1, 558, 11},
    {2, 558, 146},
    {3, 558, 282},
    {3, 874, 657},
    {4, 657, 1591},
    {4, 657, 1591},
    {4, 657, 1591},
    {5, 150, 558},

This way, there are no holes in the list of gates – a new transistor has either the next node number as its gate, or the same one again (i.e. a node is gate of several transistors). It is enough to store a single bit for the gate.

The example above already shows that the nodes 657 and 558 show up a lot: These are VCC and GND, respectively. In fact, 234 transistors connect to VCC, and 2493 connect to GND. We could Huffman-encode the c1 and c2 nodes, but in practice, all other nodes except these two are relatively uniformly distributed, so let’s just special case them.

There is also no ordering between the two nodes the transistor connects, and no transistor will ever connect VCC/GND to VCC/GND, so we only need to do the special VCC/GND encoding for c1, and leave c2 as it is. Let’s use a single “1″ bit to represent GND, “01″ as VCC, and all other 11 bit node numbers will be prefixed with “00″, making them occupy 13 bits.

This way, the 3510 values of c1 can be described in 2493 * 1 (the GNDs) + 234 * 2 (the VCCs) + 783 * (2 + 11) (the others) bits = 13140 bits. The c2 values occupy 3510 * 11 bits = 38610 bits. If you add the bit per transistor for the gate (i.e. 3510 bits), we’re at 55260 bits for the transistor description.

Using arithmetic coding, we can describe a node number in about 10.75 bits instead of 11. This way, we save (783 + 3510) * 0.25 bits = 1073 bits. The transistor description thus fits into 54187 bits.

As stated previously, we also need one bit per node for the pullup information. About half the nodes are pullup, and we don’t have the freedom of sorting and renaming the nodes any more (we’ve done that already for the transistor gates), so we’ll have to go with the 1725 raw bits.

So the final number is 54187 bits + 1725 bits = 55912 bits. That’s 6989 bytes, or roughly 56 kilobits, and about twice as big as H1N1.

With this number, we can estimate the complexity of a Commodore 64 to be about 200 kilobits (CPU, VIC, SID, 2xCIA, PLA) – plus another 160 kilobits of ROM.

But wait: Did you see the duplicate transistors in the sorted list? Yes, there are some duplicates, in fact, there are only 3288 unique transistors. And when looking at it on a higher level, the 130 entry instruction decode ROM has duplicates too: Only 115 are unique, leading to redundant paths in the node/transistor graph. The reason for this redundancy is routing: There are only two dimensions in a computer chip (plus layers), so you cannot just connect anything to anything.

The homework for today, dear reader, is to develop an algorithm that minimizes the description of the 6502 by removing duplicate transistors as well as equivalent paths.