Measuring the Entropy of the MOS 6502 CPU

Everything can be expressed in bits. It takes 4 kilobits to decompress ZIP data, 25 kilobits to kill a human, 43 megabits for a working Mac OS X kernel, and 10^120 bits to describe our universe. What, you might wonder, is the entropy/complexity of the 6502 CPU?

You have probably already seen Visual 6502, a simulator of the 6502 that operates at the transistor level and visualizes what’s going on inside the CPU. The data the program operates on was derived by converting a high-resolution photograph of the 6502 die into polygons, and converting these polygons into the node and transistor configuration.

If you ignore the polygon data, which is only needed for visualization, the data to fully describe the 6502 consists of a set of nodes and a set of transistors.

There are 1725 nodes in the 6502, and 3510 transistors. Every transistor has one node as an input that turns the transistor on or off. If the transistor is on, it connects two other nodes, and if it is off, it separates them. Every node can either be neutral, or be a pullup – the latter means that the node will behave as if it were connected to VCC whenever the path through this node is otherwise floating, i.e. not connected to VCC or GND (or an external pulldown).

So the raw information that describes the 6502 consists of a set of nodes, each carrying one bit of information (whether it is a pullup node or a neutral node), and a set of transistors, each a three-tuple consisting of gate, c1 and c2.

The file segdefs.js in visual6502 contains (next to the polygon information) the node state: “+” means pullup, “-” (or a missing node description) means neutral. The file transdefs.js contains the transistor tuples.

Stripped of irrelevant data, the 6502 description would look like this: The node states:

BOOL nodes[] = {
    /*  0 */ 1,
    /*  1 */ 0,
    /*  2 */ 0,
    /*  3 */ 1,
    /*  4 */ 1,
    /*  5 */ 1,
    /*  6 */ 1,
    /*  7 */ 0,
    /*  8 */ 1,
    /*  9 */ 0,
    /* 10 */ 1,
    /* … */
};

…and the transistors:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {357, 558, 217},
    {1608, 657, 349},
    {412, 558, 1146},
    {558, 558, 943},
    {826, 230, 657},
    /* … */
};

A quick estimate would be 1 bit per node (pullup or neutral) and 3 × 11 bits per transistor (three node numbers in the range 0 to 1724, each fitting in 11 bits), resulting in about 117 kilobits; but there is still a lot of redundancy in this representation.

The transistor tuples can be represented in a much more compact way. First, there is no ordering for the transistors, so if we sort them by gate, we can encode just the difference between the current gate and the previous gate:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {1, 890, 558},
    {4, 558, 11},
    {5, 558, 146},
    {6, 558, 282},
    {6, 874, 657},
    {7, 657, 1591},
    {7, 657, 1591},
    {7, 657, 1591},
    {8, 150, 558},
    /* … */
};

In practice, the difference between two consecutive gate nodes in the sorted transistor list is a number between 0 and 4, so we could use 3 bits (instead of 11) for the gate. But in fact, a single bit is enough: The nodes have no inherent order either, so we can renumber them so that all nodes that are connected to the gate of some transistor are numbered from 0 upwards, and all nodes that are never used as a gate are numbered above them:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {0, 890, 558},
    {1, 558, 11},
    {2, 558, 146},
    {3, 558, 282},
    {3, 874, 657},
    {4, 657, 1591},
    {4, 657, 1591},
    {4, 657, 1591},
    {5, 150, 558},
    /* … */
};

This way, there are no holes in the list of gates – a new transistor has either the next node number as its gate, or the same one again (i.e. a node is the gate of several transistors). So a single bit per transistor is enough for the gate.

The example above already shows that the nodes 657 and 558 show up a lot: These are VCC and GND, respectively. In fact, 234 transistors connect to VCC, and 2493 connect to GND. We could Huffman-encode the c1 and c2 nodes, but in practice, all nodes other than these two are relatively uniformly distributed, so let’s just special-case them.

There is also no ordering between the two nodes the transistor connects, and no transistor will ever connect VCC/GND to VCC/GND, so we only need to do the special VCC/GND encoding for c1, and leave c2 as it is. Let’s use a single “1” bit for GND, “01” for VCC, and prefix all other 11-bit node numbers with “00”, making them occupy 13 bits.

This way, the 3510 values of c1 can be described in 2493 * 1 (the GNDs) + 234 * 2 (the VCCs) + 783 * (2 + 11) (the others) bits = 13140 bits. The c2 values occupy 3510 * 11 bits = 38610 bits. If you add the bit per transistor for the gate (i.e. 3510 bits), we’re at 55260 bits for the transistor description.

Using arithmetic coding, we can describe a node number in about 10.75 bits instead of 11. This way, we save (783 + 3510) * 0.25 bits = 1073 bits. The transistor description thus fits into 54187 bits.

As stated previously, we also need one bit per node for the pullup information. About half the nodes are pullups, and we don’t have the freedom of sorting and renumbering the nodes any more (we’ve already used that for the transistor gates), so we’ll have to go with the 1725 raw bits.

So the final number is 54187 bits + 1725 bits = 55912 bits. That’s 6989 bytes, or roughly 56 kilobits, and about twice as big as H1N1.

With this number, we can estimate the complexity of a Commodore 64 to be about 200 kilobits (CPU, VIC, SID, 2xCIA, PLA) – plus another 160 kilobits of ROM.

But wait: Did you see the duplicate transistors in the sorted list? Yes, there are some duplicates; in fact, only 3288 of the 3510 transistors are unique. And looking at it on a higher level, the 130-entry instruction decode ROM has duplicates too: Only 115 entries are unique, leading to redundant paths in the node/transistor graph. The reason for this redundancy is routing: There are only two dimensions in a computer chip (plus layers), so you cannot just connect anything to anything.

The homework for today, dear reader, is to develop an algorithm that minimizes the description of the 6502 by removing duplicate transistors as well as equivalent paths.

12 thoughts on “Measuring the Entropy of the MOS 6502 CPU”

  1. The link “43 megabits for a working Mac OS X kernel” does not work (file:///mach_kernel) – or is this intentional?

    – strik

  2. It’s the kernel on his machine, the link only works if you have Mac OS X, in which case you have a mach_kernel on your drive and can check its size :)

  3. Ok: Thus, in this case, the phrase “or is this intentional” applies.

    Thank you for clarifying.

  4. No need for specific algorithms, just run the dataset through a reasonably good compressor (say, LZMA), and you should have a reasonable estimate of the actual complexity.

  5. @Gintas: No general-purpose compression algorithm knows that the order of the nodes and transistors is not important, so they can’t reduce the storage requirement of the gate to a bit.

  6. Hmpf. That article on H1N1 assumes that the complexity of proteins is 6 bits per amino acid, but there are only 20 amino acids, not 64.

    If you assume all 20 amino acids are evenly distributed, you get about 4.3 bits per amino acid. A quick Google search gives the observed amino acid frequencies in vertebrates, which we can use as an approximation for the frequencies in viral proteins, for an entropy of about 4.2 bits per amino acid. With these numbers, we get a total complexity of 18.3 or 17.8 kilobits, respectively, for the H1N1 virus.

  7. You forgot that there are 20 (but, in fact, 21) amino acids in proteins, but they are encoded by triplets of 4 different bases, and the triplets are redundant for some amino acids (and some triplets encode no amino acid at all – e.g., UAA, UAG, and UGA are STOP codons). So to encode the virus, you really need more than 4.2 bits. Things are not so mathematically simple in nature.

  8. Um? The redundant base encoding of amino acids (which encodes 20 of the 21 amino acids found in eukaryotes, BTW) is exactly the reason the entropy is around 4.x bits, not 6. More redundancy = less entropy. So, your point being??

  9. I’m not sure, but the structure of the 6502 might not be so simple. As I was analyzing the code, I noticed that the transistors exhibit something called “parasitic capacitance” (at least my friend, who knows a lot about electronics, said so), i.e. after switching a transistor off, the connected nodes that were previously high and are now floating keep a weak charge that can leave transistors whose gates are connected to them open (for a while, I presume). I don’t know whether this is used intentionally in the 6502, but it might be, and it introduces some more information into the chip.

  10. Although nodes[] is re-ordered so that all nodes that are mentioned as transistor[].gate come first, could a second sort key be whether a node is floating or pulled up? That would make nodes[] a run of zero or more 0s, then 1s, then 0s again (this time for nodes that aren’t ever a gate), then 1s again.

  11. i am trying to find somebody (anybody) who is an expert genius programmer with the 6502/65xx CPU family. i would like to learn to program and work with this CPU so i can build things. if anybody out there knows this CPU very, very well and how to make LCD-based display and button-interactive or keypad input devices, please get in touch with me. i have no use for phosphor CRT monitors at all, which output devices only complicate things and would be too difficult to learn. i am totally happy with just using an LCD display so i can see a CAPACITANCE or INDUCTANCE reading, or see what the reading would be using an ADC/DAC chip, etc. thank you, Peter. ////