Category Archives: puzzle

Oops! I don't have my CashCard here.

It seems that a lot of readers of pagetable.com are fans of SCUMM games like Maniac Mansion (and at least one is their creator). Here is a puzzle for you (credits go to Bernhard Bauer):

In Zak McKracken, how can you get into the situation pictured below? Note that you need the CashCard in order to leave San Francisco, and Zak cannot give away his CashCard.

C64:

FM Towns:

Bonus points if you solve it by converting the SCUMM script into a graph and programmatically finding the shortest path to this state.

(Read the solution in the comments.)

Limitations in Maniac Mansion

In (any version of) Maniac Mansion, if one kid is in the hallway with Green Tentacle, any other kid (even Syd and Razor) will refuse to go up the stairs to that hallway: “I’m not going up there… that monster’s got my friend!”

Why is that?

C64:



Nintendo Entertainment System:

MS-DOS Enhanced:

(Read Maniac Mansion creator Ron Gilbert’s authoritative answer in the comments.)

Xbox Serial Number Statistics

Slashdot had a story recently on how in 1942, the allies were able to estimate the number of German taks produced based on the serial numbers of the tanks. In 2010, a German hacker is doing the exact same thing with Xboxes. This article describes the generic approach, shows some results, and provides previously unreleased raw data of 14,000 Xbox serials so you can do your own statistics!

Between October 2003 and January 2005, the Xbox Linux Project asked all visitors to their website to enter their Xbox serial numbers, date and country of manufacture, ROM version, hard disk and DVD drive brand and other properties, and gathered more than 14,000 entries. The original idea was to find a rule to deduce the hard disk and DVD drive types in an Xbox by only looking at the serial number, which was visible through the unopened packaging.

The serial sticker on an Xbox looks like this:

MFG. DATE  2002-03-03
SERIAL NO. 1166356 20903

After looking at several serial numbers, it was already clear that the last two digits (“03″ in my example) are the location of manufacture: 02 is Mexico, 03 is Hungary, 05 is China and 06 is Taiwan. The three digits before (“209″ in my example) are the one-digit year (“2″ for “2002″) and the two-digit calender week (“09″ for around the first week of March).

Now we want to find out how many devices were manufactured. A first approximation is to look at the manufacturing dates of all Xboxes in our database.

This gives us an idea when production was ramped up (in 2001 and 2002 in November, and in 2003 in August, September and October), but the statistics don’t give us absolute numbers, and they are biased towards older devices (newer devices are not entered yet, and visitors of our site tend to be early adopters).

But what about these first seven digits of the serial number? Shouldn’t these be actual “serial” numbers? Let’s look at all devices from August 2003 and sort the first seven digits by manufacturing date:

This does not look like a serial number. But all numbers are > 1,000,000, which implies that the first digit has a special meaning and is not part of the number. Let’s look at distribution of the first digit:

The first digit seems to be the number of the assembly line in the factory! So let’s look at the remaining 6 digits again:

This looks a lot better! But there are several things interleaved in this chart – because the serial numbers are of course counted independently in every factory. If we filter just all numbers form the Chinese factory, we get this:

We can see serial numbers are counted up every week, but we still see all assembly lines interleaved here, and the different lines don’t reset at the same time. Here is line 6 all by itself:

Looks almost perfect, if we assume the wild shots are caused by typos. Here is a manually fixed version of it:

Voilà! Serial numbers that count up monotonically and get reset on every Sunday.

By inspection of the graph, we can estimate that assembly line 6 of the factory in China produced about 275,000 devices per week in week 33 (mid August) of 2003. This works well, because we have so many samples; but for other weeks, we have as few as five. This is the formula for the German Tank Problem:

k is the sample size and m s the highest serial number observed.

The estimate of Xboxes produced by assembly line 6 in China in week 33 of 2003 is therefore 285,269. Applying this to every assembly line of every factory and every week, it should be easy to get great statistics on the productivity of the different lines and factories, as well as a very good estimate of the total number of devices produced. …and this is where you come in!

The Data

You want to do your own statistics? Here is the raw data:

xbox_serials.csv (2.5 MB)

It is a comma-separated-value file with the following columns:

Column Example Description Comment
1 2002-03-03 Manufacture Date YYYY-MM-DD
2 2002-04-30 Date of Purchace YYYY-MM-DD
3 de Country of Purchase two-digit code
4 1166356 20903 Serial Number nnnnnnn nnnnn
5 v1.0 Xbox Version motherboard revision
6 3944 Kernel Version ROM version as shown in “About” dialog
7 4034 Dashboard Version HD software version as shown in “About” dialog
8 Unknown/Other Flash what’s printed on flash ROM chip
9 PAL Video Standard PAL or NTSC
10 Black Case Color Xboxes are black, but there are some special editions
11 Thomson DVD Drive Philips, Samsung, Thomson
12 Seagate 10 GB Hard Disk Seagate or Western Digital
13 Conexant Video Encoder Brand Conexant, Focus, Xcalibur
14 Golden Xbox comments free-form field

Please note that people were able to fill some fields with arbitraty data, so they might not necessarily be in exactly the specified form. There are also lots of typos in the serial numbers and the month and day fields in the data fields have been mixed up sometimes. You probably want to run a script over the data first that sanitizes some of the input, e.g. removes dashes and spaces from serial numbers etc.

Here are some ideas on what you might want to find out:

  1. Is there a better formula to estimate the number of Xboxes produced per week on a certain assembly line?
  2. What day does a week start with? Does the factory produce Xboxes on Sundays? Do they produce just as many? Is it different in the respective countries?
  3. How many Xboxes were produced per assembly line, per week and per factory?
  4. Are all assembly lines in a certain factory just as productive?
  5. Are all factories just as productive (per assembly line)?
  6. Did productivity go up over time? Did it hit a maximum?
  7. How many Xboxes were produced total?
  8. Does an assembly line in a certain factory use all the same flash chips, hard drives and DVD drives in a certain week?
  9. When did an assembly line in a certan factory switch between board revisions?
  10. How long does it take an assembly line to be reconfigured for a different board revision?
  11. When did factories open/close? When did assembly lines get created and torn down in certain factories? Is there a correlation? Did assembly lines get migrated between factories? How long does this take?
  12. How long does it take on average for an Xbox from manufacuring to when it’s bought, per country? Does it change over the years?
  13. Which factories serve which countries? Did it change?
  14. How do ROM version, HD software version, motherboard version and video encoder brand correlate to each other?
  15. Which countries have PAL, which have NTSC?
  16. Where were the non-black Xboxes made?
  17. What percentage of Xboxes has a Philips, a Samsung or a Thomson DVD drive?
  18. What is the distribution of hard drive types?
  19. Some people claim they have a 20 GB hard drive. How credible is this?
  20. When and at which factories were certain DVD and HD types introduced?
  21. Over time, how did the distribution of DVD and HD types change?
  22. What is the distribution of flash chips, how did it change, and how does it correlate to factories?
  23. Is there enough data to make statements about the refurbishment process (search for “refurb” in comments)?
  24. What percentage of people misses a digit when trying to type in 12 digits?
  25. What percentage of people replaced digits of the serial number with an ‘X’ or a ‘*’? What percentage of these chose the right digits to properly anonymize their serial numbers?
  26. Any more interesting observations you can come up with?

Please share your ideas as well as your results (plus source code of your scripts, please)! If you know any statistics teachers looking for a large real-world data set and an interesting set of problems, feel free to refer them to this site! :-)

Measuring the Entropy of the MOS 6502 CPU

Everything can be expressed in bits. It takes 4 kilobits to decompress ZIP data, 25 kilobits to kill a human, 43 megabits for a working Mac OS X kernel, and 10^120 bits to describe our universe. What is the entropy/complexity of the 6502 CPU, you might wonder?

You have probably already seen Visual 6502, a simulator of the 6502 that operates at the transistor level and visualizes what’s going on inside the CPU. The data the program operates on was derived by converting a high-resolution photograph of the 6502 die into polygons, and converting these polygons into the node and transistor configuration.

If you ignore the polygon data, which is only needed for visualization, the data to fully describe the 6502 consists of a set of nodes and a set of transistors.

There are 1725 nodes in the 6502, and 3510 transistors. Every transistor has one node as an input that turns the transistor on or off. If the transistor is on, it connects two other nodes, and if it is off, it separates them. Every node can either be neutral, or be a pullup – the latter means that the node will behave as if it was connected to VCC whenever the path through this node is otherwise floating, i.e. not connected to VCC or GND (or an external pulldown).

So the raw information that describes the 6502 is a set of nodes, for which there is one bit of information, whether it is a pullup node or a neutral node, as well as a set of transistors, which is a three-tuple consisting of gate, c1 and c2.

The file segdefs.js in visual6502 contains (next to the polygon information) the node state: “+” means pullup, “-” (or a missing node description) means neutral. The file transdefs.js contains the transistor tuples.

Stripped from irrelevant data, the 6502 description would look like this: The node states:

BOOL nodes[] = {
    /*  0 */ 1,
    /*  1 */ 0,
    /*  2 */ 0,
    /*  3 */ 1,
    /*  4 */ 1,
    /*  5 */ 1,
    /*  6 */ 1,
    /*  7 */ 0,
    /*  8 */ 1,
    /*  9 */ 0,
    /* 10 */ 1,
    [...]
}

…and the transistors:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {357, 558, 217},
    {1608, 657, 349},
    {412, 558, 1146},
    {558, 558, 943},
    {826, 230, 657},
    [...]
}

A quick estimate would be 1 bit per node (pullup or neutral), and 3 times 11 bits (3 node numbers from 1 to 1725) per transistor, resulting in about 117 kilobits; but there is still a lot of redundancy in this representation.

The transistor tuples can be represented in a much more compact way. First, there is no ordering for the transistors, so if we sort them by gate, we can encode just the difference between the current gate and the previous gate:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {1, 890, 558},
    {4, 558, 11},
    {5, 558, 146},
    {6, 558, 282},
    {6, 874, 657},
    {7, 657, 1591},
    {7, 657, 1591},
    {7, 657, 1591},
    {8, 150, 558},
    [...]
}

In practice, the difference between two gate nodes in the transistor list is a number between 0 and 4, so we could use 3 bits (instead of 11) for the gate. But in fact, it is enough to store a single bit: The nodes have no order either, so we can renumber them so that all nodes that are connected to a gate of a transistor are numbered from 0, and all nodes that are not connected to gates will be numbered above the ones with gates:

struct {
	int gate;
	int c1;
	int c2;
} transistors[] = {
    {0, 890, 558},
    {1, 558, 11},
    {2, 558, 146},
    {3, 558, 282},
    {3, 874, 657},
    {4, 657, 1591},
    {4, 657, 1591},
    {4, 657, 1591},
    {5, 150, 558},
    [...]
}

This way, there are no holes in the list of gates – a new transistor has either the next node number as its gate, or the same one again (i.e. a node is gate of several transistors). It is enough to store a single bit for the gate.

The example above already shows that the nodes 657 and 558 show up a lot: These are VCC and GND, respectively. In fact, 234 transistors connect to VCC, and 2493 connect to GND. We could Huffman-encode the c1 and c2 nodes, but in practice, all other nodes except these two are relatively uniformly distributed, so let’s just special case them.

There is also no ordering between the two nodes the transistor connects, and no transistor will ever connect VCC/GND to VCC/GND, so we only need to do the special VCC/GND encoding for c1, and leave c2 as it is. Let’s use a single “1″ bit to represent GND, “01″ as VCC, and all other 11 bit node numbers will be prefixed with “00″, making them occupy 13 bits.

This way, the 3510 values of c1 can be described in 2493 * 1 (the GNDs) + 234 * 2 (the VCCs) + 783 * (2 + 11) (the others) bits = 13140 bits. The c2 values occupy 3510 * 11 bits = 38610 bits. If you add the bit per transistor for the gate (i.e. 3510 bits), we’re at 55260 bits for the transistor description.

Using arithmetic coding, we can describe a node number in about 10.75 bits instead of 11. This way, we save (783 + 3510) * 0.25 bits = 1073 bits. The transistor description thus fits into 54187 bits.

As stated previously, we also need one bit per node for the pullup information. About half the nodes are pullup, and we don’t have the freedom of sorting and renaming the nodes any more (we’ve done that already for the transistor gates), so we’ll have to go with the 1725 raw bits.

So the final number is 54187 bits + 1725 bits = 55912 bits. That’s 6989 bytes, or roughly 56 kilobits, and about twice as big as H1N1.

With this number, we can estimate the complexity of a Commodore 64 to be about 200 kilobits (CPU, VIC, SID, 2xCIA, PLA) – plus another 160 kilobits of ROM.

But wait: Did you see the duplicate transistors in the sorted list? Yes, there are some duplicates, in fact, there are only 3288 unique transistors. And when looking at it on a higher level, the 130 entry instruction decode ROM has duplicates too: Only 115 are unique, leading to redundant paths in the node/transistor graph. The reason for this redundancy is routing: There are only two dimensions in a computer chip (plus layers), so you cannot just connect anything to anything.

The homework for today, dear reader, is to develop an algorithm that minimizes the description of the 6502 by removing duplicate transistors as well as equivalent paths.

CPUID on all CPUs (HOWNOTTO)

A while ago, an engineer from a respectable company for low-level solutions (no names without necessity!) claimed that a certain company’s new 4-way SMP system had broken CPUs or at least broken firmware that didn’t set up some CPU features correctly: While on the older 2-way system, all CPUs returned the same features (using CPUID), on the 4-way system, two of the CPUs would return bogus data.

I asked for his test code. It ran in kernel mode and looked roughly like this:

    int cpu_features[4];

    for (int i = 0; i < 100000; i++) {
        cpu_features[get_cur_cpu_number()] = cpuid_get_features();
        usleep(100);
    }

    for (int j = 0; j < 4; j++)
        printf("CPU %d features: %xn", j, cpu_features[j]);

Questions to the reader:

  1. What was the original idea, what is the algorithm?
  2. Why did this work on a 2-way system, but not on a 4-way system?
  3. Which two changes would at least make this code correct (albeit still horrible)?
  4. How would you do it correctly?
  5. Would you buy software from this company?

How much change is in a vending machine?

There is only one way to find out – all you need is a giant pile of money and a vending machine that sells soda for $1.25: If you put in a dollar note and press the “return change” button, you will get the dollar note back directly. If you put in two dollar notes (the maximum it takes) at a time, it will give you change for the two dollars.

Our machine had $29 of change. It started out with 25¢ coins, and after a while returned a combination of different coins (25¢ & 10¢, then 10¢ & 5¢), finally switching to all 5¢ coins at the end.

In total, it had

  • 25¢ x 57 = $14.25
  • 10¢ x 114 = $11.40
  • 5¢ x 67= $3.35

After this, the machine refused to take bills, so it was either full with bills or, more likely, out of change.

But actually, it might still contain up to 95¢ of change – today’s homework is to find out how to measure the remaining change in the machine!

Microsoft vs. Standards

Here is a fun game for long car rides: One person names a respected standard implemented by dozens of IT companies, and the other person names Microsoft’s competing technology. Example: MPEG Audio (MP3/AAC) – Windows Media Audio.

(When you have run out of examples, you can try this game with other major players in the IT business.)

Let’s play this game in the comments to this blog entry: Just add as many of these pairs that you can think of – extra points if Microsoft’s technology has a closed specification.

Aggressive Tail Call Optimization

In some i386/x86_64 assembly code my coworker was working on, there was a macro like this:

#define ENDFUNC leave 
                ret

Having forgotten about the exact implementation of the macro, he then wrote a function that ended like this:

        leave
        ENDFUNC

Now this obviously ended the function with

        leave
        leave
        ret

and he got very interesting results. What was happening?