Game Development Archeology: Zelda on Game Boy comes with source

Friday, June 29th, 2007

Imagine you’re writing a Game Boy game, and the resulting ROM with all the code and data is just a little over one megabyte in size. No big deal: pad the game to two megabytes and use a 2 MB ROM in the cartridge. Tell the linker to allocate 2 MB of RAM, put the actual data at the beginning, and write the whole 2 MB “.gb” image to disk, which then goes to the ROM chip factory.

Now imagine you’re doing this in MS-DOS. Your linker, probably written in C, calls malloc() from the C compiler’s runtime library. You can probably see where this is going.

While modern operating systems zero memory before handing it to a process, so that malloc() can never expose another process’s data, this was uncommon in the single-user MS-DOS days. If you allocate 2 MB of RAM (the linker must have used a DOS extender or XMS), you get memory with arbitrary data in it: leftovers from whatever occupied that memory before. (seppel tells me that this can also be caused by seek()ing past EOF in MS-DOS, in which case the previous data on the hard disk will be in the image.)
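
The failure mode is easy to sketch in C. The following is not the actual linker's code, which is unknown; build_image and its parameters are hypothetical names chosen for illustration. It shows the one step that, judging by the result, was probably missing: explicitly overwriting the padding after the real data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a ROM image of image_size bytes whose first
   data_len bytes are the real code and data. Returns NULL if out of memory.
   The caller frees the result. */
unsigned char *build_image(const unsigned char *data, size_t data_len,
                           size_t image_size, unsigned char fill)
{
    unsigned char *image;

    assert(data_len <= image_size);

    /* malloc() makes no promise about the contents of the returned block. */
    image = malloc(image_size);
    if (image == NULL)
        return NULL;

    memcpy(image, data, data_len);

    /* Overwrite the padding. Without this call, the bytes from data_len to
       image_size - 1 keep whatever happened to be in that memory before. */
    memset(image + data_len, fill, image_size - data_len);

    return image;
}
```

Using calloc() instead of malloc() would have the same effect with a fill value of zero.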

This is what happened with the 1998 Game Boy/Game Boy Color game “The Legend of Zelda – Link’s Awakening DX” (MD5: ee0424cf1523f67c5007566aed70696d). If you look at the image starting at 0x106000, you will find all kinds of interesting data, which will tell you a lot about the game’s development. Let’s call this “game development archeology”…

The ROM image includes big chunks of Borland’s Turbo C IDE (Turbo Vision interface) for DOS, as well as traces of the “QBasic” MS-DOS Editor. It is unclear which editor they used for what, but they might have used Turbo C to write DOS code to support building, as there is a complete copy of this C program in the ROM:

#include <stdio.h>
#include

int main(void) {
	FILE *fp,*f1;
	int a=0xcd;
	int b=0xc6;
	int c=0x29;
	int ch;
	unsigned long i=0;

	if((fp=fopen("zeldag.gb","rb"))==NULL) {
		printf("can't open the file");
		return 0;
	}

	if((f1=fopen("ttmp.asm","wt"))==NULL) {
		printf("can't new file ttmp.asm");
		return 0;
	}

	while((ch=fgetc(fp))!=EOF) {
		if(a==ch) {
			i++;
			ch=fgetc(fp);
			if(b==ch) {
				i++;
				ch=fgetc(fp);
				if(c==ch)
					fprintf(f1,"%lXH, " , i);
			}
		}
		i++;
	}

	fclose(fp);
	fclose(f1);
}

This writes the file offsets at which the bytes 0xcd, 0xc6, 0x29 were found in the ROM image into ttmp.asm. These bytes, interpreted as Z80 machine code, mean “CALL 0x29C6”. In the final ROM image, this sequence appears once, at 0x442B. If you have any idea why they look for this, please post it in the comments.
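
Reading the listing closely, two details stand out: the counter has already been advanced twice when a match is printed, so the reported offset is that of the last byte (0x29) rather than the first, and the byte that follows a 0xcd is consumed without being checked as the start of a new match, so a sequence like cd cd c6 29 would be missed. A minimal sketch of a search without these quirks, working on a buffer instead of a file, could look like this; find_pattern and its parameters are made-up names, not anything found in the ROM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: store up to max_hits start offsets of pat in buf
   and return the total number of matches found. */
size_t find_pattern(const unsigned char *buf, size_t len,
                    const unsigned char *pat, size_t pat_len,
                    size_t *hits, size_t max_hits)
{
    size_t i, count = 0;

    assert(pat_len > 0);
    if (len < pat_len)
        return 0;

    /* Test every start position, so no candidate byte is skipped. */
    for (i = 0; i + pat_len <= len; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0) {
            if (count < max_hits)
                hits[count] = i;   /* offset of the first byte of the match */
            count++;
        }
    }
    return count;
}
```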

This is the list of files in their project directory at D:\GAMEBOY:
BANK37.ASM
CLEARKU.ASM
DAMA1.ASM
DAMA2.ASM
END.CPP
ENDEND.ASM
ENDEND.LST
ENDEND1.ASM
FEND1.ASM
FEND2.ASM
FEND3.ASM
FIND.ASM
FIND.CPP
IFCHAENL.ASM
INTWCHA.ASM
TEST.ASM
TTMP.ASM
TTMP.TXT
ZIPUTP.CPP

These filenames also appear in the ROM:
ADDPLAG.ASM
ADDPLAGF.ASM
CH64TBL.ASM
FEND.ASM
G.ASM
H.ASM
INSERTKU.ASM
INTWIN.ASM
KKKKKK.ASM
L.ASM
NOPLAY1.ASM
TAB.ASM
Y.ASM
ZXHPDM.ASM

And here comes the interesting part: There is actually some assembly source in the ROM; here is a small snippet:

JoyPort_1:
                 AND $02 ;LEFT
                 JR  NZ, JoyPort_2
                 CALL LEFTScroll
                 RET
JoyPort_2:
                 AND $04 ;UP
                 JR  NZ, JoyPort_3
                 CALL UPScroll
                 RET
JoyPort_3:
                 AND $08 ;DOWN
                 JR  NZ, JoyPort_4
                 CALL DOWNScroll

Well-documented, it seems. But there is also some assembly code that looks like this:

L_B000_28F7:
                LD A,$7F
                LD BC,$0800
                LD D,A
                LD HL,$9800
L_B000_2900:
                LD A,D
                LD (HLI),A
                DEC BC
                LD  A,B
                OR  A,C
                JR  NZ,L_B000_2900
                RET
L_B000_2914:
                LD  A,(HLI)
                LD  (DE),A
                INC DE
                DEC BC
                LD  A,B
                OR  A,C
                JR  NZ,L_B000_2914
                RET
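
For readers who do not speak Game Boy assembly: the first routine stores $7F into $0800 consecutive bytes starting at $9800 (the area of video RAM that holds the background tile maps), and the second copies BC bytes from HL to DE. The LD A,B / OR A,C pair is needed because a 16-bit DEC does not set the zero flag. A rough C rendering, with function names invented for this sketch, might be:

```c
#include <assert.h>
#include <stdint.h>

/* Rough equivalent of the loop at L_B000_2900: store value at dst, count
   times. Like the assembly, it tests the counter only after the store,
   so count must not be 0. */
void fill_bytes(uint8_t *dst, uint16_t count, uint8_t value)
{
    assert(count != 0);
    do {
        *dst++ = value;       /* LD A,D / LD (HLI),A */
        count--;              /* DEC BC */
    } while (count != 0);     /* LD A,B / OR A,C / JR NZ */
}

/* Rough equivalent of the loop at L_B000_2914: copy count bytes. */
void copy_bytes(uint8_t *dst, const uint8_t *src, uint16_t count)
{
    assert(count != 0);
    do {
        *dst++ = *src++;      /* LD A,(HLI) / LD (DE),A / INC DE */
        count--;              /* DEC BC */
    } while (count != 0);     /* LD A,B / OR A,C / JR NZ */
}
```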

The label names suggest that this code has been disassembled from existing Z80 machine code: they look like addresses, and the four instructions between L_B000_28F7 and L_B000_2900 do assemble to exactly nine bytes. Link’s Awakening DX is a color remake of an older Game Boy game, so it might very well be that they lost the original source, disassembled the old code and used it again for the remake. This would be easy to check by disassembling the original version and looking for this code.

If you want to do game development archeology yourself, you might want to look at titles like “X-Men – Wolverine’s Rage” (MD5: b1729716baaea01d4baa795db31800b0), which contains Windows 9x registry keys and INF files, “Mortal Kombat 4” (MD5: 7311f937a542baadf113e9115158cde3), in which you can find some small source fragments, “Gift” (MD5: e6a51088c8fea7980649064bd3a9f9ff), which will tell you that the developers had some Game Boy emulators installed on their system, or the “BIT-MANAGERS” games “Spirou” (MD5: 5aa012cf540a5267d6adea6659764441, Turbo C, MAP file, source) and “TinTin in Tibet” (Game Boy Color version, MD5: 8150a3978211939d367f48ffcd49f979), which, amongst other things, contains references to Nintendo’s Game Boy Advance (!) SDK (“C:\Cygnus\thumbelf-000512\H-i686-cygwin32\lib\gcc-lib\thumb-elf\2.9-arm-000512”, “/tantor/build/nintendo/arm-000512/i686-cygwin32/src/newlib/libc/stdio/stdio.c”).
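
A simple way to start digging is to scan the padding area of a ROM image for runs of printable text, much like the Unix strings tool does. The sketch below assumes the image is already loaded into memory; next_text_run and its parameters are hypothetical names for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: starting at offset from, find the next run of at
   least min_len printable ASCII bytes. Returns 1 and sets *start and
   *run_len on success, 0 if no such run remains. */
int next_text_run(const unsigned char *buf, size_t len, size_t from,
                  size_t min_len, size_t *start, size_t *run_len)
{
    size_t i = from;

    assert(min_len > 0);
    while (i < len) {
        size_t begin = i;

        /* Extend the run while the bytes are printable 7-bit ASCII. */
        while (i < len && buf[i] < 0x80 && isprint(buf[i]))
            i++;

        if (i - begin >= min_len) {
            *start = begin;
            *run_len = i - begin;
            return 1;
        }
        if (i == begin)
            i++;              /* skip a single non-printable byte */
    }
    return 0;
}
```

For the Zelda image, calling this in a loop with from starting at 0x106000 and continuing at start + run_len after each hit would walk through the leftover text.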

If you find any more things like these, please post them (or links to your stories) as comments! Happy hacking!