Game Development Archeology: Zelda on Game Boy comes with source

Imagine you’re writing a Game Boy game, and the resulting ROM with all the code and data is just a little over one megabyte in size. No big deal: pad the game to two megabytes, and use a 2 MB ROM in the cartridge. Just tell the linker to allocate 2 MB of RAM, put the actual data at the beginning, and then write a 2 MB “.gb” image to disk, which will then be sent to the ROM chip factory.

Now imagine you’re doing this in MS-DOS. Your linker, probably written in C, calls malloc() of the runtime library of the C compiler. You already know where this is going?

While modern operating systems will always clear all malloc()ed memory, so that you cannot get to other processes’ data, this was uncommon in the single-user MS-DOS days. If you allocate 2 MB of RAM (the linker must have used a DOS extender or XMS), you’d get memory with random data in it: leftovers from whatever was in this memory before. (seppel tells me that this can also be caused by seek()ing over EOF in MS-DOS, in which case the previous data on the hard disk will be in the image.)
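
To make the failure mode concrete, here is a minimal sketch in C of how a tool could pad an image without leaking leftovers. The C standard makes no promise about the contents of memory returned by malloc(), so the sketch fills the whole buffer explicitly before copying the payload in. The function name `build_padded_image` and the pad value 0xFF are assumptions chosen for illustration; nothing here is taken from the actual linker.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PAD_BYTE 0xFF /* assumed fill value, chosen for illustration */

/* Returns a newly allocated image of image_size bytes: the payload first,
   then PAD_BYTE up to the end. Returns NULL if the payload does not fit
   or the allocation fails. The caller frees the result. */
unsigned char *build_padded_image(const unsigned char *payload,
                                  size_t payload_size, size_t image_size)
{
    unsigned char *image;

    assert(payload != NULL || payload_size == 0);
    if (payload_size > image_size)
        return NULL;

    image = malloc(image_size);
    if (image == NULL)
        return NULL;

    /* malloc() does not initialize the block, so overwrite all of it
       before use; otherwise stale data could end up in the image. */
    memset(image, PAD_BYTE, image_size);
    if (payload_size > 0)
        memcpy(image, payload, payload_size);
    return image;
}
```

Calling `build_padded_image(data, size, 2UL * 1024 * 1024)` and writing the result to disk would give a 2 MB image whose unused part is uniformly 0xFF instead of whatever happened to be in memory.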

This is what happened with the 1998 Game Boy/Game Boy Color game “The Legend of Zelda – Link’s Awakening DX” (MD5: ee0424cf1523f67c5007566aed70696d). If you look at the image starting at 0x106000, you will find all kinds of interesting data, which will tell you a lot about the game’s development. Let’s call this “game development archeology”…
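
If you want to look for this kind of data yourself, a rough sketch of a “strings”-style scan is enough to get started. The helper below is hypothetical (the name `dump_text_runs` and the minimum run length are my own choices): it walks a buffer from a given offset and prints every run of printable ASCII characters that is long enough to be interesting, together with the offset where the run starts.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Prints every run of at least min_len printable ASCII characters found in
   buf[start..size), one per line, prefixed with the run's start offset.
   Returns the number of runs printed. */
size_t dump_text_runs(const unsigned char *buf, size_t size, size_t start,
                      size_t min_len, FILE *out)
{
    size_t i, run_start = start, count = 0;

    if (min_len == 0)
        min_len = 1;

    /* Iterate one step past the end so a run touching the end is flushed. */
    for (i = start; i <= size; i++) {
        int printable = (i < size) && (isprint(buf[i]) || buf[i] == '\t');
        if (!printable) {
            if (i - run_start >= min_len) {
                fprintf(out, "%06lX: %.*s\n", (unsigned long)run_start,
                        (int)(i - run_start), (const char *)(buf + run_start));
                count++;
            }
            run_start = i + 1;
        }
    }
    return count;
}
```

With the ROM loaded into memory, a call such as `dump_text_runs(rom, rom_size, 0x106000, 6, stdout)` would list the text fragments in the region discussed here; file names and source lines tend to stand out immediately.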

The ROM image includes big chunks of Borland’s Turbo C IDE (Turbo Vision interface) for DOS, as well as traces of the “QBasic” MS-DOS Editor. It is unclear which editor they used for what, but they might have used Turbo C to write DOS code to support building, as there is a complete copy of this C program in the ROM:

#include <stdio.h>
#include

int main(void) {
	FILE *fp,*f1;
	int a=0xcd;
	int b=0xc6;
	int c=0x29;
	int ch;
	unsigned long i=0;

	if((fp=fopen("zeldag.gb","rb"))==NULL) {
		printf("can't open the file");
		return 0;
	}

	if((f1=fopen("ttmp.asm","wt"))==NULL) {
		printf("can't new file ttmp.asm");
		return 0;
	}

	while((ch=fgetc(fp))!=EOF) {
		if(a==ch) {
			i++;
			ch=fgetc(fp);
			if(b==ch) {
				i++;
				ch=fgetc(fp);
				if(c==ch)
					fprintf(f1,"%lXH, " , i);
			}
		}
		i++;
	}

	fclose(fp);
	fclose(f1);
}

This program writes the file offsets at which the byte sequence 0xCD, 0xC6, 0x29 was found in the ROM image into ttmp.asm. Interpreted as Z80 machine code, these bytes mean “CALL 0x29C6”: 0xCD is the CALL opcode, and the target address follows in little-endian byte order. In the final ROM image, this sequence appears once, at 0x442B. If you have any idea why they look for this, please post it in the comments.
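
As far as I can tell from reading it, the recovered program has two quirks: after a partial match it carries on without re-testing the byte it just consumed (so a sequence like CD CD C6 29 is missed), and the offset it prints is that of the last byte of a match rather than the first. If you want to repeat the search on another ROM, a plain byte-pattern scan avoids both. The following is my own sketch, not the developers’ tool; the name `find_pattern` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Searches buf[0..size) for every occurrence of pat[0..pat_len).
   Start offsets of the first max_out matches are stored in out[].
   Returns the total number of matches, which may exceed max_out. */
size_t find_pattern(const unsigned char *buf, size_t size,
                    const unsigned char *pat, size_t pat_len,
                    size_t *out, size_t max_out)
{
    size_t i, found = 0;

    if (pat_len == 0 || pat_len > size)
        return 0;

    /* Test every possible start position, so overlapping prefixes
       such as CD CD C6 29 are still detected. */
    for (i = 0; i + pat_len <= size; i++) {
        if (memcmp(buf + i, pat, pat_len) == 0) {
            if (found < max_out)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

Passing the three bytes 0xCD, 0xC6, 0x29 as the pattern reproduces the search for “CALL 0x29C6” and reports where each call instruction begins.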

This is the list of files in their project directory at D:\GAMEBOY:
BANK37.ASM
CLEARKU.ASM
DAMA1.ASM
DAMA2.ASM
END.CPP
ENDEND.ASM
ENDEND.LST
ENDEND1.ASM
FEND1.ASM
FEND2.ASM
FEND3.ASM
FIND.ASM
FIND.CPP
IFCHAENL.ASM
INTWCHA.ASM
TEST.ASM
TTMP.ASM
TTMP.TXT
ZIPUTP.CPP

These filenames also appear in the ROM:
ADDPLAG.ASM
ADDPLAGF.ASM
CH64TBL.ASM
FEND.ASM
G.ASM
H.ASM
INSERTKU.ASM
INTWIN.ASM
KKKKKK.ASM
L.ASM
NOPLAY1.ASM
TAB.ASM
Y.ASM
ZXHPDM.ASM

And here comes the interesting part: There is actually some assembly source in the ROM; here is a small snippet:

JoyPort_1:
                 AND $02 ;LEFT
                 JR  NZ, JoyPort_2
                 CALL LEFTScroll
                 RET
JoyPort_2:
                 AND $04 ;UP
                 JR  NZ, JoyPort_3
                 CALL UPScroll
                 RET
JoyPort_3:
                 AND $08 ;DOWN
                 JR  NZ, JoyPort_4
                 CALL DOWNScroll

Well-documented, it seems. But there is also some assembly code that looks like this:

L_B000_28F7:
                LD A,$7F
                LD BC,$0800
                LD D,A
                LD HL,$9800
L_B000_2900:
                LD A,D
                LD (HLI),A
                DEC BC
                LD  A,B
                OR  A,C
                JR  NZ,L_B000_2900
                RET
L_B000_2914:
                LD  A,(HLI)
                LD  (DE),A
                INC DE
                DEC BC
                LD  A,B
                OR  A,C
                JR  NZ,L_B000_2914
                RET

The label names suggest that this code has been disassembled from existing Z80 machine code. Link’s Awakening DX is a color remake of an older Game Boy game, so it might very well be that they lost the original source, disassembled the old code and used it again for the remake. This could be easily proven by disassembling the original version and looking for this code.
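
For readers who don’t read this assembly dialect: the two routines above are a fill loop and a copy loop. The pair “LD A,B / OR A,C / JR NZ” is the usual way to test whether the 16-bit counter BC has reached zero, since DEC BC does not set any flags. A rough C equivalent, operating on a simulated 64 KB address space instead of real hardware, could look like the sketch below. The function names are mine, and the remark that $9800 is the background tile map area is general Game Boy knowledge rather than something found in the ROM.

```c
#include <stddef.h>
#include <stdint.h>

/* Counterpart of the loop at L_B000_2900: store value into count bytes at
   dst. Like the original, the counter is decremented before it is tested,
   so a count of 0 wraps around and means 65536 iterations. */
void fill_bytes(uint8_t *dst, uint8_t value, uint16_t count)
{
    do {
        *dst++ = value;
    } while (--count != 0);
}

/* Counterpart of the loop at L_B000_2914: copy count bytes from src (HL)
   to dst (DE), with the same wrap-around behaviour for a count of 0. */
void copy_bytes(uint8_t *dst, const uint8_t *src, uint16_t count)
{
    do {
        *dst++ = *src++;
    } while (--count != 0);
}

/* Counterpart of L_B000_28F7: fill the $0800 bytes starting at $9800 with
   $7F. memory must point to a 64 KB buffer standing in for the Game Boy's
   address space. */
void clear_tile_maps(uint8_t *memory)
{
    fill_bytes(memory + 0x9800, 0x7F, 0x0800);
}
```

Seen this way, the code is generic housekeeping of the kind a disassembler would label by address, which fits the guess that nobody wrote these label names by hand.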

If you want to do game development archeology yourself, you might want to look at titles like “X-Men – Wolverine’s Rage” (MD5: b1729716baaea01d4baa795db31800b0), which contains Windows 9x registry keys and INF files; “Mortal Kombat 4” (MD5: 7311f937a542baadf113e9115158cde3), in which you can find some small source fragments; “Gift” (MD5: e6a51088c8fea7980649064bd3a9f9ff), which will tell you that the developers had some Game Boy emulators installed on their system; or the “BIT-MANAGERS” games “Spirou” (MD5: 5aa012cf540a5267d6adea6659764441, Turbo C, MAP file, source) and “TinTin in Tibet” (Game Boy Color version, MD5: 8150a3978211939d367f48ffcd49f979), which, amongst other things, contains references to Nintendo’s Game Boy Advance (!) SDK (“C:\Cygnus\thumbelf-000512\H-i686-cygwin32\lib\gcc-lib\thumb-elf\2.9-arm-000512”, “/tantor/build/nintendo/arm-000512/i686-cygwin32/src/newlib/libc/stdio/stdio.c”).

If you find any more things like these, please post them (or links to your stories) as comments! Happy hacking!

68 thoughts on “Game Development Archeology: Zelda on Game Boy comes with source”

  1. Are games not signed or scrambled with a secret cipher? Wouldn’t it be possible that the cipher algorithm for signing the game, including the keys, is in the RAM too?

  2. @hamtitampti: The 1989 Game Boy didn’t include any security mechanisms other than the requirement for the ROM to include a (copyrighted) Nintendo logo, which was checked by a 256 byte bootloader internal to the CPU. ROM encryption would have had to be handled by hardware, since Game Boys had too little RAM (8 KB) to decode the game code and data into it – instead, they typically ran the code directly from ROM.

  3. Would be interesting to write a script to grep through a database of ROM files looking for such hidden treasures. Actually, in general, a script that finds source code in an arbitrary set of files would be useful to have around.

  4. At the fresh young age of 14, I discovered the VIC20 cartridge game ‘Mole Attack’ had chunks of its full assembler listing preserved in the ROM! …. there began a journey to understand this new computer language…

  5. In other important news, I found a torn scrap of paper with writing on it today. It says:

    bread
    milk
    pick up dry clea

  6. You should check out the arcade ROM for Golden Axe 2, if you can. They did a similar thing there, where they accidentally included a fairly large chunk of the assembler source code.

  7. The GoodTools name for the file with the MD5 sum ee0424cf1523f67c5007566aed70696d is “Zelda no Densetsu – Yume no Miru Shima DX (J) (V1.0) [C][p1][T+Chi][!].gbc” – if you are hunting for code left there by Nintendo, you should use clean dumps of the game. None of the source code is present in the clean dumps of the game.

    b1729716baaea01d4baa795db31800b0 X-Men – Wolverine’s Rage (U) [C][!].gbc – clean dump
    7311f937a542baadf113e9115158cde3 Mortal Kombat 4 (E) [C][!].gbc – clean dump
    e6a51088c8fea7980649064bd3a9f9ff Gift (E) [C][t2].gbc – *t*rained, but the clean dump also has those in it…
    5aa012cf540a5267d6adea6659764441 Spirou (U) (M4) [S].gb – no indication if this is a clean dump
    8150a3978211939d367f48ffcd49f979 Tintin in Tibet (E) (M7) [C][!].gbc – clean dump

    But nice finds nonetheless ;)

  8. It doesn’t have to be disassembled; some C compilers first create asm and then binary, or they wrote some stuff in C to be quick, turned it into asm and put it into their existing asm code.

  9. what’s the big deal with playboy? they were supposed to make a game for the gb. what happened to it?

  10. Guys, this stuff isn’t even in the clean ROM. It’s junk left in there by whoever dumped/hacked it. (What is in there, though, is a lot of the game’s dialogue repeated several times in both English and French. O_o)

    There have been games that accidentally left bits of their source code in the ROM, and with more modern systems it’s not uncommon to see lists of filenames intended to be shown on error report screens, but this game isn’t one of them.

  11. Well, considering what Somebody said up there, these things are much more likely to be the contents of memory from the people who dumped the ROMs, not from those who created the data in the first place. A lot of the stuff you found would make much more sense as the contents of a reverse engineerer’s memory.

  12. @ willie lo

    Too busy on other pursuits I expect. And of course, I’ve never found hef to be much of a coder. Takes him ages to get anything done these days.

  13. Modern operating systems do not clear malloc()ed data before returning it. However multiprocessing operating systems should (and almost all do now) only reuse memory from the same process (malloc is a library call — normally there is another way to actually get more memory for the process).

    If you want that, you can use calloc() (clear-alloc), or do a memset() on the returned memory.

  14. If you look in the ROM for Robin Hood: Prince of Thieves for NES, there is some assembly source code padded in there also.

  15. It would be nice to gather all this info on a webpage. Maybe a wiki where people could post their findings and explore them collaboratively. This could lead to lots of interesting studies.

  16. “While modern operating systems will always clear all malloc()ed memory, so that you cannot get to other processes’ data [..]”

    Actually that isn’t true. Most UNIXes don’t clear allocated memory. Thus the reason why if you dig through OpenSSH or any other application with private data they always clear it before releasing it back into the pool.

    Some UNIXes provide /etc/malloc.conf which let you set the clear on return behavior, but that does have an impact on performance.

    – Ben

  17. Linux, at least, will zero pages before mapping them into a process’s address space, but malloc() will not clear them again, so it’s possible to get a chunk of data from the same process.

    Note that the clearing happens when the page is mapped back in, not when it’s unmapped, so groveling through /proc/kcore could still get such data, which may be why openssh etc will clear the pages in question.

  18. The PS2 game Ico has the .s file created by the compiler left on the production disc. Surprising it got through Sony’s TRC actually. Includes all the VU and EE code with fairly clear labels too.

  19. Going back to the C64 days, loading & resetting ‘Thing on a Spring’ allowed you to perform SYS 49152 to access a complete disassembler. Similarly, a reset of Rambo followed by hitting the restore key entered the music editor.

  20. Too bad the guys developing “TinTin in Tibet” were almost certainly only using cygwin as a platform for cross-compilation; if we could catch them on a GPL violation they’d have to release the full source code.

    Still interesting that they were using an open-source platform for their development.

  21. I remember something similar in an old amiga game that was called Ballistix. Some portions of the source code could be found in certain unused sectors of the disk.

  22. “nights into dreams” on the dreamcast had a special GDROM that included wallpaper images, also some other dreamcast games did the same.

  23. I remember I found a FULL Basic compiler at the end of the game Moon Alert for ZX Spectrum :)

  24. My friend and I also found a compiler at the end of “Way Of The Exploding Fist” whilst persuading the game to run from a Microdrive cartridge, got that one printed in Microdrive Exchange.

  25. It’d be cool to collect a lot of these bits of code and document them. I’d love to buy a book that had major chunks of code from old games with information on their design and stuff.

  26. Sorry for the stupid question, but what program do you use to view the contents of a ROM file? Hex editor or something? If there is some kind of code inside, will it jump out at you?

  27. this game is awesome

  28. This little sequence was string dumped From an old ZX Spectrum game commissioned by Puffin books (The Warlock of Firetop Mountain), a good sum up of development back then….

    PS we apologise for the game being boring
    but we were literaly only given 3 weeks to
    write it .
    PPS Return of the Things !
    coming soon – the follow up to Halls
    The programme about which will be said :-
    “Oh no not again” ? Sinclair User
    “I’ll get them for this” GOD
    and finally
    “..sod 30% what about a wench ?..” The Authors
    ” one thats nice ”
    ” not too nice ”
    ” with huge …. tracts of land ”
    ” and not too expensive ”
    PSSP See you in the next game !
    Try not to take life seriously or you’ll
    go as far as
    have …..
    Virgin,22 years old,lonely,suicidal and mad .
    (interesting order they’re in huh ?)
    PSSPS It’s goodnight from me
    and It’s goodnight from another unoriginal joke
    (c) S.Brattel and the one and only N.Mottershead

  29. Heh – you should see all the stuff that was left over in PC-Engine CD games (TurboGrafx-16 in the US) – in the data tracks. Sometimes it’s left over WAVE data for audio tracks of other games, or leftover bits of source code, tools, and comments ;) I believe one game had either the full or almost full intact file of the MSDOS Japanese command.com

    The situation appears to be similar to the Zelda incident. They used an HD for a CD emulator instead of rom boards. There is one game that appears to have tons of personal stuff, files and info from one of the developers – stored in LHA files I believe. Other than that, I don’t think I should be giving out the name of the arcade card game…

  30. We found this right after the game came out, but didn’t look too deeply into it, figuring it was just an error in their linking setup.

    Writing a small program to scan through a ROM for a certain CALL may have been done because Links Awakening DX is a CGB re-write of the original DMG Links Awakening. I think they may have not had all of the source code to the original version and had to hack it apart to make the updated version.

    That CALL they’re looking for may update screen data, which on the CGB would have to be modified to include updating color palettes after the image data. Aside from some minor additional features added to the game, it was essentially the same.

    Try running that C program on the original Link’s Awakening for DMG and see what listing it gets you. :) (fires up the hex editor to look)

  31. The Famicom game, Air Fortress, also has a bunch (LOTS) of assembly code easily readable if you open the ROM with a hex or text editor.

  32. Now this “game archaeology” thing is quite interesting.

    Some time ago I dumped the contents of the ROM file of the MSX game “Hydlide” in a hex editor, and I’ve found some interesting obscure writings.

    If you want to know what I found, get an MSX Hydlide ROM, open it in a hex editor and try it yourself :D
