Category Archives: archeology

Reconstructing Some Source of Microsoft BASIC for 8080

Microsoft BASIC for 6502 exists digitally in source form. The older version for the Intel 8080 CPU, however, only exists on paper: as a printout in the archives of Harvard University. Some snippets of the code are public, though:

  • Ian Griffiths held the printout in his hands and took notes. He copied down several lines from the first page.
  • In Harry Lewis’s blog post, he shows a picture of some lines of the reproduction that is on display on the wall of the ground floor lounge of the Maxwell Dworkin building at Harvard.
  • David J. Malan has a collection of photos of the Maxwell Dworkin reproductions online.

The Computer History Museum in Mountain View, California has a video on display in the software section that tells the story of the company Microsoft. In this video, they show the first page of 8080 BASIC:


Together with the other two sources, we can reconstruct the start of the first page:

00100	MCSSIM(STARI)
00120
00140	TITLE	BASIC MCS 8080	GATES/ALLEN/DAVIDOFF
00160	IFNDEF	LENGTH,<PRINTX !!! MUST HAVE COM !!
00180		END>
00200	IF1;<
00220	IFE	LENGTH,<PRINTX /SMALL/ >
00240	IFE	LENGTH-1;<PRINTX /MEDIUM/ >
00260	IFE	LENGTH-2;<PRINTX /BIG/ >
00280	IFE	STRING,<PRINTX /NO $$/ >
00300	IFN	STRING,<PRINTX /$$ $$/ >
00320	>
00340	SUBTTL	VERSION 1.1 -- MORE FEATURES TO COME
00360	COMMENT *
00380
00400	-------------------------------------------
00420	COPYRIGHT 1975 BY BILL GATES AND PAUL ALLEN
00440	-------------------------------------------
00460
00480
00500	WRITTEN ORIGINALLY ON THE PDP-10 AT HARVARD FROM
00520	FEBRUARY 9 TO APRIL 27
00540
00560	PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
00580	BILL GATES WROTE THE RUNTIME STUFF.
00600	MONTE DAVIDOFF WROTE THE MATH PACKAGE.
00620
00640	THINGS TO DO:
00641	SYNTAX PROBLEMS (OR)
00642	NICE ERRORS
00643	ALLOW ^W AND ^C IN LIST COMMAND
00646	TAPE I/O
00648	BUFFER I/O
00650	USR ??
00652	ELSE
00660	USER-DEFINED FUNCTIONS(MULTI-ARG,MULTI-LINE,STRINGS)
00680	MAKE STACK BOUNDARY STUFF EXACT
	(FOUT 24 FIN 14)
	PUNCH, DELETE;,.
	INLINE CONSTANT CONVERSION -- MAKE IT WORK
	SIMPLE STRINGS

While this is nice, it would be much nicer to have more, maybe all of the original source.

Can someone take high resolution photos of the first eight pages on display on the wall of the ground floor lounge of the Maxwell Dworkin building at Harvard?

Can someone make a copy of the printout at the Harvard University archives? Applications like Microsoft Office Lens can make high quality copies of printouts with a phone.

Making Obsolete Code Run Again: The mxass 6502 Cross Assembler

Here’s the challenge: Take code that you wrote some 20 years ago in an obsolete programming language for an obsolete platform, make it run on a modern system (without emulation!)… and actually make it useful!

In 1995, I started developing a 6502 Cross Assembler for MS-DOS, in my then favorite languages: PowerBASIC for the bulk of it, and lots of 8086 inline assembly to speed up string operations. I mostly used it for my own C64 projects, and I was very proud of its speed: A fraction of a second to produce several KB of binary code on a 386.

In September 1996, I decided that Turbo Pascal was actually the better language, and converted the source line by line, but keeping all the inline assembly. Development continued until June 1998.

In January 2008, I rediscovered the source and wanted to see whether it could be ported to run on modern computers. I used p2c to convert the Pascal source into C, and spent two days cleaning up the C and rewriting the assembly in C until it correctly compiled my regression test – but the code was still using Pascal strings, for example.

Recently, I dug into my floppy disk collection to recover as many revisions of the source as possible, converted the whole (surviving) history into a git repository, and put it on

github.com/mist64/mxass.

The C version of mxass should run on any modern operating system, and it’s actually a useful piece of software, with some unique features. It supports 6502 with illegal opcodes, 65816 and Z80 assembly, and it tries to be backwards compatible with the “64’er” set of assemblers (Hypra-Ass, Giga-Ass, Vis-Ass, Assblaster, F8-Assblaster). In fact, F8-Assblaster source printed to a file in an emulator should assemble with very few changes.

That said, please do not use it for new projects. Use cc65 for that.

What is the oldest source you have written for an obsolete platform that you have ported forward to modern systems?

Emulating the Intel 8080 on a MOS 6502

Emulating older computers on modern, much faster systems is very common nowadays – but how about emulating the Intel 8080 (1974) on a MOS 6502 system like the KIM-1 (1975)? The “8080 Simulator for the 6502” by Dann McCreary from 1978 does exactly that.

Why imitate one microprocessor with another? You probably purchased this 8080 simulator package to do one or more of the following:

  • Run existing 8080 software on your 6502
  • Write, test and debug your own 8080 software without having to purchase a complete 8080 based system
  • Learn something about the architecture and instruction set of the 8080 via hands-on experience

The emulator is extremely size-optimized and fits in less than 1 KB of RAM. This was done by compressing the 256-entry opcode space into 25 sections of similar instructions that could be handled by one generic function.
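To illustrate the general idea of collapsing the opcode space, here is a minimal Python sketch. It is not McCreary’s code and does not reproduce his 25 sections. It only relies on the standard 8080 instruction encoding, in which the opcodes 0x40 to 0x7F and 0x80 to 0xBF each follow a regular bit pattern, so one generic routine per family can stand in for dozens of individual cases. The names REGS, ALU_OPS and decode are made up for this example.

```python
# Register field encoding used by the 8080 (3 bits); "M" means memory at (HL).
REGS = ["B", "C", "D", "E", "H", "L", "M", "A"]
# The eight accumulator operations selected by bits 5..3 of opcodes 0x80-0xBF.
ALU_OPS = ["ADD", "ADC", "SUB", "SBB", "ANA", "XRA", "ORA", "CMP"]

def decode(opcode):
    """Return a mnemonic for the two most regular 8080 opcode families."""
    family = opcode >> 6          # top two bits select the family
    dst = (opcode >> 3) & 7       # bits 5..3
    src = opcode & 7              # bits 2..0
    if opcode == 0x76:            # the one irregular slot inside the MOV block
        return "HLT"
    if family == 0b01:            # 63 MOV instructions, one handler
        return f"MOV {REGS[dst]},{REGS[src]}"
    if family == 0b10:            # 64 ALU instructions, one handler
        return f"{ALU_OPS[dst]} {REGS[src]}"
    return None                   # other families are not covered by this sketch

print(decode(0x78), "|", decode(0x80), "|", decode(0xBE))
```

Two branches cover 127 of the 256 opcodes here; the real simulator presumably applies the same kind of grouping to the less regular parts of the opcode map.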

The four-page article “8080 Simulation with a 6502” (MICRO – The 6502 Journal, issue 16, September 1979) explains the motivation and design of the software in detail:


Dann McCreary: 8080 Simulation with a 6502 [1979]

And here is the original commented source code with usage instructions:


Dann McCreary: An 8080 Simulator for the 6502, KIM-1 Version [1978]

Thanks a lot to Dann McCreary, who provided scans of his original work, as well as additional insights:

I wrote this by hand, pencil and paper assembly (BTW, did you ever read Carl Helmer’s article about pencil and paper in one of the very early issues of BYTE magazine? ;) and much of the simulator was written as I rode the bus to and from work… ;)

About the tools used to create this program:

Honestly, I don’t remember for certain… BUT… MORE THAN LIKELY, it is
a FAKE – i.e., I probably just text-edited a listing in “assembler
listing” format for the purpose of publishing the code and “looking
professional”… ;)

CONSEQUENTLY, be on the lookout for typographical (and thus operational)
errors that could be “in there” …

The all-upper-case makes me think I printed this stuff out on an old
drum printer that I resurrected from the American Surplus Computer
company in Boston back in the ’70s… And there’s yet another story! :)

The text editor may likely have been running on my Apple ][…. or maybe
“borrowed” from some company where I was working at the time?

Apple-80

Dann later ported the simulator to the Apple-][. The tape dump of the resulting product “Apple-80” can be found at brutaldeluxe.fr.

Converting VisAss and F8 AssBlaster Source

If you have developed applications for the Commodore 64 in the 80s or 90s, chances are you still have your old floppy disk with the original assembly sources. If you have used the VisAss or F8 AssBlaster assemblers, you can use a new command line tool I wrote to convert the encoded binary files into ASCII, so they can be published or you can continue development using modern tools like cc65.

Get it at github.com/mist64/vis2ascii.

If you need any help converting old C64 source in other formats, contact me, I’ll be happy to help!

Reverse-Engineered Final Cartridge III Source Code

The Final Cartridge III was one of the major multi-function extension cartridges for the Commodore 64. It contained BASIC extensions, floppy and tape speeders, centronics printer support, screen editor extensions including F-key shortcuts, a monitor, a freezer – and a GEOS-like windowing system called “Desktop”. In all this, the FC3 integrated seamlessly with the look-and-feel of the stock Commodore 64: It did not change anything (same screen colors and banner!), it only extended functionality in consistent ways.

The 64 KB ROM in the FC3 consists of 4 banks. I reverse engineered and commented the first bank, which contains the BASIC and editor extensions, the floppy and tape speeder, fast format, the centronics printer driver, and the monitor. The cc65 suite can compile the set of .s files into the exact original binary.

The project can be found at github.com/mist64/final_cartridge.

The source is useful in many ways. Since the components are quite independent of each other, they can be extracted and used in other projects. The build system is configured so it can build a stand-alone RAM version of the monitor, for example. Another idea would be to use the source as a base for an improved version of the Final Cartridge. Components that are obsolete nowadays can be swapped with more useful functionality, for example. Refer to the README in the repository for more information.

Contributions welcome!

Source of “The Wave” Web Browser for C64/C128 GEOS Wheels

“The Wave” is a Web Browser for GEOS (with the Wheels extension) on C64/C128 machines with a SuperCPU and a RAM extension.

A while ago, Maurice Randall, the author, published the source of the V1.0 version as a WR3 archive that contained the assembly source in GeoWrite format and some developer documentation.

I converted the files into plain text and published them at github.com/mist64/thewave.

geowrite2rtf, a GeoWrite to RTF converter

geowrite2rtf is a tool that converts Commodore 64/128 GEOS GeoWrite documents into RTF format. Most formatting will be preserved, but some formatting and graphics will be discarded.

The C64/C128 GEOS GeoWrite file format is a rich text format, which is, like most GEOS file formats, not a sequential file, but a VLIR file, a collection of up to 127 sequential streams. For a VLIR file to exist outside of a C64 disk image (D64, D71, D81 etc.), it has to be in serialized .CVT format. You can use tools like c1541 or DirMaster to extract them from disk images.

You can find it at github.com/mist64/geowrite2rtf.

If you are looking for GeoWrite files to test this on, here is the gateWay OS manual, or search for “CVT” at cbm8bit.com.

Comparative C64 ROM Disassembly Study Guide

The Commodore 64 ROM has been subject to immense reverse engineering. Many commented disassemblies were published over the decades, scattered over different media such as books, magazines, disks, and later, the internet – and there are even some commentaries that apply to the C64 ROM, but were written with other systems in mind that shared Microsoft’s BASIC interpreter.

In the past weeks, I have collected and published several of these commentaries in a unified format:

Wouldn’t it be nice to see the comments of all these sources at the same time when looking up code in the C64 ROM?

At pagetable.com/c64rom, you can now see a cross-referenced HTML of the disassembled C64 ROM, with four commentaries side-by-side – the Comparative C64 ROM Disassembly Study Guide:

If you can’t fit all columns on your screen, try reducing the text size in your browser.

The raw txt files with the commentaries as well as the script to combine them are maintained at github.com/mist64/c64rom. Improvements welcome.

And, as mentioned previously, there are many more commentaries in existence. If you want to help me convert them into the canonical format, send me an email.

Fully Commented Commodore 64 BASIC ROM Disassembly – based on Microsoft’s Source

On my quest to collect as many commentaries on the Commodore 64 ROM as possible at pagetable.com/c64rom, we have gathered Lee Davison’s excellent commentary, the German de facto standard by Data Becker, and an adaptation of Bob Sander-Cederlof’s Apple II ROM commentary, all in the same cross-referenced HTML format.

Now that Microsoft’s original source of MOS 6502 BASIC is available, I’ve added it as the fourth commented disassembly, with the standard disassembly on the left, and the original source, both assembly and comments, lined up correctly on the right:

As always, the HTML version is available at pagetable.com/c64rom, while the raw txt files are maintained at github.com/mist64/c64rom.

While this may be the best set of comments for the BASIC part, we’re not done yet! There are many more (either direct or indirect) commentaries on the C64 ROM in existence:

Please contribute to our collection by helping convert one or more of these sources into the common format! Send me an email if you are interested!

Microsoft BASIC for 6502 Original Source Code [1978]

This is the original 1978 source code of Microsoft BASIC for 6502 with all original comments, documentation and easter eggs:

M6502.MAC (1978-07-27, 6955 lines, 161,685 bytes)

This is currently the oldest publicly available piece of source written by Bill Gates.

Language

Like the 8080 version, the 6502 version was developed on a PDP-10, using the MACRO-10 assembler. A set of macros developed by Paul Allen allowed MACRO-10 to understand and translate 6502 assembly, albeit in a modified format to fit the syntax of macros, for example:

MOS 6502        MACRO-10
LDA #0          LDAI 0
LDA (ADDR),Y    LDADY ADDR

MACRO-10 did not support hex numbers, which is why most numbers are in decimal format. In the floating point code, all numbers are octal. The RADIX statement switches between the two. Octal can also be forced with a ^O prefix.
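When reading the source next to a hex disassembly, it helps to convert the literals mechanically. The following Python sketch mirrors the rules described above: a current radix (10 in most of the file, 8 in the floating point code) and a ^O prefix that forces octal. The function name parse_number is made up for this example, and other MACRO-10 number syntax is deliberately not handled.

```python
def parse_number(text, radix=10):
    """Parse a numeric literal under the rules described in the article.

    radix is the current default (10 in most of the source, 8 in the
    floating point package); a ^O prefix forces octal regardless.
    """
    text = text.strip()
    if text.upper().startswith("^O"):
        return int(text[2:], 8)
    return int(text, radix)

print(hex(parse_number("76")))             # decimal 76        -> 0x4c
print(hex(parse_number("^O054")))          # forced octal 054  -> 0x2c
print(hex(parse_number("146", radix=8)))   # octal via radix   -> 0x66
```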

Conditional translation is done using the IFE and IFN statements, which test whether the argument is zero. The following only adds the string to the binary if REALIO is equal to 4:

IFE     REALIO-4,<DT"APPLE BASIC V1.1">
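The “subtract and test for zero” idiom can look odd at first, so here is a tiny Python model of it. The functions ife and ifn only imitate the zero and nonzero tests; included_for is a hypothetical helper, and the label “not Commodore” merely marks where an IFN REALIO-3 block would be assembled.

```python
def ife(value):
    """IFE assembles its body when the expression equals zero."""
    return value == 0

def ifn(value):
    """IFN assembles its body when the expression is nonzero."""
    return value != 0

def included_for(realio):
    """Show which of two sample conditionals fire for a given REALIO value."""
    parts = []
    if ife(realio - 4):            # IFE REALIO-4,<...>: only when REALIO is 4
        parts.append("APPLE BASIC V1.1")
    if ifn(realio - 3):            # IFN REALIO-3,<...>: every target except 3
        parts.append("not Commodore")
    return parts

for realio in range(6):
    print(realio, included_for(realio))
```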

Macros

The source defines many macros that make development easier. Here are some examples:

Macro        Definition             Comment
SYNCHK (Q)   LDAI <Q>               Get the next character and make sure it’s Q, otherwise SYNTAX ERROR. This pattern is used a lot.
             JSR SYNCHR
LDWD (WD)    LDA WD                 Most 16 bit constants are loaded into A/Y with this macro, but macros for A/X and X/Y also exist.
             LDY <WD>+1
LDWDI (WD)   LDAI <<WD>&^O377>      This loads an immediate constant into A/Y.
             LDYI <<WD>/^O400>
PSHWD (WD)   LDA <WD>+1             This pushes a 16 bit value from memory (absolute or zero page) onto the stack.
             PHA
             LDA WD
             PHA
JEQ (WD)     BNE .+5                A compact way to express out-of-bounds branches. Macros exist for all branches.
             JMP WD
SKIP2        XWD ^O1000,^O054       This emits a byte value of 0x2C (BIT absolute), which skips the next instruction. (The ^O1000 part wraps the byte in a PDP-10 instruction – see below.)
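Two of these macros do a bit of arithmetic that is easy to verify. The Python sketch below reproduces the byte split performed by LDWDI and the five bytes that JEQ expands to. The function names are made up for this example; the opcode values (0xD0 for BNE, 0x4C for JMP absolute) are the standard 6502 encodings.

```python
def ldwdi_bytes(wd):
    """Split a 16 bit constant like LDWDI: A gets WD&^O377, Y gets WD/^O400."""
    low = wd & 0o377       # ^O377 = 255: keep the low byte for A
    high = wd // 0o400     # ^O400 = 256: integer division yields the high byte for Y
    return low, high

def jeq_bytes(target):
    """Bytes produced by JEQ target: BNE .+5 hops over a three byte JMP."""
    return bytes([
        0xD0, 0x03,                          # BNE, offset 3 from the next instruction = .+5
        0x4C, target & 0xFF, target >> 8,    # JMP absolute, low byte first
    ])

print([hex(b) for b in ldwdi_bytes(0xEA60)])
print(jeq_bytes(0x1234).hex())
```

The inverted condition is the point of JEQ: a conditional branch on the 6502 only reaches about 128 bytes in either direction, so the macro branches around an unconditional JMP instead.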

Configurations

The BASIC source supports several compile-time configuration options:

Name    Comment                           Description
INTPRC  INTEGER ARRAYS
ADDPRC  FOR ADDITIONAL PRECISION          40 bit (9 digit) vs 32 bit (7 digit) float
LNGERR  LONG ERROR MESSAGES               Error message strings instead of two-character codes
TIME    CAPABILITY TO SET AND READ A CLK  TI and TI$ support
EXTIO   EXTERNAL I/O                      PRINT#, INPUT#, CMD, SYS (!), OPEN and CLOSE support
DISKO   SAVE AND LOAD COMMANDS            LOAD, SAVE (and on Commodore: VERIFY) support
NULCMD  FOR THE "NULL" COMMAND            NULL support, a command to configure the number of NUL characters to print to the terminal after each line break
GETCMD                                    GET support
RORSW                                     If 1, the ROR instruction is not used
ROMSW   TELLS IF THIS IS ON ROM           The RAM version can optionally jettison the SIN, COS, TAN and ATN commands at startup
CLMWID                                    Column width for TAB()
LONGI   LONG INITIALIZATION SWITCH
STKEND                                    The top of stack at startup
BUFPAG                                    Page of the input buffer; if 0, the buffer uses parts of the zero page
BUFLEN  INPUT BUFFER SIZE
LINLEN  TERMINAL LINE LENGTH
ROMLOC  ADDRESS OF START OF PURE SEGMENT
KIMROM                                    KIM-specific smaller config

Targets

The constant REALIO is used to configure what computer system to generate the binary for. It has one of the following values:

Value  Comment                 Banner                             Machine
0      PDP-10 SIMULATING 6502  SIMULATED BASIC FOR THE 6502 V1.1  Paul Allen’s Simulator on PDP-10
1      MOS TECH,KIM            KIM BASIC V1.1                     MOS KIM-1
2      OSI                     OSI 6502 BASIC VERSION 1.1         OSI Model 500
3      COMMODORE               ### COMMODORE BASIC ###            Commodore PET 2001
4      APPLE                   APPLE BASIC V1.1                   Apple II
5      STM                     STM BASIC V1.1                     (unreleased)

All versions except Commodore also print “COPYRIGHT 1978 MICROSOFT” in a new line.

The target defines the setting of the configuration constants, but some code is also conditionally compiled depending on a specific target.

What is interesting is that initially it was Microsoft adapting their source for the different computers, instead of giving source to the different vendors and having them adapt it. Features like file I/O and time support seem to have been specifically developed for Commodore, for example. Later, the computer companies would get the source from Microsoft and develop themselves – source code of the Apple and Commodore derivatives is available; they both contain Microsoft comments.

By the way, the numbering of these targets probably indicates the order in which Microsoft signed contracts with computer manufacturers. MOS was first (for the KIM), then OSI, then Commodore/MOS again (this time for the PET), then Apple.

The PDP-10 Target

Paul Allen’s additional macros for 6502 development made the MACRO-10 assembler output one 36 bit PDP-10 instruction word for every 6502 byte. When targeting a real 6502 machine, the 6502 binary could be created by simply extracting one byte from every PDP-10 word.

In the case of targeting the simulator, the code created by the assembler could just be run without modification, since every emitted PDP-10 instruction was constructed so that it would trap – the linked-in simulator would then extract the 6502 opcode from the instruction and emulate the 6502 behavior.

While this trick was mostly abstracted by the (unreleased) macro package, its workings can be seen in a few cases in the BASIC source. Here, it defines SKIP1 and SKIP2. Instead of just emitting 0x24 or 0x2C, respectively, it combines it with the octal value of 01000 to make it a PDP-10 instruction that traps:

DEFINE  SKIP1,  <XWD ^O1000,^O044>      ;BIT ZERO PAGE TRICK.
DEFINE  SKIP2,  <XWD ^O1000,^O054>      ;BIT ABS TRICK.
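To make the word layout concrete, here is a small Python model of what these two definitions produce and how the 6502 binary falls out of it. XWD places its first argument in the left 18 bits and its second in the right 18 bits of a 36 bit word. That the 6502 byte is simply the low 8 bits of each word is my reading of the description above, not something taken from the unreleased macro package, and the function names are made up.

```python
TRAP_LEFT_HALF = 0o1000    # left half used by SKIP1/SKIP2 in the source

def xwd(left, right):
    """Model XWD: an 18 bit left half and an 18 bit right half in one 36 bit word."""
    return ((left & 0o777777) << 18) | (right & 0o777777)

def extract_6502(words):
    """Recover the 6502 binary by keeping the low 8 bits of every word (assumed)."""
    return bytes(w & 0xFF for w in words)

SKIP1 = xwd(TRAP_LEFT_HALF, 0o044)    # BIT zero page
SKIP2 = xwd(TRAP_LEFT_HALF, 0o054)    # BIT absolute

print(oct(SKIP1), oct(SKIP2))
print(extract_6502([SKIP1, SKIP2]).hex())
```

On the simulator the full 36 bit words are executed and trap; for a real machine only the extracted bytes matter.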

In the initialization code, it writes a JMP instruction into RAM. On the simulator, it has to patch up the opcode of JMP (0x4C, decimal 76) to be the correct PDP-10 instruction:

        LDAI    76              ;JMP INSTRUCTION.
IFE     REALIO,<HRLI 1,^O1000>  ;MAKE AN INST.

With this information, we can reconstruct what the set of 6502 macros, which is not part of this source, probably looked like. Here is LDAI (LDA immediate):

DEFINE  LDAI    (Q),<
        XWD ^O1000,^O251        ;EMIT OPCODE
        XWD ^O1000,<Q>          ;EMIT OPERAND
>

You can also see native TJSR PDP-10 assembly instructions for character I/O:

IFE     REALIO,<
        TJSR    INSIM##>        ;GET A CHARACTER FROM SIMULATOR
IFE     REALIO,<
        TJSR    OUTSIM##>       ;CALL SIMULATOR OUTPUT ROUTINE

The DDT command, which breaks into the PDP-10’s DDT debugger, only exists in this config:

IFE     REALIO,<
DDT:    PLA                     ;GET RID OF NEWSTT RETURN.
        PLA
        HRRZ    14,.JBDDT##
        JRST    0(14)>

The KIM and OSI Targets

The KIM and OSI targets are meant for the MOS KIM-1 and the Ohio Scientific OSI Model 500 single-board computers, respectively. These are the first ports to specific computers, and also the cleanest: except for the character I/O interface and the very simple LOAD/SAVE implementation for the KIM, there is nothing specific about these targets.

The Commodore Target

The Commodore target is meant for the Commodore PET 2001. It includes LOAD/SAVE/VERIFY (the commands jump directly to outside “KERNAL” ROM code), the I/O commands (SYS, PRINT#, OPEN etc.), the GET command and the π, ST, TI and TI$ symbols. CLEAR is renamed to CLR, "OK" is renamed to "READY.", the BEL character is not printed, and character I/O code behaves differently to account for the more featureful screen editor of the PET.

Oh, and the Commodore version of course includes the Bill Gates WAIT 6502,1 easter egg! This is the code of the WAIT statement:

; THE WAIT LOCATION,MASK1,MASK2 STATEMENT WAITS UNTIL THE CONTENTS
; OF LOCATION IS NONZERO WHEN XORED WITH MASK2
; AND THEN ANDED WITH MASK1. IF MASK2 IS NOT PRESENT, IT
; IS ASSUMED TO BE ZERO.

FNWAIT: JSR     GETNUM
        STX     ANDMSK
        LDXI    0
        JSR     CHRGOT
        BEQ     ZSTORDO
        JSR     COMBYT          ;GET MASK2.
STORDO: STX     EORMSK
        LDYI    0
WAITER: LDADY   POKER
        EOR     EORMSK
        AND     ANDMSK
        BEQ     WAITER
ZERRTS: RTS                     ;GOT A NONZERO.

Note how the BEQ instruction references ZSTORDO, not STORDO – execution sneaks out of this function here.

Well, on non-Commodore machines, ZSTORDO is assigned to be the same as STORDO, so everything is fine:

IFN     REALIO-3,<ZSTORDO=STORDO>
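For reference, the behavior documented in the comment block (which is what every target except Commodore actually does) can be modeled in a few lines of Python. This is only a sketch: peek stands in for reading a memory location, replay is a hypothetical helper that fakes a location changing over time, and max_polls exists only so the example cannot hang.

```python
def wait(peek, location, mask1, mask2=0, max_polls=1000):
    """Poll a location until (value XOR mask2) AND mask1 is nonzero.

    max_polls is an addition for this sketch; the real statement
    loops forever. Returns the number of polls it took.
    """
    for polls in range(1, max_polls + 1):
        if (peek(location) ^ mask2) & mask1:   # EOR EORMSK / AND ANDMSK
            return polls
    raise TimeoutError("condition never became true")

def replay(values):
    """Return a fake peek function that yields the given values in order."""
    it = iter(values)
    return lambda location: next(it)

# The location reads 0 twice, then bit 2 comes on.
print(wait(replay([0, 0, 4]), 0x1234, 4))
```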

But on Commodore, we have this code hidden near the top of the floating point math package – close enough so the BEQ can reach it, but inside code that is least likely to get touched:

IFE     REALIO-3,<
ZSTORD:!        LDA     POKER
        CMPI    146
        BNE     STORDO
        LDA     POKER+1
        SBCI    31
        BNE     STORDO
        STA     POKER
        TAY
        LDAI    200
        STA     POKER+1
MRCHKR: LDXI    12
IF1,<
MRCHR:  LDA     60000,X,>
IF2,<
MRCHR:  LDA     SINCON+36,X,>
        ANDI    77
        STADY   POKER
        INY
        BNE     PKINC
        INC     POKER+1
PKINC:  DEX
        BNE     MRCHR
        DEC     ANDMSK
        BNE     MRCHKR
        RTS
IF2,<PURGE ZSTORD>>

(IF1 and IF2 are true on the first and the second assembler pass, respectively, so the conditional there is to hint to the assembler in the first pass that SINCON+36 is not a zero page address. Also note that all numbers here are octal, since this code is in the floating point package.)

First of all, the final line here removes ZSTORD from the list of symbols after the second pass, so that Commodore would not notice it in a printout of all symbols – very smart!

As has been discussed before, this code writes the string “MICROSOFT!” into the PET’s screen RAM if the argument to WAIT is “6502”. The encoded string is hidden as two extra 40 bit floating point numbers appended to the coefficients used by the SIN function:

IFN     ADDPRC,<
SINCON: 5               ;DEGREE-1.
        204     ; -14.381383816
        346
        032
        055
        033
        206     ; 42.07777095
        050
        007
        373
        370
        207     ; -76.704133676
        231
        150
        211
        001
        207     ; 81.605223690
        043
        065
        337
        341
        206     ; -41.34170209
        245
        135
        347
        050
        203     ; 6.2831853070
        111
        017
        332
        242
        241     ; 7.2362932E7
        124
        106
        217
        23
        217     ; 73276.2515
        122
        103
        211
        315>

These last ten bytes, nicely disguised as octal values of floating point constants, spell out “MICROSOFT!” backwards after clearing the upper two bits. What’s interesting is that the floating point values next to them are actually incorrect: They should be 7.12278788E9 and 26913.7691 instead.
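To check this claim, here is a small Python sketch that applies the same steps as the hidden routine to the ten bytes. The screen code mapping covers only the range needed here, and all names are made up for this example. It also decodes the octal constants from the routine: the value compared against POKER and the address the text is written to.

```python
# The last ten bytes of SINCON, written in octal exactly as in the listing.
TAIL = [0o241, 0o124, 0o106, 0o217, 0o23, 0o217, 0o122, 0o103, 0o211, 0o315]

def screen_code_to_char(code):
    """Map a PET screen code to ASCII for the range used here.

    Codes 1-26 are the letters A-Z; codes 32-63 match ASCII.
    """
    return chr(code + 64) if code < 32 else chr(code)

def decode_tail(tail):
    # The loop counts X down, so it reads the bytes last to first,
    # and ANDI 77 (octal) clears the upper two bits of each one.
    return "".join(screen_code_to_char(b & 0o77) for b in reversed(tail))

# CMPI 146 checks the low byte of POKER, SBCI 31 the high byte (both octal).
TRIGGER = 0o146 | (0o31 << 8)
# A is zero after the compares and becomes the low byte; octal 200 is the high byte.
SCREEN = (0o200 << 8) | 0

print(decode_tail(TAIL), TRIGGER, hex(SCREEN))   # MICROSOFT! 6502 0x8000
```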

Also note that these constants are not conditionally assembled! All versions built since the Commodore easter egg was introduced also contained these 10 bytes – including BASIC for the Motorola 6800!

The Apple Target

The Apple target is meant for the Apple II, and contains no customizations other than some changes around I/O handling (which calls into the monitor ROM). Note that this is not yet the “AppleSoft” version of BASIC, which was a more customized version modified by Apple later.

The STM Target

“STM” most likely stands for “Semi-Tech Microelectronics” – a company that never shipped a 6502-based computer. Their first machine was the “Pied Piper”, a Z80-based system, and they later made a PC clone. It seems they had a 6502-based computer in development that never shipped – or at least they were considering making one, and Microsoft added the target; this target doesn’t actually change any of the defaults.

Organization of the Source

The source uses the PAGE and SUBTTL keywords for organization. Here are the headings:

SUBTTL  SWITCHES,MACROS.
SUBTTL  INTRODUCTION AND COMPILATION PARAMETERS.
SUBTTL  SOME EXPLANATION.
SUBTTL  PAGE ZERO.
SUBTTL  RAM CODE.
SUBTTL  DISPATCH TABLES, RESERVED WORDS, AND ERROR TEXTS.
SUBTTL  GENERAL STORAGE MANAGEMENT ROUTINES.
SUBTTL  ERROR HANDLER, READY, TERMINAL INPUT, COMPACTIFY, NEW, REINIT.
SUBTTL  THE "LIST" COMMAND.
SUBTTL  THE "FOR" STATEMENT.
SUBTTL  NEW STATEMENT FETCHER.
SUBTTL  RESTORE,STOP,END,CONTINUE,NULL,CLEAR.
SUBTTL  LOAD AND SAVE SUBROUTINES.
SUBTTL  RUN,GOTO,GOSUB,RETURN.
SUBTTL  "IF ... THEN" CODE.
SUBTTL  "ON ... GO TO ..." CODE.
SUBTTL  LINGET -- READ A LINE NUMBER INTO LINNUM
SUBTTL  "LET" CODE.
SUBTTL  PRINT CODE.
SUBTTL  INPUT AND READ CODE.
SUBTTL  THE NEXT CODE IS THE "NEXT CODE"
SUBTTL  DIMENSION AND VARIABLE SEARCHING.
SUBTTL  MULTIPLE DIMENSION CODE.
SUBTTL  INTEGER ARITHMETIC ROUTINES.
SUBTTL  FRE FUNCTION AND INTEGER TO FLOATING ROUTINES.
SUBTTL  SIMPLE-USER-DEFINED-FUNCTION CODE.
SUBTTL  STRING FUNCTIONS.
SUBTTL  PEEK, POKE, AND FNWAIT.
SUBTTL  FLOATING POINT ADDITION AND SUBTRACTION.
SUBTTL  NATURAL LOG FUNCTION.
SUBTTL  FLOATING MULTIPLICATION AND DIVISION.
SUBTTL  FLOATING POINT MOVEMENT ROUTINES.
SUBTTL  SIGN, SGN, FLOAT, NEG, ABS.
SUBTTL  COMPARE TWO NUMBERS.
SUBTTL  GREATEST INTEGER FUNCTION.
SUBTTL  FLOATING POINT INPUT ROUTINE.
SUBTTL  FLOATING POINT OUTPUT ROUTINE.
SUBTTL  EXPONENTIATION AND SQUARE ROOT FUNCTION.
SUBTTL  EXPONENTIATION FUNCTION.
SUBTTL  POLYNOMIAL EVALUATOR AND THE RANDOM NUMBER GENERATOR.
SUBTTL  SINE, COSINE AND TANGENT FUNCTIONS.
SUBTTL  ARCTANGENT FUNCTION.
SUBTTL  SYSTEM INITIALIZATION CODE.

Paul Allen vs. Bill Gates

The source of the 8080 version states:

PAUL ALLEN WROTE THE NON-RUNTIME STUFF.
BILL GATES WROTE THE RUNTIME STUFF.
MONTE DAVIDOFF WROTE THE MATH PACKAGE.

People have since wondered what runtime vs. non-runtime meant, especially since Paul Allen’s recent debate on whether the company’s ownership was fairly split.

The BASIC for 6502 source sheds some light on this:

NON-RUNTIME STUFF
        THE CODE TO INPUT A LINE, CRUNCH IT, GIVE ERRORS,
        FIND A SPECIFIC LINE IN THE PROGRAM,
        PERFORM A "NEW", "CLEAR", AND "LIST" ARE
        ALL IN THIS AREA. [...]

So by “runtime” they just literally mean “at run time”: all code that is active when the program runs, as opposed to non-runtime, which is all code that assists editing the program.

By this understanding, we can assume this:

  • Paul Allen wrote the macro package for the MACRO-10 assembler, the 6502 simulator, the tokenizer, the detokenizer, as well as finding, inserting and deleting BASIC lines.
  • Bill Gates implemented all BASIC statements, functions, operators, expression evaluation, stack management for FOR and GOSUB, the memory manager, as well as the array and string library.
  • Monte Davidoff wrote the floating point math package.

Version and Date

The last entry in the change log has a date of 1978-07-27. Both the comment in the first line of the file and the message printed at startup call it version 1.1.

What does this say about the version of the source? Is it the last version? Let’s look at the last bug fix and compare which BASIC binaries contain this fix, and let’s see whether there are fixes in BASIC binaries that are not in the source.

I have previously compared binaries of derivatives of BASIC for 6502 and compiled the information at github.com/mist64/msbasic. The last entry in the log of this source is about a bug that failed to correctly invalidate a pointer in the RETURN statement. According to my analysis of BASIC 6502 versions, this is fixed in the BASIC binaries for AIM-65, SYM-1, Commodore v2, KBD BASIC and MicroTAN, i.e. on everything my previous analysis calls CONFIG_2A and higher.

The same analysis also came to the conclusion that there were two successors, CONFIG_2B and CONFIG_2C. At least the two CONFIG_2B fixes exist in two BASIC binaries: KBD BASIC and MicroTAN, but they don’t exist in this source. It’s very unlikely that both these bugs (and only these!) got fixed by the two computer manufacturers independently, so it’s safe to assume that this source is not the final version – but pretty close to it!

Interesting Finds

  • This code compares a keyboard input character to the BEL code. Bob Albrecht is a computer educator who “was instrumental in helping bring about a public-domain version of Basic (called Tiny Basic) for early microcomputers.”
    CMPI    7               ;IS IT BOB ALBRECHT RINGING THE BELL
                            ;FOR SCHOOL KIDS?
    
  • External documentation usually calls the conversion of ASCII BASIC text into the compressed format “tokenizing”. The source calls this “crunching”.
  • Microsoft is still spelled “Micro-Soft”.
  • Apparently the multiplication function could use some performance improvements:
    		
    BNE     MLTPL2          ;SLOW AS A TURTLE !
    
  • The NEW command is actually called SCRATCH in labels and comments – maybe other BASIC dialects called it that, and they decided to rename it to NEW later?
  • The math package documentation says:
    MATH PACKAGE
            THE MATH PACKAGE CONTAINS FLOATING INPUT (FIN),
            FLOATING OUTPUT (FOUT), FLOATING COMPARE (FCOMP)
            ... AND ALL THE NUMERIC OPERATORS AND FUNCTIONS.
            THE FORMATS, CONVENTIONS AND ENTRY POINTS ARE ALL
            DESCRIBED IN THE MATH PACKAGE ITSELF.
    

    Commodore’s derived source changes this to:

    ; MATH PACKAGE
    ;       THE MATH PACKAGE CONTAINS FLOATING INPUT FIN, OUTPUT
    ;       FOUT, COMPARE FCOMP...AND ALL THE NUMERIC OPERATORS
    ;       AND FUNCTIONS.  THE FORMATS, CONVENTIONS AND ENTRY
    ;       POINTS ARE ALL DESCRIBED IN THE MATH PACKAGE ITSELF.
    ;       (HA,HA...)
    
  • CHRGET is a central piece of BASIC for 6502. Here it is in its entirety:
    ; THIS CODE GETS CHANGED THROUGHOUT EXECUTION.
    ; IT IS MADE TO BE FAST THIS WAY.
    ; ALSO, [X] AND [Y] ARE NOT DISTURBED
    ;
    ; "CHRGET" USING [TXTPTR] AS THE CURRENT TEXT PNTR
    ; FETCHES A NEW CHARACTER INTO ACCA AFTER INCREMENTING [TXTPTR]
    ; AND SETS CONDITION CODES ACCORDING TO WHAT'S IN ACCA.
    ;       NOT C=  NUMERIC   ("0" THRU "9")
    ;       Z=      ":" OR END-OF-LINE (A NULL)
    ;
    ; [ACCA] = NEW CHAR.
    ; [TXTPTR]=[TXTPTR]+1
    ;
    ; THE FOLLOWING EXISTS IN ROM IF ROM EXISTS AND IS LOADED
    ; DOWN HERE BY INIT. OTHERWISE IT IS JUST LOADED INTO THIS
    ; RAM LIKE ALL THE REST OF RAM IS LOADED.
    ;
    CHRGET: INC     CHRGET+7        ;INCREMENT THE WHOLE TXTPTR.
            BNE     CHRGOT
            INC     CHRGET+8
    CHRGOT: LDA     60000           ;A LOAD WITH AN EXT ADDR.
    TXTPTR= CHRGOT+1
            CMPI    " "             ;SKIP SPACES.
            BEQ     CHRGET
    QNUM:   CMPI    ":"             ;IS IT A ":"?
            BCS     CHRRTS          ;IT IS .GE. ":"
            SEC
            SBCI    "0"             ;ALL CHARS .GT. "9" HAVE RET'D SO
            SEC
            SBCI    256-"0"         ;SEE IF NUMERIC.
                                    ;TURN CARRY ON IF NUMERIC.
                                    ;ALSO, SETZ IF NULL.
    CHRRTS: RTS                     ;RETURN TO CALLER.
    

    Did you ever wonder why all versions have $EA60 encoded into the LDA instruction that later gets overwritten? Because it’s 60000 decimal. That’s why! The source actually uses 60000 as a placeholder for 16 bit values in several places.

  • The handling of π, ST, TI and TI$ (all Commodore-specific) looks wonky: Instead of making them tokens, they are special cased in several places. I always assumed it was Commodore adding this without understanding (or wanting to disrupt) the existing code, but it was Microsoft adding these features. Maybe they were added by someone other than the original developers?
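The flag trick at the end of CHRGET is worth stepping through. The Python sketch below models the routine on a byte string instead of self-modifying code: sbc imitates the 6502 subtract with carry in binary mode, and chrget returns the same information the real routine leaves in A, the carry flag and the zero flag. The function names and the tuple return value are choices made for this example.

```python
def sbc(a, operand, carry=1):
    """6502 SBC in binary mode: returns (result, carry_out)."""
    total = a - operand - (1 - carry)
    return total & 0xFF, 1 if total >= 0 else 0

def chrget(text, txtptr):
    """Advance txtptr, skip spaces, and classify the character found.

    Returns (txtptr, a, carry, zero): carry clear means digit,
    zero set means ":" or a null byte.
    """
    while True:
        txtptr += 1                      # INC CHRGET+7 / INC CHRGET+8
        a = text[txtptr]                 # LDA through the patched address
        if a != ord(" "):                # CMPI " " / BEQ CHRGET
            break
    if a >= ord(":"):                    # CMPI ":" / BCS CHRRTS
        return txtptr, a, 1, int(a == ord(":"))
    a, _ = sbc(a, ord("0"))              # SEC / SBCI "0"
    a, carry = sbc(a, 256 - ord("0"))    # SEC / SBCI 256-"0" restores A
    return txtptr, a, carry, int(a == 0)

PLACEHOLDER = 60000                      # operand of LDA before it gets patched
print(hex(PLACEHOLDER))                  # 0xea60
print(chrget(b"X 7:\x00", 0))            # skips the space, finds the digit
```

The two subtractions add up to exactly 256, so A comes back unchanged, while the carry of the second one is clear only for “0” through “9”.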

Origin of the File

The source was posted on the Korean-language blog 6502.tistory.com without further comment, in a marked-up format:

================================================================================================
FILE: "david mac g5 b:m6502.asm"
================================================================================================

000001  TITLE   BASIC M6502 8K VER 1.1 BY MICRO-SOFT
[...]
006955          END     $Z+START

End of File -- Lines: 6955 Characters: 154740

SUMMARY:

  Total number of files : 1
  Total file lines      : 6955
  Total file characters : 154740

This formatting was created by an unpublished tool by David T. Craig, who published a lot of Apple-related source code (Apple II, Apple III, Lisa) in this format as early as 1993, first anonymously, later under his name.

The filename “david mac g5 b:m6502.asm” (disk name “david mac g5 b”, file name “m6502.asm”, since it was a classic Mac OS tool) confirms David Craig’s involvement, and it means the line numbers were added no earlier than 2003.

Given all this, it is safe to assume the file with the Microsoft BASIC for 6502 source originated at Apple, and was given to David Craig together with the other source he published.

The version I posted is a reconstruction of the original file, with the header, the footer and the line numbers removed, and the spaces converted back into tabs. I chose the name “M6502.MAC” to be consistent with the MACRO-10 file extension used by the Microsoft BASIC for 8080 sources.
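The reconstruction steps can be sketched in Python as follows. This is not the script I used, just an illustration under two assumptions read off the excerpt above: every source line starts with a six digit number followed by two spaces, and the original tabs were expanded to 8 column tab stops. Only runs of two or more spaces that end at a tab stop are turned back into tabs, so single spaces inside comments stay untouched.

```python
import re

NUMBERED = re.compile(r"^\d{6}(?:  |$)")    # six digits, then two spaces (assumed)

def unexpand(line, tabsize=8):
    """Turn space runs that end at a tab stop back into tab characters."""
    out = []
    for start in range(0, len(line), tabsize):
        chunk = line[start:start + tabsize]
        stripped = chunk.rstrip(" ")
        pad = len(chunk) - len(stripped)
        if len(chunk) == tabsize and pad >= 2:
            out.append(stripped + "\t")
        else:
            out.append(chunk)
    return "".join(out)

def reconstruct(listing):
    """Keep only numbered lines, drop the number column, restore tabs."""
    out = []
    for line in listing.splitlines():
        if NUMBERED.match(line):             # header and footer lines do not match
            out.append(unexpand(NUMBERED.sub("", line, count=1)))
    return "\n".join(out) + "\n"

demo = "000001  " + "TITLE" + " " * 3 + "BASIC M6502 8K VER 1.1 BY MICRO-SOFT"
print(repr(reconstruct(demo)))
```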