Archive for July, 2009


Tuesday, July 28th, 2009

Commodore computers up to BASIC 2.0 (like the Commodore 64, the VIC-20 and the PET 2001) only had a very basic understanding of mass storage: There were physical device numbers that were mapped to the different busses, and the “KERNAL” library had “open”, “read”, “write” and “close” functions that worked on these devices. There were also higher-level “load” and “save” functions that could load and save arbitrary regions of memory: The first two bytes of the file would be the (little endian) start address of the memory block.
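As a rough illustration of that file layout, here is a minimal Python sketch (not Commodore code; the name `split_prg` and the sample bytes are made up for this example) that separates the two-byte little-endian start address from the memory block that follows it:

```python
import struct

def split_prg(data: bytes):
    """Split a LOAD/SAVE-style file into (start_address, payload)."""
    if len(data) < 2:
        raise ValueError("file too short to contain a start address")
    # "<H" reads one unsigned 16-bit value, low byte first (little endian)
    (start,) = struct.unpack_from("<H", data, 0)
    return start, data[2:]

# Hypothetical file: start address 0x0801 followed by three payload bytes
start, payload = split_prg(bytes([0x01, 0x08, 0xAA, 0xBB, 0xCC]))
print(hex(start), len(payload))
```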

With no special knowledge of “block storage” devices like disk drives, BASIC 2.0, which was not only a programming language but basically the shell of Commodore computers, could not express commands like “format a disk”, “delete a file” or “show the directory”. All this functionality, as well as the file system implementation, was part of the firmware of the disk drives.

Sending a Command

Sending commands to the drive was done by using the “open” call with a “secondary address” of 15: The computer’s KERNAL just sent the file name and the secondary address over the IEC bus as if it were to open a file, but the floppy drive understood secondary address 15 as the command channel. So for example, deleting a file from BASIC looked like this:

OPEN 1,8,15,"S:FOO": CLOSE 1

“1” is the KERNAL’s file descriptor, “8” the device number and “15” the secondary address. Experts omitted the close, because it blocked on the completion of the operation.

Getting Data Back

While the “OPEN” line for disk commands was pretty verbose, it was still doable. Getting the error message of the last operation back was trickier: It required a loop in BASIC that read bytes from channel 15 until EOF was reached.

Getting a directory listing would be in the same class of problem, since it requires the computer to send a command (and a file name mask) to the floppy and receive the data. Neither BASIC nor KERNAL knew how to do this, and since this was such a common operation, it would not have been acceptable to make the user type in a 4-line BASIC program just to dump the directory contents.

The BASIC Program Hack

Here comes the trick: If the name of the program to load started with a “$” (followed by an optional mask), the floppy drive just returned the directory listing – formatted as a BASIC program. The user could then just “LOAD” the directory and “LIST” it as if it were a BASIC program:



0 "TEST DISK       " 23 2A
20   "FOO"               PRG
3    "BAR"               PRG

In this example, “TEST DISK” is the disk name, “23” the disk ID and “2A” the filesystem format/version (always 2A on 1540/1541/1551/1570/1571 – but this was only a redundant copy of the version information which was never read and could be changed). There are two files, 20 and 3 blocks in size respectively (a block is a 256 byte allocation unit on disk – since blocks are stored as linked lists there are only 254 bytes of payload), and both are of the “PRG” type.
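To make the block arithmetic concrete, here is a small Python sketch. It assumes the 2 bytes that are missing from each 256 byte block hold the link to the next block; the names are invented for this example. Since the last block of a file may be only partly filled, the result is an upper bound, not an exact size.

```python
BLOCK_SIZE = 256                      # allocation unit on disk
LINK_BYTES = 2                        # assumed: per-block pointer to the next block
PAYLOAD = BLOCK_SIZE - LINK_BYTES     # 254 bytes of file data per block

def max_payload_bytes(blocks: int) -> int:
    """Upper bound on the file size for a given block count."""
    return blocks * PAYLOAD

for name, blocks in (("FOO", 20), ("BAR", 3)):
    print(name, "holds at most", max_payload_bytes(blocks), "bytes")
```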

Encoding of Commodore BASIC Programs

The floppy was aware of the encoding that Commodore BASIC (a derivative of Microsoft BASIC for 6502) used and prepared the directory listing in that way. A BASIC program in memory is a linked list of lines. Every line starts with a 2-byte pointer to the next line. A 0-pointer marks the end of the program. The next two bytes are the line number, followed by the zero-terminated encoded line contents.

The LIST command decodes a BASIC program in memory by following the linked list from the start of BASIC RAM. It prints the line number, a space, and the line contents. These contents have BASIC keywords encoded as 1-byte tokens starting at 0x80. Characters below 0x80 are printed verbatim. Here is what 10 PRINT"HELLO" would look like:

0801  0E 08    - next line starts at 0x080E
0803  0A 00    - line number 10
0805  99       - token for PRINT
0806  "HELLO"  - ASCII text of line
080D  00       - end of line
080E  00 00    - end of program
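The same layout can be sketched in a few lines of Python. This is only an illustration: `encode_program` is a made-up helper, the line contents are passed in already tokenized, and it assumes that uppercase ASCII letters and the quote character have the same codes on the Commodore side.

```python
import struct

PRINT_TOKEN = 0x99  # token for PRINT

def encode_program(lines, base=0x0801):
    """lines: list of (line_number, content_bytes). Returns the memory image."""
    out = bytearray()
    addr = base
    for number, content in lines:
        if 0 in content:
            raise ValueError("line contents must not contain a zero byte")
        # link pointer (2) + line number (2) + contents + zero terminator (1)
        next_addr = addr + 2 + 2 + len(content) + 1
        out += struct.pack("<HH", next_addr, number)
        out += content + b"\x00"
        addr = next_addr
    out += b"\x00\x00"  # a 0-pointer marks the end of the program
    return bytes(out)

image = encode_program([(10, bytes([PRINT_TOKEN]) + b'"HELLO"')])
print(image.hex(" "))
```

With a base of 0x0801, the first link pointer comes out as 0x080E, because the line occupies 13 bytes.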

The example directory listing from above would be encoded by the floppy like this:

0801  21 08    - next line starts at 0x0821
0803  00 00    - line number 0
0805  '"TEST DISK       " 23 2A '
0820  00       - end of line
0821  41 08    - next line starts at 0x0841
0823  14 00    - line number 20
0825  '  "FOO"               PRG '
0840  00       - end of line

A couple of things are interesting here:

  • The line with the disk name and the ID is actually printed in inverted letters, which is done by having the “reverse on” character code as the first character of the first line, i.e. the floppy makes the assumption that the computer understands this convention.
  • BASIC will print the file sizes as variable-width line numbers, so the floppy adds extra spaces to the beginning of the line contents to have all file names aligned.
  • The floppy needs to populate the next line pointers for the linked list.
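The three points above can be sketched in Python as follows. This is not the drive's firmware: `directory_as_basic` is a hypothetical helper, the padding rule (4 minus the number of digits) is inferred from the sample listing, the column layout is approximate, and the link pointers are an arbitrary non-zero dummy value.

```python
import struct

RVS_ON = 0x12  # PETSCII "reverse on" control code

def directory_as_basic(disk_name, disk_id, dos_type, files, load_addr=0x0401):
    """Build a directory listing shaped like a BASIC program."""
    dummy_link = 0x0101                       # placeholder link pointer, never zero
    out = bytearray(struct.pack("<H", load_addr))
    # Header line: line number 0, printed in reverse video
    header = bytes([RVS_ON]) + f'"{disk_name:<16}" {disk_id} {dos_type}'.encode("ascii")
    out += struct.pack("<HH", dummy_link, 0) + header + b"\x00"
    for name, blocks, ftype in files:
        # Fewer digits in the block count means more leading spaces
        pad = " " * max(0, 4 - len(str(blocks)))
        quoted = f'"{name}"'
        text = f"{pad}{quoted:<18} {ftype}"
        out += struct.pack("<HH", dummy_link, blocks) + text.encode("ascii") + b"\x00"
    out += b"\x00\x00"                        # end of program
    return bytes(out)

data = directory_as_basic("TEST DISK", "23", "2A",
                          [("FOO", 20, "PRG"), ("BAR", 3, "PRG")])
print(len(data), "bytes")
```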

The Link Pointer

The obvious question here is: How can the floppy know where in the computer’s memory the BASIC program will live? The answer is: It doesn’t. The BASIC interpreter supports having its program anywhere in memory, and loading programs that were saved from other locations in memory – or possibly other Microsoft BASIC compatible computers with a different memory layout. The VIC-20 had BASIC RAM at 0x0401 (with the 3K RAM expansion; an unexpanded VIC-20 started at 0x1001), the C64 at 0x0801 and the C128 at 0x1C01. Therefore, BASIC “rebinds” a program on load, searching for the zero-terminator of the lines and filling the (redundant) link pointers.
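A minimal Python sketch of such a rebinding pass might look like this. It is not a transcription of the ROM routine: `relink` is a made-up name, and it takes the program image without the two-byte load address.

```python
import struct

def relink(image: bytes, base: int) -> bytes:
    """Rewrite every link pointer so the program is valid at address `base`."""
    buf = bytearray(image)
    pos = 0
    # A link pointer of 0 marks the end of the program
    while pos + 1 < len(buf) and (buf[pos] or buf[pos + 1]):
        # Skip link pointer and line number (which may contain a zero byte),
        # then search for the zero terminator of the line contents
        end = buf.index(0, pos + 4)
        next_pos = end + 1
        struct.pack_into("<H", buf, pos, base + next_pos)
        pos = next_pos
    return bytes(buf)

# Line 10 PRINT"HELLO" with a meaningless but non-zero link pointer
raw = bytes([0x01, 0x01, 0x0A, 0x00, 0x99]) + b'"HELLO"' + b"\x00" + b"\x00\x00"
print(relink(raw, 0x0801)[:2].hex(" "))
```

Because only the zero terminators are trusted, the incoming pointer values do not matter as long as they are not zero.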

The floppy therefore only has to send non-zero values as the link pointers for BASIC to accept the directory listing as a program. In fact, a 1541 sends the directory with a 0x0401-base, which would be valid on a VIC-20. The reason for this is that the 1541 is only a 1540 with minor timing fixes for C64 support, and the 1540 is the floppy drive that was designed for the VIC-20.

Therefore, if you do LOAD"$",8,1 on a C64, the extra “,1” will be interpreted by the KERNAL LOAD code to load the file at its original address (as opposed to the beginning of BASIC RAM). Since there is screen RAM at 0x0400 on the C64, garbage will appear on the screen, because the character encoding of screen RAM is incompatible with BASIC character encoding.

Directory Code in 61 Bytes

There are two problems with this “directory listing is a BASIC program” hack: Listing the directory overwrites a BASIC program in RAM, and listing the directory from inside an application is non-trivial.

Therefore, many implementations to show a directory listing exist on the C64 – and I want to present my own one here, which is, to my knowledge, the shortest existing (and maybe shortest possible?) version. It is based on a 70 byte version published in “64'er Magazin” some time in the 80s, and I managed to get it down to 61 bytes.

,C000:  A9 01     LDA #$01     ; filename length
,C002:  AA        TAX
,C003:  A0 E8     LDY #$E8     ; there is a "$" at $E801 in ROM
,C005:  20 BD FF  JSR $FFBD    ; set filename
,C008:  A9 60     LDA #$60
,C00A:  85 B9     STA $B9      ; set secondary address
,C00C:  20 D5 F3  JSR $F3D5    ; OPEN (IEC bus version)
,C00F:  20 19 F2  JSR $F219    ; set default input device
,C012:  A0 04     LDY #$04     ; skip 4 bytes (load address and link pointer)
,C014:  20 13 EE  JSR $EE13    ; read byte
,C017:  88        DEY
,C018:  D0 FA     BNE $C014    ; loop
,C01A:  A5 90     LDA $90
,C01C:  D0 19     BNE $C037    ; check end of file
,C01E:  20 13 EE  JSR $EE13    ; read byte (block count low)
,C021:  AA        TAX
,C022:  20 13 EE  JSR $EE13    ; read byte (block count high)
,C025:  20 CD BD  JSR $BDCD    ; print 16 bit integer
,C028:  20 13 EE  JSR $EE13    ; read character
,C02B:  20 D2 FF  JSR $FFD2    ; print character to stdout
,C02E:  D0 F8     BNE $C028    ; loop until zero
,C030:  20 D7 AA  JSR $AAD7    ; print carriage return character
,C033:  A0 02     LDY #$02
,C035:  D0 DD     BNE $C014    ; skip 2 bytes next time (link pointer)
,C037:  20 42 F6  JSR $F642    ; CLOSE
,C03A:  4C F3 F6  JMP $F6F3    ; reset default input device

(There is a similar implementation here.)

There are two limitations of this code though: It omits the extra space between the block number and the filename, leading to a slightly different output, and it cannot be interrupted.
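For readers who do not speak 6502, the control flow of the routine can be mirrored in Python. This is only a sketch that works on a byte string instead of the IEC bus: `list_directory` and the sample stream are invented for this example, and the length check stands in for the status byte test at $90.

```python
RVS_ON = 0x12  # PETSCII "reverse on"; dropped here, a terminal would not honor it

def list_directory(stream: bytes):
    """Return the printable lines of a '$' directory stream."""
    lines = []
    pos = 4                                   # skip load address and first link pointer
    while pos + 2 <= len(stream):             # stand-in for the end-of-file check
        blocks = stream[pos] | (stream[pos + 1] << 8)   # block count, low byte first
        pos += 2
        end = stream.index(0, pos)            # line text runs up to the zero byte
        text = bytes(b for b in stream[pos:end] if b != RVS_ON).decode("ascii")
        lines.append(f"{blocks}{text}")       # no space after the number, as in the routine
        pos = end + 1 + 2                     # skip the terminator and the next link pointer
    return lines

sample = (
    b"\x01\x04"                                                   # load address 0x0401
    b"\x01\x01" b"\x00\x00" b'\x12"TEST DISK       " 23 2A' b"\x00"
    b"\x01\x01" b"\x14\x00" b'  "FOO"               PRG' b"\x00"
    b"\x00\x00"                                                   # end of program
)
for line in list_directory(sample):
    print(line)
```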

The Infinite Loop Mystery

Monday, July 20th, 2009

Today’s puzzle is about some code behaving horribly wrong.

Recently, I was working on some operating system project and hacking on the code to switch between privileged and non-privileged mode. I could switch modes successfully and intercept traps when in non-privileged mode.

Then I wanted to check whether I could handle timer interrupts correctly, so I added this to my non-privileged code, to give the timer interrupt a chance to fire:

    volatile int i;
    for (i=0; i<10000; i++);

Timer interrupts were handled correctly and eventually returned to the non-privileged code – but the delay code turned into an infinite loop!

I changed the loop to count only to 10, and I changed it to count down instead of up, but the result remained the same. I looked at the generated assembly. It looked like this:

    movl    $10, 0xfc(%ebp)	// i = 10
    jmp     1f			// goto 1
2:
    movl    0xfc(%ebp), %eax	// %eax = i
    decl    %eax		// %eax--
    movl    %eax,0xfc(%ebp)	// i = %eax
1:
    movl    0xfc(%ebp), %eax	// %eax = i
    testl   %eax, %eax		// if (%eax > 0)
    jg      2b			// goto 2

It looked fine. On every timer interrupt, I dumped %eax, and it was stuck at 10. I debugged my pusha/popa code to save and restore registers between modes, and it was okay. I debugged my flag handling code, and flags were fine.

Then I replaced my C code with the generated assembly code and added instructions that copied the value of %eax before the “decl” into %ebx, and after the “decl” into %ecx and added a trap instruction right after that to have privileged mode print out the values of the three registers.

    movl    $10, 0xfc(%ebp)	// i = 10
    jmp     1f			// goto 1
2:
    movl    0xfc(%ebp), %eax	// %eax = i
    movl    %eax, %ebx          // value before
    decl    %eax		// %eax--
    movl    %eax, %ecx          // value after
    movl    %eax,0xfc(%ebp)	// i = %eax
1:
    movl    0xfc(%ebp), %eax	// %eax = i
    testl   %eax, %eax		// if (%eax > 0)
    jg      2b			// goto 2

The result was %eax = %ebx = %ecx = 10. This is when I understood what was going on.

Please share your comments below. :-)

This is Copyright 1983 Microsoft – NOT!

Tuesday, July 14th, 2009

If you look at a hexdump of any version of the Logitech mouse driver for MS-DOS, you will see the following:

*** This is Copyright 1983 Microsoft ***

000007c0                                       2a 2a 2a 20  |            *** |
000007d0  54 68 69 73 20 69 73 20  43 6f 70 79 72 69 67 68  |This is Copyrigh|
000007e0  74 20 31 39 38 33 20 4d  69 63 72 6f 73 6f 66 74  |t 1983 Microsoft|
000007f0  20 2a 2a 2a                                       | ***            |

Microsoft introduced the mouse to MS-DOS, and they specified the mouse driver interface and implemented the first MS-DOS mouse driver. Did Logitech license their code? Or did they steal it? Let’s look closer:

This is a LOGITECH mouse driver, but some software expect here the following string:*** This is Copyright 1983 Microsoft ***

00000770                           54 68 69 73 20 69 73 20  |        This is |
00000780  61 20 4c 4f 47 49 54 45  43 48 20 6d 6f 75 73 65  |a LOGITECH mouse|
00000790  20 64 72 69 76 65 72 2c  20 62 75 74 20 73 6f 6d  | driver, but som|
000007a0  65 20 73 6f 66 74 77 61  72 65 20 65 78 70 65 63  |e software expec|
000007b0  74 20 68 65 72 65 20 74  68 65 20 66 6f 6c 6c 6f  |t here the follo|
000007c0  77 69 6e 67 20 73 74 72  69 6e 67 3a 2a 2a 2a 20  |wing string:*** |
000007d0  54 68 69 73 20 69 73 20  43 6f 70 79 72 69 67 68  |This is Copyrigh|
000007e0  74 20 31 39 38 33 20 4d  69 63 72 6f 73 6f 66 74  |t 1983 Microsoft|
000007f0  20 2a 2a 2a                                       | ***            |

This string is located directly before the INT 0x33 API entry point, so it is easy to check for it. There’s a sucker born every minute, but this still makes you wonder what kind of programmer would really check for a string like this, even if some Microsoft API reference indeed suggested doing so. Maybe only Microsoft software compared the string.

In either case, this is a very bad and unfair practice. If you define an interface, don’t add a call to ask for the version (or even the vendor!), but add feature bits instead, so that an alternative implementation can choose to be compatible with parts of it (or extend the interface independently). And if you are a developer that uses an interface, use feature bits if they are there, and resist the temptation to check for the vendor, even if the API documentation tells you to do so.
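To show what the feature-bit approach could look like, here is a small Python sketch. Everything in it is hypothetical: the flag names, the `AlternativeDriver` class and the `features()` call are made up for illustration and do not correspond to the real INT 0x33 interface.

```python
from enum import IntFlag

class MouseFeatures(IntFlag):
    """Hypothetical feature bits reported by a mouse driver."""
    THREE_BUTTONS = 1
    WHEEL = 2
    HIGH_RESOLUTION = 4

class AlternativeDriver:
    """A third-party driver that implements only part of the interface."""
    def features(self) -> MouseFeatures:
        return MouseFeatures.THREE_BUTTONS | MouseFeatures.HIGH_RESOLUTION

def supports(driver, feature: MouseFeatures) -> bool:
    # The application asks what the driver can do, never who wrote it
    return bool(driver.features() & feature)

driver = AlternativeDriver()
print(supports(driver, MouseFeatures.THREE_BUTTONS))
print(supports(driver, MouseFeatures.WHEEL))
```

An application written this way keeps working with any vendor's driver, and no driver has to embed someone else's copyright string to be accepted.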

The Giant Pile of Money in My Office

Tuesday, July 7th, 2009

Corporate security thought it wasn’t the best idea:

(1:06 min, 176 KB)