Author Archives: Michael Steil

Original Commodore Source Code Collection

Over the years, the ROM source code of many Commodore computers and peripherals has appeared. I have been collecting them in a git repository here:

https://github.com/mist64/cbmsrc

These are some of the gems in the collections:

  • BASIC and KERNAL for the C64
  • BASIC and KERNAL for the TED (C16, C116, Plus/4)
  • BASIC and KERNAL for the C128
  • Commodore DOS for the 1540, 1571 and 1581

Some of the sources have been carefully converted to modern encodings (ASCII, LF) and the indentation of some older sources has been fixed up to match known “LST” printouts.

I am planning to add more sources. I'd be happy about any pointers.

Commodore Peripheral Bus: Part 3: Commodore DOS

In the series about the variants of the Commodore Peripheral Bus family, this article covers the common layer 4: The “Commodore DOS” interface to disk drives.

Commodore DOS is supported by all floppy disk and hard disk drives for the Commodore 8 bit family (such as the well-known 1541), independently of the connectors and byte transfer protocols on the lower layers of the protocol stack. The protocol specifies APIs for file access, for direct block access as well as for executing code on the device. Only a core set is supported by all devices, though; some devices have additional APIs.

From a device’s point of view, the layer below, layer 3 (“TALK/LISTEN”), defines the following:

  • A device has 32 channels (a.k.a. secondary addresses, 0-31).
  • A channel (0-15) can be associated with a name and dissociated from it again.
  • A device can send byte streams from channels.
  • A device can receive byte streams into channels.

The Commodore DOS API defines the meaning of channel numbers, channel names and the data traveling over channels in the context of disk drives.


NOTE: I am releasing one part every week, at which time links will be added to the bullet points below. The articles will also be announced on my Twitter account @pagetable and my Mastodon account @pagetable@mastodon.social.


  • Part 0: Overview and Introduction
  • Part 1: IEEE-488 [PET/CBM Series; 1977]
  • Part 2: The TALK/LISTEN Layer
  • Part 3: The Commodore DOS Layer (this article)
  • Part 4: Standard Serial (IEC) [VIC-20, C64; 1981] (coming soon)
  • Part 5: TCBM [C16, C116, Plus/4; 1984] (coming soon)
  • Part 6: JiffyDOS [1985] (coming soon)
  • Part 7: Fast Serial [C128; 1986] (coming soon)
  • Part 8: CBDOS [C65; 1991] (coming soon)

Feature Sets

Commodore DOS has been in existence since the Commodore 2040 drive from 1978, and new firmware code for Commodore DOS devices is being developed to this day. The API has received some extensions in the meantime, so while this article covers the complete API, it is important to understand that not all APIs are supported by all devices.

Feature           2040  1541  1571/1581  RAMDOS  CMD HD/FD  RAMLink  SD2IEC
Sequential files  yes   yes   yes        yes     yes        yes      yes
Relative files    no    yes   yes        yes     yes        yes      yes
Block access      yes   yes   yes        no      yes        yes      yes
Code execution    yes   yes   yes        no      yes        no[1]    no
Burst commands    no    no    yes        no      yes        no       no
Time              no    no    no         no      yes        yes      yes
Partitions        no    no    no         no      yes        yes      yes
Subdirectories    no    no    no         no      yes        yes      yes

(The 2040 is Commodore’s first (dual) disk drive. A ROM update to 2.0 later added relative file support. The 1541, 1571 and 1581 are the well-known C64 drives. RAMDOS is a RAM-disk application by Commodore that shipped with the “REU” RAM extender. Creative Micro Devices (CMD) made floppy disk and hard disk drives in the 1990s. SD2IEC is a modern floppy/hard drive replacement for Commodore computers with a Serial bus.)

Commodore drives up to the 1581 will be called “classic” devices further on.

The last device released by Commodore was the 1581 from 1987. CMD effectively picked up this line of devices by releasing the HD and FD series devices with lots of added features – and APIs. These additions should be considered canonical.

Concepts

Commodore DOS calls a device (with its own primary address) connected to the bus a unit.

A unit can have one or more media[1]. Each medium is a sequence of blocks whose numbering is independent of the other media. A medium usually contains a filesystem, but it can also be used for direct block access independently of any filesystem. These media are numbered, starting from “0”.

For devices that do not support partitions, a simple one-drive unit like the Commodore 1541 only has a single medium “0”. A dual-drive unit like the Commodore 8250 has two, named “0” and “1”, one for each disk drive. Reference manuals of these kinds of units call these numbers the drive number.

On devices that do support partitioning, each partition is a medium. The partitions are numbered starting with “1”, while “0” always points to the currently active partition. Reference manuals of these kinds of units call these numbers the partition number.

API Basics

Channels

Communication to Commodore DOS happens through 15 data channels and one command channel:

Channel Description
0 named (PRG read)
1 named (PRG write)
2-14 named
15 commands/status
16-31 illegal

(While the underlying layers of the bus specify channel numbers (secondary addresses) from 0 to 31, Commodore DOS does not support the numbers 16-31.)

Channels 0 to 14 need to be associated with a name and are used for the transfer of raw data like file and block contents. (0 and 1 are special-cased and will be discussed later.)

Channel 15 is a “meta” channel. When writing to it, the device interprets the byte stream as commands in a unified format. It either controls something about a specific data channel (out of band communication), or is a global command. When reading from it, the byte stream from the device is usually status information in a unified format, and sometimes raw response data to a command.

Commands

All commands are byte streams that are mostly ASCII, but with some binary arguments in some cases. All devices allow commands of up to 40 bytes in length, some allow more.

There are two different ways to send them:

They can be sent as a byte stream to channel 15, terminated by an EOI or UNLISTEN event[2]. The following BASIC code sends the command “I” to drive 8 this way:

OPEN 1,8,15  
PRINT#1, "I";  
CLOSE 1

(On layer 3, this will send LISTEN 8/SECOND 15/“I”/UNLISTEN.)

Alternatively, channel 15 can be opened as a named channel with the command as the name. This is not a real “OPEN” operation, and closing the channel is a no-op. It just allows shorter code, e.g. in BASIC:

OPEN 1,8,15,"I"  
CLOSE 1

(On layer 3, this will send LISTEN 8/OPEN 15/“I”/UNLISTEN/LISTEN 8/CLOSE 15/UNLISTEN.)

In both cases, commands that don’t contain binary arguments can also be terminated by the CR character.
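As a minimal sketch in Python, the two ways of sending a command can be modeled as the layer 3 event sequences given above. The function names and the tuple representation of bus events are hypothetical and only illustrate the difference between the two methods; the length check reflects the 40 bytes that all devices accept.

```python
MAX_PORTABLE_COMMAND = 40  # all devices accept commands of at least this length


def _check(command: bytes) -> None:
    if len(command) > MAX_PORTABLE_COMMAND:
        raise ValueError("command may be too long for some devices")


def send_via_data(device: int, command: bytes) -> list:
    """Events for: OPEN 1,8,15 / PRINT#1,cmd; / CLOSE 1."""
    _check(command)
    return [("LISTEN", device), ("SECOND", 15), ("DATA", command), ("UNLISTEN",)]


def send_via_name(device: int, command: bytes) -> list:
    """Events for: OPEN 1,8,15,cmd / CLOSE 1 (the CLOSE is a no-op for the drive)."""
    _check(command)
    return [
        ("LISTEN", device), ("OPEN", 15), ("DATA", command), ("UNLISTEN",),
        ("LISTEN", device), ("CLOSE", 15), ("UNLISTEN",),
    ]
```

The data method needs fewer bus transactions, while the named method needs less BASIC code; the device interprets the command bytes the same way in both cases.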

Status

The status information that is sent from channel 15 is a CR-delimited ASCII-encoded string with a uniform encoding:

code,string,a,b[,c]

  • code is a two-digit decimal error code.
  • string is a short English-language version of the error code.
  • a and b are two additional decimal numbers of at least two digits[3] whose meaning depends on the type of error (“00” if unused).
  • c is the single-digit decimal number of the drive that caused the status message. Devices with only a single drive don’t usually return this[4].

A status code of 0 will return the string “00, OK,00,00” (or “00, OK,00,00,0” on dual-drive devices, assuming the last command was performed on drive 0).

Reading the status will clear it. So if the user keeps on reading the status channel, the device will keep sending “00, OK,00,00” messages.

The following BASIC program will read a single status message:

10 OPEN 1,8,15  
20 GET#1,A$: PRINT A$;: IF A$<>CHR$(13) GOTO 20  
30 CLOSE 1
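On the host side, a status line in the format described above can be split into its fields. This is a minimal Python sketch with a hypothetical `parse_status` helper; it assumes the message text itself contains no comma, which follows from the comma-delimited format.

```python
from typing import NamedTuple, Optional


class DriveStatus(NamedTuple):
    code: int
    message: str
    a: int
    b: int
    drive: Optional[int]  # only present on multi-drive units


def parse_status(line: str) -> DriveStatus:
    """Parse "code,string,a,b[,c]" as read from channel 15."""
    fields = line.rstrip("\r").split(",")
    if len(fields) not in (4, 5):
        raise ValueError(f"unexpected status format: {line!r}")
    code, message, a, b = fields[:4]
    drive = int(fields[4]) if len(fields) == 5 else None
    return DriveStatus(int(code), message.strip(), int(a), int(b), drive)
```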

More info on error codes towards the end of this article.

APIs

There are several independent sets of APIs:

  • File Access API: This allows high-level access to files. No knowledge of the underlying data structures is needed. All devices support it.

  • Block Access API: This allows reading and writing individual logical blocks (256 bytes) of a medium. This can (optionally) be done without knowledge of, but still compatible with, the filesystem’s metadata, e.g. to allow custom blocks to coexist with a healthy filesystem on a single medium. This API is optional. RAM disks (like RAMDOS) and network-backed devices don’t usually support it.

  • Code Execution API: This allows reading and writing memory in the unit’s interface controller, as well as executing code in its context. It is highly device-specific, but allows for implementing optimized or specialized code for existing functionality, or a device-side implementation of custom disk formats.

  • Burst API: This is a set of commands that mostly provides low-level access to the disk controller in order to allow reading and writing physical sectors – mostly to support foreign disk formats. The API also contains commands that initiate device memory and file access using a variant of the “Serial” layer 2 (“Burst Transfer”).

  • Settings API: Later devices support a canonical set of global settings commands.

  • Clock API: Some devices keep track of time and can assign timestamp metadata to files. These devices allow reading and writing the current time using the clock API.

File Access API

Every medium has its own independent filesystem. A filesystem has a name and a two-character ID[5], and contains an unsorted set of files. All files have a unique name as well as a file type, and have to be at least one byte in size[6]. Some devices maintain a last-changed timestamp with files, and some support nested subdirectories in order to group files.

Commodore DOS does not specify a maximum size for disk or file names, but the limit for all Commodore and CMD devices is 16 characters. There is also no specified character encoding: Names consist of 8 bit characters, and DOS does not interpret them. Names have very few limitations:

  • The “,” (comma), “:” (colon), “=” (equals) and CR (carriage return) characters are illegal in disk, file and directory names (because of the syntax of channel names and commands).
  • The “/” (slash) character is illegal for directory names (because of the syntax of path specifiers).
  • The code 0xa0 (Unicode/ISO 8859-1 non-breaking space, CBM-ASCII shifted SPACE) is illegal in file and directory names (it is used as the terminating character on disk).

There are two categories of files: sequential and relative files.

Sequential files only allow linear access, i.e. it is impossible to position the read or write pointer. They can be appended to, though. There are three types of sequential files: SEQ, PRG and USR. They are treated the same by DOS, but the user convention is to store executable programs in PRG files and data in SEQ files.

Relative files (REL) have a fixed record size of 1 to 254 bytes and allow positioning the read/write pointer to any record and thus allow random access. (Early Commodore drives with DOS 1.x and some modern solutions don’t support this.)

While the interface to DOS often requires specifying the file type, it is not part of a file’s identifier, i.e. there cannot be two files with the same name that differ only in type.

Paths

Paths specify a subdirectory on a medium:

[medium][[/]/dirname[/…]/]

There can be an arbitrary number of dirname specifiers. Both the medium and the subdirectory names are optional. Omitting the medium will select medium 0, and omitting the subdirectory names will select the current subdirectory. Two slashes at the beginning mean the first directory name is relative to the root, otherwise it is relative to the current directory.

Examples:

  • “” – the current directory on medium 0
  • “1” – the current directory on medium 1
  • “1/FOO/” – the subdirectory FOO of the current directory on medium 1
  • “1//FOO/” – the subdirectory FOO in the root on medium 1
  • “/FOO/” – the subdirectory FOO of the current directory on medium 0
  • “1//FOO/BAR/” – the subdirectory BAR inside the subdirectory FOO in the root of medium 1

On devices that do not support subdirectories, paths only consist of the (optional) medium name:

  • “” – medium 0
  • “0” – medium 0
  • “1” – medium 1

Wildcards

Some APIs permit using wildcard characters:

  • A question mark (“?”) matches any character.
  • An asterisk (“*”) matches zero or more characters. On classic devices, characters in the pattern after the asterisk are ignored, so an asterisk can only match characters at the end of the name.
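The classic matching rules can be sketched in a few lines of Python. This is an illustration of the semantics described above (with a hypothetical `classic_match` function), not a transcription of any DOS ROM code; devices that allow characters after the asterisk behave differently.

```python
def classic_match(pattern: str, name: str) -> bool:
    """Wildcard match with classic-device semantics."""
    for i, p in enumerate(pattern):
        if p == "*":
            # Classic devices ignore the rest of the pattern after "*",
            # so any remaining characters of the name match.
            return True
        if i >= len(name):
            return False  # the name is shorter than the pattern
        if p != "?" and p != name[i]:
            return False
    # Without an asterisk, the lengths have to agree exactly.
    return len(name) == len(pattern)
```

For example, “FOO*BAR” matches “FOOXYZ” on a classic device, because everything after the asterisk is ignored.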

Opening Files

Files are read and written through named channels 0 through 14. Opening a named channel associates the channel with the filename. Closing it will dissociate the channel and, for files that were written to, make sure the file data on disk is consistent.

The following channel name syntax is used to open a file for reading or writing:

[[@][path]:]filename[,type[,mode]]

The core of the channel name is the name of the file to be accessed. If an existing file is opened, wildcards are allowed.

There are optional prefixes and suffixes.

  • The modifier flag “@” specifies that the file is supposed to be overwritten if it is opened for writing and already exists[7]; the default is to return an error.

  • The path component allows specifying a medium and/or a sequence of subdirectory names. It defaults to medium 0 and the current path. The use of “@”, a path, or both, requires adding a “:” character as a delimiter between the prefix and the filename.

  • The file type is one of “S” (SEQ), “P” (PRG), “U” (USR), or “L” (REL). If the type is omitted, PRG is assumed.

  • The mode byte depends on the file type (see below).

(By specifying a path (even if empty, so it’s just the “:” prefix), it is possible to use filenames that start with “$” or “#”. These letters would otherwise indicate special named channels – see next sections.)
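As a minimal sketch, a channel name following this syntax could be assembled in Python as shown below. The function name and parameters are hypothetical. For relative files, the “mode” is the binary record size, passed here as a one-character string such as `chr(64)`; encoding the result into 8-bit bytes for the bus (e.g. with Latin-1) is left to the caller.

```python
from typing import Optional

# Single-character type codes used in channel names.
TYPE_CODES = {"SEQ": "S", "PRG": "P", "USR": "U", "REL": "L"}


def file_channel_name(filename: str, path: Optional[str] = None,
                      overwrite: bool = False, ftype: Optional[str] = None,
                      mode: Optional[str] = None) -> str:
    """Build [[@][path]:]filename[,type[,mode]]."""
    name = filename
    # "@" or a path (even an empty one) needs the ":" delimiter; an empty
    # path also protects names that start with "$" or "#".
    if overwrite or path is not None or filename[:1] in ("$", "#"):
        name = ("@" if overwrite else "") + (path or "") + ":" + filename
    if ftype is not None:
        name += "," + TYPE_CODES[ftype]
        if mode is not None:
            name += "," + mode
    elif mode is not None:
        raise ValueError("a mode can only follow a file type")
    return name
```

For example, `file_channel_name("DATA", path="1", overwrite=True, ftype="SEQ", mode="W")` yields “@1:DATA,S,W”.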

Sequential Files

For SEQ, PRG and USR, the following modes are possible:

  • R (read): Reading from the channel will return the file contents sequentially. The file pointer starts at the beginning of the file. When all bytes have been read, the unit signals this with an EOI condition.
  • M (recovery read): This mode is a recovery feature that allows reading a file that is marked as inconsistent (i.e. written to but never closed) in the filesystem’s metadata. Normal read mode would refuse to open it.
  • W (write): The file will be created (if it doesn’t exist or the “@” modifier has been specified), and writing to the channel will write the bytes into the file. The file has to be closed for it to be consistent. Creating a file and closing it without writing anything will result in a file that contains a single CR character[8].
  • A (append): Same as writing, but the file has to exist and the file pointer starts at the end of the file.

The default mode is “R”.

(Commodore DOS does not generally allow changing the file pointer on sequential files, but some modern solutions like SD2IEC allow the P command meant for relative files even in this case.)

Relative Files

For relative files, the mode character is actually a binary-encoded byte that specifies the record size. For creating a relative file, it must be specified, for opening an existing one, it can be omitted. Relative files are always open for reading and writing.

Positioning of the read/write pointer to a particular record is done by sending the “P” command on the command channel. The arguments are four binary-encoded bytes.

Name Syntax Description
POSITION P channel record_lo record_hi offset Set record index in REL file
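The four binary arguments can be packed as in the following Python sketch (the function name is hypothetical). It assumes, as on the 1541, that record numbers and byte offsets within a record count from 1; the text above does not specify this. Since the arguments are binary, the command must not rely on CR termination.

```python
def position_command(channel: int, record: int, offset: int = 1) -> bytes:
    """Build the binary "P" command for a relative file open on `channel`."""
    if not 2 <= channel <= 14:
        raise ValueError("relative files are opened on channels 2-14")
    if not 0 <= record <= 0xFFFF:
        raise ValueError("record number must fit into 16 bits")
    if not 0 <= offset <= 0xFF:
        raise ValueError("offset must fit into one byte")
    # "P" channel record_lo record_hi offset
    return b"P" + bytes([channel, record & 0xFF, record >> 8, offset])
```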

Relative files do not have to be closed for the data on disk to be consistent.

Channels 0 and 1

While channels 2 through 14 can be used to open any kind of file in any mode, channels 0 and 1 are shortcuts that hardcode type and mode:

  • Channel 0 is the LOAD channel: It forces a type of PRG and an access mode of “read”.
  • Channel 1 is the SAVE channel: It forces a type of PRG and an access mode of “write”.

Directory Listing

Reading the directory listing is done by associating channel 0 with a name starting with $ and reading from it. This is the syntax:

$[=T][[path]:][pattern[,…][={type|option}[,option, …]]]

Just using “$” as the name will return the complete contents of the current directory of medium 0. Specifying the path, followed by a colon, will override this. Additionally, one or more file name patterns can be appended to filter which directory entries are returned. Specifying “=” followed by a single-character file type specifier will only show files of a particular type. (Inconsistently, it’s “R” for “REL”, not “L” like in the case of opening a relative file.)

Devices that support time will allow a file listing that is amended with timestamp information, which can be requested using the “=T” modifier. In this case, any number of option arguments can be specified, which have the following meaning:

  • L: append long timestamp format; the default is shortened to fit an entry into 40 characters
  • N: do not append timestamp
  • <timestamp: filter for files last changed before timestamp
  • >timestamp: filter for files last changed after timestamp

The syntax of the timestamp argument works like this:

12/31/99 11:59 PM

(The year only has two digits. Consistent with GEOS, a year of “00” represents the year 2000. It is not specified what the cutoff year should be, but 1980 would make sense, so 80-99 would be 1980-1999, and 00-79 would be 2000-2079.)
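A host-side parser for this timestamp format could look like the following Python sketch. The 1980 cutoff is the assumption suggested above, not a specified value, and the function name is hypothetical.

```python
from datetime import datetime


def parse_dir_timestamp(text: str, cutoff: int = 80) -> datetime:
    """Parse "MM/DD/YY HH:MM AM|PM" with a two-digit year."""
    date_part, time_part, ampm = text.split()
    month, day, yy = (int(x) for x in date_part.split("/"))
    hour, minute = (int(x) for x in time_part.split(":"))
    ampm = ampm.upper()
    if ampm not in ("AM", "PM"):
        raise ValueError(f"expected AM or PM, got {ampm!r}")
    # Assumed cutoff: 80-99 -> 1980-1999, 00-79 -> 2000-2079.
    year = 1900 + yy if yy >= cutoff else 2000 + yy
    # 12-hour clock: 12 AM is midnight, 12 PM is noon.
    hour = hour % 12 + (12 if ampm == "PM" else 0)
    return datetime(year, month, day, hour, minute)
```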

The format of the data returned is tokenized Microsoft BASIC that can be displayed on-screen easily, but is a little tricky to parse.
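The following Python sketch splits such a listing into lines, assuming the standard Commodore BASIC program layout: a two-byte load address, then lines that each consist of a two-byte link pointer, a two-byte line number (here: the block count), the line text and a terminating zero byte, with a zero link pointer marking the end. The sketch does not follow the link pointers; it walks the lines sequentially and returns the raw text bytes without decoding them.

```python
def parse_directory(data: bytes) -> list:
    """Return (line_number, text_bytes) pairs from a directory listing."""
    entries = []
    pos = 2  # skip the two-byte load address
    while True:
        if pos + 2 > len(data):
            raise ValueError("listing ends without end-of-program marker")
        link = int.from_bytes(data[pos:pos + 2], "little")
        if link == 0:
            return entries  # a zero link pointer ends the program
        if pos + 4 > len(data):
            raise ValueError("truncated line header")
        # The line number carries the block count of the entry.
        blocks = int.from_bytes(data[pos + 2:pos + 4], "little")
        end = data.find(0, pos + 4)  # each line ends with a zero byte
        if end < 0:
            raise ValueError("unterminated line")
        entries.append((blocks, data[pos + 4:end]))
        pos = end + 1
```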

Devices that support partitions also introduce the following syntax for listing partitions:

$=P[:*][=type]

The list can be filtered by partition type. CMD devices support the following types:

  • N: CMD native partition
  • 4: 1541 emulation partition (683 blocks)
  • 7: 1571 emulation partition (1366 blocks)
  • 8: 1581 emulation partition (3200 blocks, 1581-partition support)
  • C: 1581 CP/M emulation partition (CMD HD only)

Filesystem Commands

There are many command-channel commands that deal with creating, fixing and modifying the filesystem. There is also a command that does a block-for-block disk copy for units with more than one drive.

Name Syntax Description
INITIALIZE I[medium] Force reading disk metadata
VALIDATE V[medium] Re-build block availability map
NEW N[medium]:name[,id[,format]] Low-level or quick format
RENAME R[path]:new_name=old_name Rename file
SCRATCH S[path]:pattern[,…] Delete files
COPY C[path_a]:target_name=[path_b]:source_name[,…] Copy/concatenate files
COPY Cdst_medium=src_medium Copy all files between disks
DUPLICATE D:dst_medium=src_medium Duplicate disk

(Unless otherwise mentioned, arguments for all commands are ASCII.)

The INITIALIZE command is only useful on classic 5.25″ devices. These had trouble detecting the user swapping a disk, so this command makes sure the disk metadata cache is invalidated.

The VALIDATE command is a simple check-disk function that will make sure the “block availability map” is consistent with other on-disk data structures. It is recommended on a disk that contains a file that has not been closed after writing, but it needs to be avoided on disks that contain manually allocated blocks.

The NEW command will create a new filesystem. On removable media, if an “ID” is specified, it will do a low-level format first. While most devices that support multiple physical formats will always format with the highest-density one[9], CMD FD devices support the format argument (see the FD-2000 manual for a full discussion):

format Description
81 double-density, 1581-compatible
HDN high-density, one native partition
EDN enhanced-density, one native partition

(The Burst API allows more fine-grained formatting settings on supported devices.)

On units with multiple media, the COPY command can also copy files between media, while on all other units, it can only duplicate files. In either case, it can concatenate several files into one.

The DUPLICATE command and the COPY variant that copies all files are only supported on units with multiple drives; they do not work on partitions.

A few more commands were introduced by CMD:

Name Syntax Description
LOCK L[path]:name Toggle file write protect
WRITE PROTECT W-{0|1} Set/unset device write protect
RENAME-HEADER R-H[medium]:new_name Rename a filesystem

Devices with partitioning support add the following commands:

Name Syntax Description
CHANGE PARTITION CP num Make a partition the default
GET PARTITION GP num Get information about partition
RENAME-PARTITION R-P:new_name=old_name Rename a partition

(There is a variant of CHANGE PARTITION with the P character shifted (code 0xd0), which takes a binary-encoded partition number instead of an ASCII-encoded one.)

There are no commands to create or delete partitions. These functions have to be done through tools that know the internals of the specific device.

Devices with subdirectory support add the following:

Name Syntax Description
CHANGE DIRECTORY CD[path]:name Change the current sub-directory
CHANGE DIRECTORY CD[medium]← Change sub-directory up
MAKE DIRECTORY MD[path]:name Create a sub-directory
REMOVE DIRECTORY RD[path]:name Delete a sub-directory

The syntax to go up one level contains the “←” (left arrow) character, which is CBM-ASCII code 0x5f (underscore in US-ASCII).

Block Access API

Commodore DOS also allows accessing the disk on a lower level. The block API works with 256-byte-sized logical blocks that are identified by a logical track (1-255) and a logical sector (0-255)[10].

The Buffer

In order to use the block API, a block-sized buffer has to be allocated inside the device by opening a channel (2-14) with the following name:

#[buffer_number]

The buffer number is only useful for a use case around code execution and will be covered later.

Without an explicit number, the next free buffer in the device’s RAM will be allocated. It will stay allocated until the channel is closed.

Reading from the channel gets a byte from the buffer, and writing to the channel puts a byte into the buffer. Every buffer comes with a “buffer pointer” that points to the next byte to be read or written. When creating a buffer, the buffer pointer is set to 1 instead of 0, so reading or writing would skip the first byte in the buffer. (The reason for this is the B-R/B-W API described later.)

The buffer pointer can be set to any position within the buffer with the B-P command.

All arguments in the buffer API are decimal ASCII values and can be separated by a space, a comma or a code 0x1d (ASCII “Group Separator”, CBM-ASCII “Cursor Right”). The command name and the first argument have to be separated by any of the above or a colon.

Name Syntax Description
BUFFER-POINTER B-P channel index Set r/w pointer within buffer

Reading and Writing Blocks

The U1 and U2 commands allow reading a block into the buffer and writing the buffer into a block. The arguments are the channel number, the medium, the track and the sector. After reading a block, the buffer pointer is reset to 0, so that the 256 bytes of the block can be read from the start.

However, when uploading a complete block into the buffer and then writing it to disk, it is necessary to set the buffer pointer to 0 before sending any data, because a new buffer’s pointer defaults to 1.

Name Syntax Description
U1/UA U1 channel medium track sector Raw read of a block
U2/UB U2 channel medium track sector Raw write of a block

(On devices that support partitions, the medium number is ignored, and the current partition at the time of allocating the buffer is used instead.)
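The read and write sequences can be sketched in Python as lists of steps. The step tuples and function names are hypothetical; they assume that the channel was already opened with the name “#”, and that command steps go to channel 15 while data steps go to the buffer channel.

```python
def block_command(name: str, *args: int) -> bytes:
    """Format a block API command with decimal ASCII arguments."""
    if any(not 0 <= a <= 255 for a in args):
        raise ValueError("arguments are single-byte values")
    return " ".join([name, *map(str, args)]).encode("ascii")


def read_block_plan(channel: int, medium: int, track: int, sector: int) -> list:
    # U1 reads the block and resets the buffer pointer to 0,
    # so all 256 bytes can be read from the channel afterwards.
    return [("command", block_command("U1", channel, medium, track, sector)),
            ("read", channel, 256)]


def write_block_plan(channel: int, medium: int, track: int, sector: int,
                     payload: bytes) -> list:
    if len(payload) != 256:
        raise ValueError("a logical block is 256 bytes")
    # A new buffer's pointer starts at 1, so reset it before uploading.
    return [("command", block_command("B-P", channel, 0)),
            ("data", channel, payload),
            ("command", block_command("U2", channel, medium, track, sector))]
```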

Block Availability Map Commands

The “B-A” and “B-F” commands allow marking a block as allocated or free in the “block availability map” (BAM). Allocating a block makes sure the filesystem won’t use it. The V (validate) command will re-build the BAM from the filesystem’s metadata and undo any “B-A” commands.

Name Syntax Description
BLOCK-ALLOCATE B-A medium track sector Allocate a block in the BAM
BLOCK-FREE B-F medium track sector Free a block in the BAM

Using the U1/U2 commands together with B-A and B-F allows using free blocks on the disk for custom use without interfering with the filesystem’s data structures. B-A will return the track and sector number of the next free block in case the one passed as an argument was already allocated. Together with the knowledge that the first block on disk is track 1, sector 0, it is possible to allocate blocks for custom use without any knowledge of the disk layout.

Similarly, a disk can be fully dumped by iterating over all tracks (starting with 1) and sectors (starting with 0) and skipping to sector 0 of the next track whenever an “ILLEGAL TRACK OR SECTOR” (66) error is encountered.
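The dump strategy can be sketched in Python as follows. `read_block` is a hypothetical callback standing in for the U1 command plus reading the buffer and the status; it returns the status code and the block data. Note that the loop stops at the first track whose sector 0 is illegal, so it assumes the track numbers of a medium are contiguous.

```python
ILLEGAL_TRACK_OR_SECTOR = 66


def dump_medium(read_block) -> dict:
    """Read every block; read_block(track, sector) -> (status_code, data)."""
    blocks = {}
    track, sector = 1, 0  # the first block on disk is track 1, sector 0
    while track <= 255:
        status, data = read_block(track, sector)
        if status == ILLEGAL_TRACK_OR_SECTOR:
            if sector == 0:
                break  # not even sector 0 exists: past the last track
            track, sector = track + 1, 0  # continue with the next track
            continue
        if status != 0:
            raise RuntimeError(f"error {status} at track {track}, sector {sector}")
        blocks[(track, sector)] = data
        sector += 1
        if sector > 255:
            track, sector = track + 1, 0
    return blocks
```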

“Random Access Files”

The B-R (block read) and B-W (block write) commands have deceptive names and are part of a rarely used and deeply confusing feature: “Random Access Files”.

While sequential files only allow sequential access to the file’s data, and relative files restrict seeking within the file in record-size steps, the “Random Access File” API calls are meant to give the user a way to build files with arbitrary access patterns.

B-R and B-W are block read/write commands like U1/U2, but they assume a certain data format of the blocks: The first byte is the block’s buffer pointer. Before writing a block to disk, the current buffer pointer will be put into its first byte, signaling how many valid bytes are contained in the block. When reading, it marks the end of the buffer that cannot be read past[11] (an EOI condition will be signaled). When reading with B-R, the buffer pointer is set to 1, so that the first byte of the payload will be read first.

Name Syntax Description
BLOCK-READ B-R channel medium track sector Read block
BLOCK-WRITE B-W channel medium track sector Write block

(As with U1/U2, the medium number is ignored on devices that support partitioning.)

Code Execution API

The code execution API provides all the tools to change the detailed behavior of a device as far as replacing the code that runs on it. Of course, code that uses these APIs is completely specific to one kind of device.

Allocating Specific Buffers

Specifying a number after the “#” character in the channel name will allocate the buffer with the specified number. This practically allocates a specific memory area in the device. This is the mapping on a 1541, for example:

Buffer Memory Area
0 $0300-$03ff
1 $0400-$04ff
2 $0500-$05ff
3 $0600-$06ff
4 $0700-$07ff

On a 1541, buffer 2, which is located from $0500 to $05ff in RAM, is the designated “user buffer”, which is the one least likely to be used by the device’s operating system.

Block Execute

With an explicit buffer allocated, one can have the device read a block into the buffer and execute it (by calling the base address) using the B-E command:

Name Syntax Description
BLOCK-EXECUTE B-E channel medium track sector Load and execute a block

The code will run in the context of the “interface CPU”. (Some older Commodore devices had two CPUs, one for the Commodore Peripheral Bus interface, and one as the disk controller.) This CPU is usually a 6502 derivative, but executing code is highly device-specific in any case.

The operating system’s code will resume after the executed code returns.

Memory Commands

The memory commands allow reading and writing device memory as well as executing code at an arbitrary location.

The resulting bytes from the “M-R” command will be delivered through channel 15 in place of the status string.

The arguments are binary-encoded bytes.

Name Syntax Description
MEMORY-WRITE M-W addr_lo addr_hi count data Write RAM
MEMORY-READ M-R addr_lo addr_hi count Read RAM
MEMORY-EXECUTE M-E addr_lo addr_hi Execute code

M-R and M-W allow accessing the operating system’s internal data structures, for example. The combination of M-W and M-E can be used to upload code from the computer and execute it. In case the drive’s operating system does not completely get taken over, it is advisable to allocate a specific buffer before uploading code, so that the operating system won’t overwrite the new code.
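Since all devices accept commands of up to 40 bytes, uploading longer code means splitting it into several M-W commands. The following Python sketch (hypothetical function names) does that with 32 data bytes per command; “M-W” plus address and count take 6 bytes, so at most 34 data bytes would fit. As recommended above, the target buffer (e.g. buffer 2 at $0500 on a 1541) should be allocated by opening a channel with the name “#2” first.

```python
def memory_write_commands(address: int, code: bytes, chunk: int = 32) -> list:
    """Split `code` into M-W commands that each fit into 40 bytes."""
    if not 1 <= chunk <= 34:
        raise ValueError("chunk must be 1-34 so that header + data fit 40 bytes")
    if address < 0 or address + len(code) > 0x10000:
        raise ValueError("code must fit into the 16-bit address space")
    commands = []
    for offset in range(0, len(code), chunk):
        part = code[offset:offset + chunk]
        target = address + offset
        # "M-W" addr_lo addr_hi count data...
        commands.append(b"M-W" + bytes([target & 0xFF, target >> 8, len(part)]) + part)
    return commands


def memory_execute_command(address: int) -> bytes:
    """Build "M-E" addr_lo addr_hi."""
    return b"M-E" + bytes([address & 0xFF, (address >> 8) & 0xFF])
```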

USER Commands

The USER commands are meant to give the user a compact command interface that calls uploaded code or code in expansion ROM (if available).

The commands U1 to U9 and U: (and their synonyms UA to UJ) execute code through a jump table. There is a default jump table that can be replaced using a device-specific M-W command, and reset to the default using U0.

Name Syntax Description
U0 U0 Init user vectors
U1-U2/UA-UB (see above) Raw read/write of a block
U3-U8/UC-UH U3-U8 Execute in user buffer or expansion ROM
U9/UI UI Soft RESET (NMI)
U:/UJ UJ Hard RESET

For historical reasons[12], the default jump table contains the already discussed U1 and U2 commands for reading and writing blocks. U3 to U8 jump into some useful device-specific locations. On most devices, all these jumps point into the user buffer; on some older devices, some jumps point into expansion ROM.

Here are the locations for the 1541:

Command Address
U3/UC $0500
U4/UD $0503
U5/UE $0506
U6/UF $0509
U7/UG $050C
U8/UH $050F

The commands U9 and U: execute a soft and a hard reset, respectively. In both cases, the status will read back code 73, the power-on message, which is useful for detecting the type of device.

Utility Loader Command

The utility loader command instructs the unit to load a file into its RAM and execute it. The file has to follow a certain format and contains checksums[13].

Name Syntax Description
UTILITY LOADER &[[path]:]name Load and execute program

Burst API

The Burst API is a set of commands that mostly deal with low-level disk access, which were added to allow the user to read and write foreign MFM disks (1571, 1581, CMD FD), practically bypassing Commodore DOS.

All Burst commands start with “U0”[14], followed by one or more binary-encoded bytes that specify the command and its arguments.

The following table shows the commands and the binary encoding of their respective first byte. “_” characters don’t encode the command, but parts of the arguments.

Code Command Description
____000_ READ Read logical or physical sectors
____001_ WRITE Write logical or physical sectors
____010_ INQUIRE DISK Detect new disk
____011_ FORMAT Format tracks
____101_ QUERY DISK FORMAT Detect disk format
___0110_ INQUIRE STATUS Detect disk change etc.
___11101 DUMP TRACK CACHE BUFFER Force cache write back
___11111 FASTLOAD Transfer file using Burst Serial protocol

(For the complete command and return value encodings, refer to the 1581 or CMD FD manuals.)
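The bit patterns in the table can be turned into (mask, value) pairs, with the “_” argument bits masked out. This Python sketch (hypothetical names) only identifies the command from its first byte; it does not decode the arguments. The more specific patterns are checked first.

```python
# (mask, value, name); "_" bits in the table are argument bits and masked out.
BURST_COMMANDS = [
    (0b11111, 0b11101, "DUMP TRACK CACHE BUFFER"),
    (0b11111, 0b11111, "FASTLOAD"),
    (0b11110, 0b01100, "INQUIRE STATUS"),
    (0b01110, 0b00000, "READ"),
    (0b01110, 0b00010, "WRITE"),
    (0b01110, 0b00100, "INQUIRE DISK"),
    (0b01110, 0b00110, "FORMAT"),
    (0b01110, 0b01010, "QUERY DISK FORMAT"),
]


def decode_burst_command(command: bytes):
    """Return the command name for a "U0"-prefixed Burst command, or None."""
    if not command.startswith(b"U0") or len(command) < 3:
        raise ValueError("not a Burst command")
    first = command[2]
    for mask, value, name in BURST_COMMANDS:
        if (first & mask) == value:
            return name
    return None
```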

All these commands require that all data transmission is done over a special layer 2 protocol called “Burst” Serial, which is only supported by “Fast Serial” devices. Part 7 of this series covers this protocol.

This explains the inclusion of the “FASTLOAD” command, which does not fit the topic of low-level disk access. The “Fast” Serial additions (as supported by the C128) can use a faster layer 2 protocol transparently to layers 3 and above, but the more specialized “Burst” Serial protocol (which was introduced together with Fast Serial) cannot be used transparently. Therefore the C128 KERNAL uses “FASTLOAD” to explicitly initiate a Burst transfer of the file if the device supports it.

There are also variants of the M-R and M-W commands that use Burst transfer:

Name Syntax Description
BURST MEMORY-READ U0>MR addr_hi count_hi Read RAM (Burst protocol)
BURST MEMORY-WRITE U0>MW addr_hi count_hi Write RAM (Burst protocol)

The Burst API breaks the layering of the protocol stack by marrying the low-level disk access commands to a specific byte transfer protocol[15] – both were new features of the 1571.

The low-level disk access commands were added for reading and writing foreign-format 5.25″ disks, mostly for use with the CP/M operating system, which was really only useful on the 1571, and to some extent, the 3.5″ 1581. The CMD HD, for example, supported them for compatibility only. These commands should therefore be regarded as deprecated.

Nevertheless, the remaining “FASTLOAD” and “BURST MEMORY-READ/WRITE” commands are generally useful for devices with a Fast/Burst Serial layer 2.

Settings API

Later devices added several commands that change global device settings.

All devices since the 1571 support these commands15:

Name Syntax Description
USER U0>S val Set sector interleave
USER U0>R num Set number of retries
USER U0>T Test ROM checksum
USER U0> pa Set unit primary address

And some commands supported on all devices since the 1581:

Name Syntax Description
USER U0>B flag Enable/disable Fast Serial
USER U0>V flag Enable/disable verify
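Each of these settings is just a short command string written to the command channel. Here is a minimal Python sketch that builds them, under the assumption that the numeric argument (`val`, `num`, `pa`, `flag`) is appended as a single raw byte (a `CHR$()` value in BASIC) and not as decimal digits. The helper names are hypothetical, and which flag value means “on” for B and V should be checked in the device manual.

```python
from __future__ import annotations


def u0_command(sub: bytes, arg: int | None = None) -> bytes:
    """Build a "U0>" command; the optional argument is appended as one raw byte."""
    cmd = b"U0>" + sub
    if arg is not None:
        cmd += bytes([arg])  # raises ValueError unless 0 <= arg <= 255
    return cmd


def set_interleave(val: int) -> bytes:
    return u0_command(b"S", val)


def set_retries(num: int) -> bytes:
    return u0_command(b"R", num)


def rom_checksum_test() -> bytes:
    return u0_command(b"T")


def set_primary_address(pa: int) -> bytes:
    # No command letter: the new address byte follows "U0>" directly.
    return u0_command(b"", pa)


def set_fast_serial(flag: int) -> bytes:
    return u0_command(b"B", flag)


def set_verify(flag: int) -> bytes:
    return u0_command(b"V", flag)


print(set_interleave(5))  # b'U0>S\x05'
```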

CMD devices added the following commands:

Name Syntax Description
SWAP S-{8|9|D} Change primary address
GET DISKCHANGE G-D Query disk change (FD only)
SCSI COMMAND S-C scsi_dev_num buf_ptr_lo buf_ptr_hi num_bytes Send SCSI Command (HD only)

Real-Time Clock API

Some devices have a real-time clock that can be read and written in multiple formats.

Name Syntax Description
TIME READ ASCII T-RA Read Time/Date (ASCII)
TIME WRITE ASCII T-WA dow mo/da/yr hr:mi:se ampm Write Time/Date (ASCII)
TIME READ DECIMAL T-RD Read Time/Date (Decimal)
TIME WRITE DECIMAL T-WD b0 b1 b2 b3 b4 b5 b6 b7 Write Time/Date (Decimal)
TIME READ BCD T-RB Read Time/Date (BCD)
TIME WRITE BCD T-WB b0 b1 b2 b3 b4 b5 b6 b7 b8 Write Time/Date (BCD)
TIME READ ISO T-RI Read Time/Date (ISO)
TIME WRITE ISO T-WI yyyy-mm-ddThh:mm:ss dow Write Time/Date (ISO)

The ISO variant is only supported on the SD2IEC.

Family-Specific Features

There are a number of features that were only supported on one or a few devices and are not part of the canonical feature set.

1541 + 1571

For the 1541, the timing of the layer 2 Serial protocol was slowed down to support the C64’s unique timing properties. Since the 1541 replaced the 1540, it came with a mode to switch back to the faster VIC-20 Serial protocol17. This is only supported by the 1541 and 1571 series.

Name Syntax Description
USER UI{+|-} Use C64/VIC-20 Serial protocol

1571

The following two commands are 1571-specific:

Name Syntax Description
USER U0>M flag Enable/disable 1541 emulation mode
USER U0>H number Select head 0/1

1581

The 1581 has its own concept of “partitions” (which is also supported by CMD devices in 1581 emulation mode). A 1581 partition occupies a contiguous sequence of blocks and appears as a file of type CBM, but cannot be read or written as a file.

One use case for a 1581 partition is to reserve blocks for the block API that are safe from VALIDATE. But it is also possible to format the partition with a sub-filesystem18 (if the partition starts and ends at track boundaries and is at least 3 tracks in size). This way, partitions can even be nested.

Name Syntax Description
PARTITION /[medium][:name] Select partition
PARTITION /[medium]:name,track sector count_lo count_hi ,C Create partition

There is no syntax to access files in a different partition: it is only possible to change into partitions, and to change back to the root (by omitting the partition name).
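Here is a hedged Python sketch of building both partition commands as byte strings. It assumes that track, sector, count_lo and count_hi are sent as single raw bytes (as with `CHR$()` in BASIC), with the 16-bit block count split into low and high byte; names are PETSCII, approximated here by uppercase ASCII. The helper names and the example partition are hypothetical.

```python
def select_partition(medium: bytes = b"", name: bytes = b"") -> bytes:
    """'/[medium][:name]'; an empty name changes back to the root."""
    cmd = b"/" + medium
    if name:
        cmd += b":" + name
    return cmd


def create_partition(name: bytes, track: int, sector: int, count: int,
                     medium: bytes = b"") -> bytes:
    """'/[medium]:name,' + track, sector, count_lo, count_hi (raw bytes) + ',C'."""
    if not 0 <= count <= 0xFFFF:
        raise ValueError("block count must fit in 16 bits")
    params = bytes([track, sector, count & 0xFF, count >> 8])
    return b"/" + medium + b":" + name + b"," + params + b",C"


# Hypothetical example: a 120-block partition "WORK" starting at track 10, sector 0.
print(create_partition(b"WORK", 10, 0, 120, medium=b"0"))
print(select_partition(b"0", b"WORK"))  # b'/0:WORK'
```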

C65

The disk drive built into the unreleased C65 supports the following additional commands:

Name Syntax Description
FILE LOCK F-L[path]:name[,…] Enable file write-protect
FILE UNLOCK F-U[path]:name[,…] Disable file write-protect
FILE RESTORE F-R[path]:name[,…] Restore a deleted file
BLOCK-STATUS B-S channel medium track sector Check if block is allocated
USER U0>Dval Set directory sector interleave
USER U0>Lflag Large REL file support on/off

Status Codes

Finally, here is some more information on the status codes and messages the unit sends through the command channel.

The first decimal digit encodes the category of the error.

First Digit Description
0, 1 No error, informational only
2 Physical disk error
3 Error parsing the command
4 Controller error (CMD addition)
5 Relative file related error
6 File error
7 Generic disk or device error
8, 9 unused

The full list of error messages can be found in practically every disk drive user’s manual; here are just some examples:

  • 00, OK,00,00: There was no error.
  • 01, FILES SCRATCHED,03,00: Informational: 3 files have been deleted (“scratched”).
  • 23,READ ERROR,18,00: There was a checksum error when trying to read track 18, sector 0.
  • 31,SYNTAX ERROR,00,00: The command sent was not understood.
  • 51,OVERFLOW IN RECORD,00,00: More data was written into a REL file record than fits.
  • 65,NO BLOCK,17,01: When trying to allocate a block using the B-A command, the given block was already allocated. Track 17, sector 1 is the next free block.
  • 66,ILLEGAL TRACK OR SECTOR,99,00: A user command referenced track 99, sector 00, which does not exist.
  • 73,CBM DOS V2.6 1541,00,00: This status is returned after the RESET of a device (and after the command “UI”). The actual message is specific to the device and can be used to detect the type and sometimes the ROM version19.

Note that the strings are meant for the user and are not necessarily consistent between devices. A program should only ever interpret the status codes and their arguments.
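Following that rule, a program can split the status line at the commas and branch only on the numbers. Here is a minimal Python sketch, assuming the line has already been read from channel 15 as text (possibly still ending in a carriage return). The names are hypothetical, the categories come from the table above, and the message is kept for display only.

```python
from typing import NamedTuple

# First decimal digit of the status code -> category (table above).
CATEGORIES = {
    0: "No error, informational only",
    1: "No error, informational only",
    2: "Physical disk error",
    3: "Error parsing the command",
    4: "Controller error (CMD addition)",
    5: "Relative file related error",
    6: "File error",
    7: "Generic disk or device error",
}


class DriveStatus(NamedTuple):
    code: int
    message: str  # for display only; not consistent between devices
    track: int
    sector: int

    @property
    def category(self) -> str:
        return CATEGORIES.get(self.code // 10, "unused")

    @property
    def is_error(self) -> bool:
        return self.code >= 20  # codes 00-19 are informational


def parse_status(line: str) -> DriveStatus:
    """Split e.g. "23,READ ERROR,18,00" into code, message, track and sector."""
    head, track, sector = line.strip().rsplit(",", 2)
    code, message = head.split(",", 1)
    return DriveStatus(int(code), message.strip(), int(track), int(sector))


status = parse_status("23,READ ERROR,18,00\r")
if status.is_error:
    print(status.category, status.track, status.sector)  # Physical disk error 18 0
```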

Next Up

Part 4 of the series of articles on the Commodore Peripheral Bus family will cover Standard Serial (IEC).

This article series is an Open Source project. Corrections, clarifications and additions are highly appreciated. I will regularly update the articles from the repository at https://github.com/mist64/cbmbus_doc.

References


  1. No other literature calls it media. In the Commodore context, they are drives (because they are actual drives with their own mechanics), and in the CMD context, they are partitions (because they are parts of one large drive). I have decided to introduce a common name for the concept, since the syntax for paths and commands does not make a distinction between the two.

  2. The CMD RAMLink comes with an extra 64 KB of buffer RAM that can be read and written (to emulate the 1541/1571/1581 job queue), but the DOS runs on the computer’s CPU, so executing code on the device is not supported.

  3. Commodore DOS breaks the layer 3 convention in this case. An UNLISTEN event should not signal the termination of a byte stream, it should merely pause it.

  4. The two arguments are always zero-padded to at least two digits, but on devices with track or sector numbers above 99 (like the Commodore 8250 as well as CMD hard disks), they can be three digits.

  5. The SFD-1001 is the exception to this: It is a single-drive device, but it has the exact same firmware as the dual-drive CBM 8250.

  6. On classic 5.25″ devices, the user is supposed to make sure that disks have unique IDs. These devices store a copy of the ID with every sector on disk in order to detect disk changes more reliably.

  7. This is a limitation of the layer 2 protocol: It is impossible for a device to send a 0-byte stream of bytes.

  8. All single-drive Commodore devices except the 1571 (revision 5 ROM only), 1541-C, 1541-II and 1581 have a bug that can corrupt the filesystem when using the overwrite feature.

  9. Again, this is because of a limitation of the layer 2 protocol when reading the file. In addition, all Commodore drives have a bug where a file that contains only one or two bytes will read back four bytes, i.e. with added garbage. CMD drives do not have this bug.

  10. The SFD-1001 and the 8250 are double-sided drives that can read and write single-sided disks (formatted by an 8050), but formatting a disk only supports double-sided mode. The same is true for the 1571 in native mode, which can read and write single-sided 1541 disks, but will always format double-sided disks – although the 1571 can format single-sided disks while in 1541 emulation mode.

  11. On disks that do not use Commodore’s native “GCR” bit encoding (e.g. CBM 8280, D9060/D9090, 1581, the C65 drive and all drives by CMD), the physical layout does not match the logical layout, i.e. the medium may have a different sector size or track/sector(/head) numbering. On the CMD HD, the track and sector numbers are interpreted as a linear block address, and the constraint of 255 tracks and 256 sectors of 256 bytes limits the maximum partition size to just under 16 MB.

  12. Many sources describe the “B-R” and “B-W” commands as buggy because their behavior didn’t seem to make sense and the explanation seemed to have been missing from common forms of documentation. Where they are documented, they are called the “random access files” commands, for a third type of file (next to sequential and relative) that was based on the user keeping track of allocation and linking, but using the “first byte holds block pointer” format provided by these commands.

  13. U1 and U2 were added in a firmware update to the Commodore 4040 drive because of bugs in B-R and B-W in version 2.1. They were probably added as USER commands as opposed to proper commands (or fixing the broken commands) in order to keep the changes to the new ROM version contained to one ROM chip. Then later drives retained this feature for compatibility.

  14. The feature has existed in all Commodore drives since the release of the 1540, but they only started documenting it with the 1551 drive, and never documented the actual file format or the algorithm for the required checksum. The 1540, early 1541 drives, the 8250/8050/4040 with DOS V2.7 as well as the D9060/D9090 hard disks supported the also undocumented “boot clip”: a device that grounds certain pins on the data connector and will force the unit to execute the first file on disk (“power-on diag sense loader”). All this hints at this mostly being a feature that was used in-house.

  15. The 1571 aimed to be perfectly backwards-compatible with the 1541, so in order to keep the ROM layout as close to the 1541’s as possible, all added commands were added as sub-commands to U0, which kept the new code contained behind a single vector.

  16. The CMD RAMLink is compatible with as much of Commodore DOS as possible, but cannot support the Burst API, because it is not connected through a Commodore Serial bus.

  17. The command was piggy-backed onto UI in order to keep the changes between the 1540 and the 1541 contained to the second ROM chip.

  18. Commodore calls them sub-directories, not to be confused with CMD-style subdirectories.

  19. The version is sometimes more of a compatibility level though and hints at the supported features. These strings are too inconsistent between devices for parsing, so in practice, the whole string has to be compared for detecting a particular device.

Commodore Peripheral Bus: Part 2: Bus Arbitration, TALK/LISTEN

In the series about the variants of the Commodore Peripheral Bus family, this article covers the common layer 3: the bus arbitration layer with the TALK/LISTEN protocol.

The variants of the Commodore Peripheral Bus family have some very different connectors and byte transfer protocols (layers 1 and 2), but they all share layers 3 and 4 of the protocol stack. This article on layer 3 is therefore valid for all Commodore 8 bit computers, no matter whether they use IEEE-488, Serial, TCBM or CBDOS on the underlying layers.

All variants of layer 2, the layer below, provide:

  • the transmission of byte streams from any bus participant to any set of bus participants
  • the transmission of “command” byte streams from the designated “controller” (the computer) to all other bus participants

Layer 3, which is based on the IEEE-488 standard, provides interruptible communication between “channels” of different devices.




Controller

Layer 2 allows everyone on the bus to talk to everyone else – but it has no mechanism for deciding who sends or receives data at what time. The primary feature of layer 3 is controlling exactly this.

One bus participant needs to be the designated controller – this is always the computer. It sends command byte streams to all other bus participants, the devices.

Primary Address

The controller needs to be able to address an individual device. Every device on the bus has a primary address from 0 to 30. The controller itself does not have an address.

Primary addresses (aka device numbers) are usually assigned through DIP switches (e.g. Commodore 1541-II: 8-11) or by cutting a trace (e.g. original Commodore 1541: 8 or 9).

On Commodore systems, there is a convention for device numbers:

# type
4, 5 printers
6, 7 plotters
8 – 11 disk drives, hard disks
12 – 30 some third party drives, misc.

Devices 0-3 are reserved for devices outside the Commodore Peripheral Bus, which share the same primary address space in the KERNAL’s Channel I/O API as well as in BASIC.

Note that this is just a convention and hints towards what protocol is used on layer 4, the layer above. On layer 3, neither computers nor devices care about this convention1.

Talkers and Listeners

In order to tell devices that they are now supposed to send or receive data, the controller hands out two roles: “talker” and “listener”.

  • A talker is sending a byte stream.
  • A listener is receiving a byte stream.

Any device can be either a talker, a listener, or passive. There can only be one talker at a time, and there has to be at least one listener.

The controller itself can also be the talker or a listener. In fact, in the most common cases, the controller is either the talker, with a disk drive as the single listener (e.g. writing a file), or the controller is the single listener, with a disk drive as the talker (e.g. reading a file).

TALK and LISTEN commands

To hand out the talker and listener roles to devices and to withdraw them, the controller sends a command byte stream containing one of the following codes:

command description effect
0x20 + pa LISTEN device pa becomes listener; ignored by other devices
0x3F UNLISTEN all devices stop listening
0x40 + pa TALK device pa becomes talker; all other devices stop talking
0x5F UNTALK all devices stop talking

For the LISTEN and TALK commands, the primary address of the device gets added to the code. The UNLISTEN and UNTALK commands correspond to the LISTEN and TALK with a primary address of 31. This is what restricts primary addresses to the range of 0-30.
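The encoding rule fits in a few lines of Python. This sketch only computes command byte values (actually sending them as a command byte stream is the job of layer 2), and the function names are hypothetical.

```python
LISTEN_BASE, TALK_BASE = 0x20, 0x40
UNLISTEN = LISTEN_BASE + 31  # 0x3F: LISTEN with primary address 31
UNTALK = TALK_BASE + 31      # 0x5F: TALK with primary address 31


def listen(pa: int) -> int:
    """Command byte that makes device `pa` a listener."""
    if not 0 <= pa <= 30:
        raise ValueError("primary address must be 0-30; 31 means UNLISTEN")
    return LISTEN_BASE + pa


def talk(pa: int) -> int:
    """Command byte that makes device `pa` the talker."""
    if not 0 <= pa <= 30:
        raise ValueError("primary address must be 0-30; 31 means UNTALK")
    return TALK_BASE + pa


print(hex(listen(8)), hex(talk(8)), hex(UNLISTEN), hex(UNTALK))  # 0x28 0x48 0x3f 0x5f
```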

All devices receive and interpret command bytes, so for example, a TALK command for device 8 will implicitly cause device 9 to stop talking, in case it currently was a talker.

A role change of the controller itself is not communicated through commands, since the controller already knows this (after all, it is the one making the decision), and the devices do not need to know.

Secondary Address

The designers of IEEE-488 felt that a device should have multiple functions or contexts, or that multiple actual devices could be sitting behind a single primary address. Each of these channels can be addressed using a secondary address from 0 to 31.

A command specifying the secondary address can optionally be sent after a TALK or LISTEN command.

command description effect
0x60 + sa SECOND last addressed device selects channel sa

The interpretation of the secondary address is up to the device and specified on layer 4. In practice, they are interpreted as options or flags (e.g. for printers) or different file contexts (e.g. for disk drives).

Devices are free to ignore the secondary address or only honor certain bits of it. Commodore disk drives, for example, ignore bit #4, so channels 16-31 are the same as channels 0-15.

Examples

Here are some examples for receiving, sending and copying, as well as for a controller-less connection.

Receiving Data from a Device

If the controller wants to read a byte stream from device 8, channel 2, it sends this:

command description
0x48 TALK 8
0x62 SECOND 2

The controller then becomes a listener and reads bytes from the bus. If the controller has had enough data, it can send this:

command description
0x5F UNTALK

The current talker will then release the bus. The controller can resume the transmission of data from the channel by sending the same TALK/SECOND commands again.

The controller has to stop receiving bytes once it encounters the end of the stream (EOI). There is no need to send UNTALK in this case, since the talker will automatically release the bus.

Sending Data to a Device

Here is an example that sends a byte stream to device 4, channel 7:

command description
0x24 LISTEN 4
0x67 SECOND 7

The controller then sends the byte stream. Like in the case of receiving data, the controller can pause transmission like this:

command description
0x3F UNLISTEN

and resume it using the same LISTEN/SECOND combination2. If the controller has reached the end of its byte stream, it signals EOI. Again, there is no need to send UNLISTEN in this case.

Manually Copying Data Between Devices

The following example has the controller manually copy a byte stream from device 8, channel 2 (a disk drive) to device 4 (a printer). First, it tells device 8, channel 2 to talk:

command description
0x48 TALK 8
0x62 SECOND 2

Now the controller reads a byte from the bus. It then instructs the talker to stop talking and tells device 4 to listen:

command description
0x5F UNTALK
0x24 LISTEN 4

In this case, there is no secondary address for device 4, so the device picks its default channel. The controller then sends the byte it just read back onto the bus and tells device 4 to stop listening.

command description
0x3F UNLISTEN

Now it can repeat the whole procedure from the start, until the read operation signals the end of the stream.

Obviously this is slow, because it transmits 7 bytes for every byte of payload. A more optimized version would read and write something like 256 bytes at a time.
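The overhead is easy to quantify. Here is a small Python sketch, assuming every chunk costs exactly the command bytes of the example above (TALK/SECOND before reading, then UNTALK, LISTEN without a secondary address, and UNLISTEN) and counting bytes on the bus only, not handshake time:

```python
def bus_bytes_per_chunk(chunk: int) -> int:
    """Bytes on the bus for copying `chunk` payload bytes through the controller."""
    read_side = 2 + chunk + 1    # TALK 8, SECOND 2, payload, UNTALK
    write_side = 1 + chunk + 1   # LISTEN 4, payload, UNLISTEN
    return read_side + write_side


for n in (1, 256):
    total = bus_bytes_per_chunk(n)
    print(f"{n} payload bytes -> {total} bus bytes ({total / n:.2f} per payload byte)")
```

A single payload byte costs 7 bus bytes, while a 256-byte chunk costs 517, about 2 per payload byte. That is close to the floor for a copy through the controller, since every payload byte has to cross the bus twice.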

Having Devices Talk to Each Other

But devices can also talk directly to each other, without the controller’s involvement. This way, data only travels over the bus once.

This command byte stream will instruct devices 4 and 5 (two printers) to listen and device 8, channel 2 (a disk drive) to talk. After the transmission of the command, device 8 will autonomously send whatever data it can provide from channel 2 to devices 4 and 5.

command description
0x24 LISTEN 4
0x25 LISTEN 5
0x48 TALK 8
0x62 SECOND 2

Device 8 now starts sending bytes, and devices 4 and 5 will receive them. The layer 2 protocol makes sure that the listeners will adapt to the talker’s speed and wait patiently when it stalls (e.g. the disk drive has to read a new sector), and the talker will adapt to the speed of the slowest listener and wait patiently when any of them stalls (e.g. when the printer has to feed the paper).

The controller can interrupt the transmission at any time by sending new commands. It can, for example, read from a different channel of the disk drive, and then resume the print job by sending the above command sequence again.

If the controller wants to know when the transmission is finished, it will have to be a listener as well and detect the end of the stream (EOI).
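Building such a command stream generalizes to any number of listeners. Here is a hypothetical Python helper that only produces the command bytes (sending them is up to layer 2) and checks the role rules from above, but not the address ranges:

```python
from __future__ import annotations


def direct_transfer(talker: int, channel: int, listeners: list[int]) -> list[int]:
    """Command bytes that connect one talker channel to several listeners."""
    if not listeners:
        raise ValueError("at least one listener is required")
    if talker in listeners:
        raise ValueError("a device cannot be talker and listener at once")
    commands = [0x20 + pa for pa in listeners]   # LISTEN for each receiver
    commands += [0x40 + talker, 0x60 + channel]  # TALK, then SECOND
    return commands


print([hex(b) for b in direct_transfer(8, 2, [4, 5])])  # ['0x24', '0x25', '0x48', '0x62']
```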

Named Channels (OPEN/CLOSE)

When Commodore chose IEEE-488 as the protocol stack for the PET, they felt that a numeric secondary address from 0 to 31 was not expressive enough for the different contexts of e.g. a disk drive, so they added named channels.

The controller can associate a secondary address with a name, and later dissociate it again. A name is a byte stream of arbitrary length (including zero bytes) and usually the PETSCII encoding is implied. Only secondary addresses in the range of 0-15 can be associated with a name, 16-31 cannot.

command description effect
0xE0 + sa CLOSE dissociate sa from its name
0xF0 + sa OPEN associate sa with a name

Both commands have to be prefixed with a LISTEN command to address the correct device. An OPEN command sequence that associates a name with channel 2 on device 8 looks like this:

command description
0x28 LISTEN 8
0xF2 OPEN 2

The controller now sends the name of the channel, followed by UNLISTEN:

command description
0x3F UNLISTEN

Unlike regular data transmissions, where the controller can pause and resume the stream using UNLISTEN/LISTEN, the name of the channel has to be sent in one go. The end of the name is indicated by the UNLISTEN command, not by EOI.
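The complete OPEN exchange interleaves command bytes and regular bytes. In this hypothetical Python model, each byte on the bus is tagged "cmd" (part of a command byte stream) or "data" (a regular byte), and names are given as ASCII bytes standing in for PETSCII:

```python
from __future__ import annotations

UNLISTEN = 0x3F


def open_sequence(pa: int, sa: int, name: bytes) -> list[tuple[str, int]]:
    """LISTEN/OPEN, the name as regular bytes, then UNLISTEN (not EOI) to end it."""
    if not 0 <= sa <= 15:
        raise ValueError("only channels 0-15 can be associated with a name")
    seq = [("cmd", 0x20 + pa), ("cmd", 0xF0 + sa)]
    seq += [("data", b) for b in name]  # may be empty: zero-length names are allowed
    seq.append(("cmd", UNLISTEN))
    return seq


def close_sequence(pa: int, sa: int) -> list[tuple[str, int]]:
    """LISTEN/CLOSE/UNLISTEN: dissociate channel `sa` from its name."""
    if not 0 <= sa <= 15:
        raise ValueError("only channels 0-15 can be associated with a name")
    return [("cmd", 0x20 + pa), ("cmd", 0xE0 + sa), ("cmd", UNLISTEN)]


print(open_sequence(8, 2, b"$"))
```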

The device can indicate an error associating the channel with the name3 using an error condition on layer 2.

Dissociating a channel from a name is done using the sequence LISTEN/CLOSE/UNLISTEN, like in this example:

command description
0x28 LISTEN 8
0xE2 CLOSE 2
0x3F UNLISTEN

Compatibility with IEEE-488

The bus arbitration layer of the Commodore Peripheral Bus is based on and mostly compatible with the IEEE-488 specification, but with additions and some missing features.

Added Features

As mentioned earlier, OPEN and CLOSE are a Commodore extension. Commodore added them in a clever way that didn’t clash with any features of the IEEE-488 specification.

In IEEE-488, all command codes have the same bit layout:

bit description
7 ignored
6 – 5 command code
4 – 0 primary or secondary address

This allows for 4 command codes, 31 primary addresses (0-30, since 31 is used by UNLISTEN and UNTALK) and 32 secondary addresses (0-31). This is the complete command table on Commodore devices:

command binary description
0x00 + cmd 000nnnnn (global command)
0x20 + pa 001nnnnn LISTEN
0x40 + pa 010nnnnn TALK
0x60 + sa 011nnnnn SECOND
0xE0 + sa 1110nnnn CLOSE
0xF0 + sa 1111nnnn OPEN

The codes for CLOSE (0xE0) and OPEN (0xF0) reuse the code for SECOND (0x60), but with bit #7 set. To devices that don’t understand the Commodore OPEN/CLOSE protocol, these commands look like SECOND and will be ignored, since they are sent after a LISTEN command that targets a different device.

Bit 4 of the command, the most significant bit of the secondary address, is used to distinguish between OPEN and CLOSE, which is why it is only possible to associate 16 secondary addresses with a name.
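A device-side decoder for this table needs only a handful of bit tests. Here is a hypothetical Python sketch: it checks the OPEN/CLOSE patterns first, because they overlap SECOND in bits 6-5. For bytes 0x80-0xDF, which the table doesn't list, it assumes the IEEE-488 rule of ignoring bit 7.

```python
from __future__ import annotations


def decode_command(byte: int) -> tuple[str, int | None]:
    """Decode a command byte per the table above into (name, address)."""
    if (byte & 0xF0) == 0xF0:
        return "OPEN", byte & 0x0F
    if (byte & 0xF0) == 0xE0:
        return "CLOSE", byte & 0x0F
    code = (byte >> 5) & 0b11   # bits 6-5; bit 7 is ignored for all other bytes
    n = byte & 0x1F             # bits 4-0: primary or secondary address
    if code == 0:
        return "GLOBAL", n
    if code == 1:
        return ("UNLISTEN", None) if n == 31 else ("LISTEN", n)
    if code == 2:
        return ("UNTALK", None) if n == 31 else ("TALK", n)
    return "SECOND", n


for b in (0x28, 0xF2, 0x62, 0x3F):
    print(hex(b), decode_command(b))
```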

Missing Features

There is one unsupported entry in the table above: The command code of ‘000’ has a sub-code in bits 0-4, specifying a global command to all devices. These control features like the handling of “SRQ” Service Requests and multiple controller support. The system software of Commodore computers does not support any of these features, but on a PET/CBM with a physical IEEE-488 port, support could be added by user software.

APIs

The KERNAL operating system of all Commodore 8 bit computers since the VIC-20 (i.e. also the C64/C128/C65, the Plus/4 Series and the CBM-II) supports two sets of APIs to talk to devices on the Commodore Peripheral Bus. The built-in BASIC interpreter also has instructions to handle the bus.

KERNAL IEEE API

The “IEEE” API is a set of low-level calls. It allows using primary addresses 0-3, which are not available through the high-level APIs.

address name description arguments
$FFB1 LISTEN Send LISTEN command A = pa
$FFAE UNLSN Send UNLISTEN command
$FF93 SECOND Send LISTEN secondary address A = 0x60 + sa
$FFB4 TALK Send TALK command A = pa
$FFAB UNTLK Send UNTALK command
$FF96 TKSA Send TALK secondary address A = 0x60 + sa
$FFA5 ACPTR Read byte from serial bus returns A = byte
$FFA8 CIOUT Send byte to serial bus A = byte
$FFA2 SETTMO Set timeout A = { 0x00 | 0x80 }

Note the difference between SECOND to send a secondary address after LISTEN, and TKSA to send a secondary address after TALK. Layer 2 generally needs this distinction to get the bus into the correct state afterwards.

All calls deal with layer 3 functionality, except for SETTMO, which controls a layer 2 setting. The IEEE-488 layer 2 on the PET/CBM has an option to enable (A = 0x00, default) or disable (A = 0x80) timeouts. Timeouts are required to allow the talker to communicate an error when opening a named channel, but they can break IEEE-488 devices not designed for the PET. The call also exists on all other Commodore 8 bit computers, but has no effect, with the exception of a C64 with an added IEEE-488 cartridge.

KERNAL Channel I/O API

The KERNAL’s Channel I/O API is higher-level and not specific to the Commodore Peripheral Bus. Devices 0-3 will target the keyboard, tape, RS-232 (PET: tape #2) and the screen. This API does not support multiple listeners or controller-less transmissions (but it can be combined with the low-level API for this).

address name description arguments
$FFB7 READST Read I/O status word returns A = st
$FFBA SETLFS Set logical, first, and second addresses A = lfn, X = pa, Y = sa
$FFBD SETNAM Set file name A = len, X/Y = name
$FFC0 OPEN Open a logical file
$FFC3 CLOSE Close a specified logical file A = lfn
$FFC6 CHKIN Open channel for input X = lfn
$FFC9 CHKOUT Open channel for output X = lfn
$FFCC CLRCHN Close input and output channels
$FFCF CHRIN Input character from channel returns A = byte
$FFD2 CHROUT Output character to channel A = byte
$FFE7 CLALL Close all channels and files

Channel I/O allows up to 10 logical files open at the same time, across all devices. A logical file is addressed by a user-selected logical file number (0-127). To open a logical file, the logical file number and the device’s primary and secondary addresses (255 = none) have to be set using SETLFS, the name has to be set using SETNAM, and OPEN has to be called.

OPEN with a filename will send the LISTEN/OPEN/filename/UNLISTEN sequence, associating the name with the secondary address. OPEN without a filename will not send anything on the bus, but will remember the secondary address for later operations.

Similarly, CLOSE on a file with a filename will send the LISTEN/CLOSE/UNLISTEN sequence, and otherwise, CLOSE will not send anything on the bus.

The current input and/or output has to be globally selected using CHKIN, which will send TALK, and CHKOUT, which will send LISTEN. Both are followed by SECOND, if a secondary address was set. CLRCHN resets the current input and output channels and sends UNTALK or UNLISTEN.

With CHKIN/CHKOUT set up to talk on the Commodore Peripheral Bus, CHRIN and CHROUT will just be forwarded to the low-level calls ACPTR and CIOUT.
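To see which of these calls actually touch the bus, here is a hypothetical Python model that records the bytes each call would send for a device with a primary address of 4 or higher. It follows the description above, not the KERNAL source. Three details are assumptions of the sketch: it skips the OPEN/CLOSE sequence when no secondary address was set, it rejects an eleventh open file, and CLRCHN sends UNLISTEN before UNTALK when both channels are active.

```python
from __future__ import annotations

from dataclasses import dataclass, field

NO_SA = 255  # "no secondary address", as passed to SETLFS


@dataclass
class LogicalFile:
    pa: int
    sa: int
    name: bytes


@dataclass
class ChannelIOModel:
    """Records the layer 3 traffic of Channel I/O calls on bus devices (pa >= 4)."""
    bus: list[tuple[str, int]] = field(default_factory=list)
    files: dict[int, LogicalFile] = field(default_factory=dict)
    input_file: LogicalFile | None = None
    output_file: LogicalFile | None = None

    def _cmd(self, byte: int) -> None:
        self.bus.append(("cmd", byte))

    def open(self, lfn: int, pa: int, sa: int, name: bytes = b"") -> None:
        if pa < 4:
            raise ValueError("devices 0-3 are not on the Commodore Peripheral Bus")
        if len(self.files) >= 10:
            raise RuntimeError("too many open files")
        self.files[lfn] = LogicalFile(pa, sa, name)
        if name and sa != NO_SA:              # with a name: LISTEN/OPEN/name/UNLISTEN
            self._cmd(0x20 + pa)
            self._cmd(0xF0 | (sa & 0x0F))
            self.bus += [("data", b) for b in name]
            self._cmd(0x3F)                    # without a name: nothing on the bus

    def close(self, lfn: int) -> None:
        f = self.files.pop(lfn)
        if f.name and f.sa != NO_SA:
            self._cmd(0x20 + f.pa)
            self._cmd(0xE0 | (f.sa & 0x0F))
            self._cmd(0x3F)

    def chkin(self, lfn: int) -> None:
        f = self.files[lfn]
        self._cmd(0x40 + f.pa)                 # TALK
        if f.sa != NO_SA:
            self._cmd(0x60 | (f.sa & 0x1F))    # secondary address after TALK
        self.input_file = f

    def chkout(self, lfn: int) -> None:
        f = self.files[lfn]
        self._cmd(0x20 + f.pa)                 # LISTEN
        if f.sa != NO_SA:
            self._cmd(0x60 | (f.sa & 0x1F))    # secondary address after LISTEN
        self.output_file = f

    def chrout(self, byte: int) -> None:
        if self.output_file is None:
            raise RuntimeError("no output channel selected")
        self.bus.append(("data", byte))        # forwarded like CIOUT

    def clrchn(self) -> None:
        if self.output_file is not None:
            self._cmd(0x3F)                    # UNLISTEN
        if self.input_file is not None:
            self._cmd(0x5F)                    # UNTALK
        self.input_file = self.output_file = None


kernal = ChannelIOModel()
kernal.open(1, 8, 2, b"$")   # comparable to OPEN 1,8,2,"$" in BASIC
kernal.chkin(1)
kernal.clrchn()
print([(kind, hex(b)) for kind, b in kernal.bus])
```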

BASIC API

The complete channel I/O API is directly accessible through BASIC instructions:

command description
OPEN lfn, pa [, sa [, name]] open logical file
CLOSE lfn close logical file
GET# lfn, var read character
INPUT# lfn, var read string/int/float
PRINT# lfn, var [, …] write string/int/float
CMD lfn redirect standard out

Note that every GET#, INPUT# and PRINT# instruction will go through a TALK/UNTALK or LISTEN/UNLISTEN sequence.

Next Up

Part 3 of the series of articles on the Commodore Peripheral Bus family will cover Layer 4: Commodore DOS.

This article series is an Open Source project. Corrections, clarifications and additions are highly appreciated. I will regularly update the articles from the repository at https://github.com/mist64/cbmbus_doc.

References

  • Fisher, E. R., & Jensen, C. W.: PET and the IEEE 488 Bus (GPIB). Berkeley, Calif: OSBORNE/McGraw-Hill, 1982. ISBN 0-931988-31-4.
  • Keller, R. & Hurling, H.: IEC-Bus – im Labor bewährt. in: c’t Magazin für Computer und Technik, 9/87, p. 187-192. ISSN 0724-8679.
  • Derogee, J. & Butterfield, J.: IEC disected. 2008.
  • Commodore 64 Programmer’s Reference Guide. [S.l.]: Commodore Business Machines, 1987. ISBN 0672220563.
  • cbmsrc – Original source code of various Commodore computers and peripherals

  1. It is possible to change the primary address of a Commodore 1541 using a Commodore DOS (layer 4) command, with o as the old and n as the new address:
    o=8:n=4:oP15,o,15:pR15,"m-w";cH(119);cH(0);cH(2);cH(n+32)+cH(n+64):clO15
    It is no problem to change the primary address to 4, the default address of the printer, and still interact with it using BASIC commands for disk access: load"$",4

  2. Commodore DOS breaks this convention in one case: When a disk drive receives a command string on channel 15, it will execute it as soon as there is an UNLISTEN event, as opposed to only triggering on EOI.

  3. For disk drives, this happens when layer 4 decides that a file was not found or there was no disk in the drive, for example.

Commodore Peripheral Bus: Part 1: IEEE-488

In the series about the variants of the Commodore Peripheral Bus family, this article covers the lowest two layers (electrical and byte transfer) of the IEEE-488 bus as found on the PET/CBM series.




History

Most computers from the early time of home/personal computers had printers connected through either Centronics or RS-232 and disk drives connected directly to the internal bus. For the 1977 PET, Commodore decided to go with a standard interface that would allow connecting several drives and printers to a single port: IEEE-488.

The interface standardized as IEEE-488 in the US and as IEC-625 internationally is also known as the Hewlett-Packard Interface Bus (“HP-IB”) and was originally designed in the late 1960s. It was popular mostly for test equipment, but also for printers.

IEEE-488 is an 8-bit parallel bus with the following properties:

  • All participants are daisy-chained.
  • One dedicated controller (the computer) does bus arbitration of up to 31 devices.
  • One-to-many: Any participant can send data to any set of participants.
  • A device has multiple channels for different functions.
  • Data transmission is byte stream based.

This article covers layers 1 (electrical) and 2 (byte transfer) of IEEE-488 from the PET’s perspective. Differences from the standard are mentioned at the end.

Layer 1: Electrical

Connectors and Pinout

On the computer side, the PET uses a proprietary 24 pin edge connector:

All devices use the standardized IEEE-488 connector, which is a 24 pin micro ribbon connector:

Devices usually only have a single micro ribbon connector, but the connectors of common IEEE-488 cables have one male and one female side, so the back side of every cable connector becomes the connector to the next device.

[Photos: IEEE-488 cable, double connector, male connector, female connector]

The pinout is similar across the two connectors:

Pin Signal Description Pin Signal Description
1 DIO1 Data I/O 13/A DIO5 Data I/O
2 DIO2 Data I/O 14/B DIO6 Data I/O
3 DIO3 Data I/O 15/C DIO7 Data I/O
4 DIO4 Data I/O 16/D DIO8 Data I/O
5 EOI End Or Identify 17/E REN Remote Enable
6 DAV Data Valid 18/F GND
7 NRFD Not Ready For Data 19/H GND
8 NDAC No Data Accepted 20/J GND
9 IFC Interface Clear 21/K GND
10 SRQ Service Request 22/L GND
11 ATN Attention 23/M GND
12 SHIELD 24/N GND
  • The eight DIO lines carry the data bytes.
  • The NRFD, DAV and NDAC lines are used to perform handshaking.
  • EOI, ATN, SRQ and REN are control lines.
  • IFC is the RESET line for all devices.

Open Collector Logic

All signal lines are TTL open collector, which means:

  • All participants of the bus can not only read, but also write to the line.
  • When all participants write 0, the line will read back 0, but if any device writes 1, the bus will read back as 1.
  • The logic is inverted: 5V is 0 (false), and 0V is 1 (true).

In other words: If the line is released by all bus participants, it will be 5V (logically 0), and any participant can pull it to 0V (logically 1).

This can be visualized with two (or more) hands that can pull the line to 1, and a spring that pushes it to 0:

So when a line reads as 0, it is known that it is currently released by all participants, but if a line reads as 1, it is impossible to know who or even how many are currently pulling it.
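The hands-and-spring picture translates directly into a few lines of Python. This is a hypothetical model with idealized voltages (no TTL thresholds or rise times), and participant names are arbitrary strings:

```python
from __future__ import annotations


class OpenCollectorLine:
    """One bus line: released -> 5 V (logical 0); pulled by anyone -> 0 V (logical 1)."""

    def __init__(self) -> None:
        self._pullers: set[str] = set()

    def pull(self, who: str) -> None:
        self._pullers.add(who)

    def release(self, who: str) -> None:
        self._pullers.discard(who)

    def read(self) -> int:
        # A reader only sees one bit, never who (or how many) are pulling.
        return 1 if self._pullers else 0

    def voltage(self) -> float:
        return 0.0 if self._pullers else 5.0


line = OpenCollectorLine()
line.pull("A")
line.pull("B")
line.release("A")
print(line.read(), line.voltage())  # 1 0.0: B is still pulling
```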

Layer 2: Byte Transfer

The basic byte transfer protocol of IEEE-488 is based on transmissions of byte streams from one sender to one or more receivers. Additional bus participants will remain silent. There are no fixed assignments of senders and receivers; the roles of sender and receiver are assigned per transmission.

Sending Bytes

When transmitting data, the sender operates the DIO (Data I/O) and DAV (Data Valid) lines, while the receivers operate the NRFD (Not Ready For Data) and NDAC (Not Data Accepted) lines. All bus participants that don’t operate a line leave it released.

The basic idea is that the receivers signal that they are ready for data (NRFD = 0), the sender puts the data on the bus (DIO) and signals that the data is valid (DAV), and the receivers signal that they have accepted the data (NDAC = 0). This is called the 3-wire-handshake.

The following animation shows a byte being sent to two receivers.

Let’s go through it step by step:

0: Initial State: Receivers are busy


In the initial state, the sender has the 8 data bits set to 0 and DAV to 0, meaning that there is no data available. All receivers have NDAC pulled, meaning that no data was accepted (yet). They may also have NRFD (not ready for data) pulled while they are busy doing other things.

1: A is now ready to receive data


Transmission of a byte cannot begin until all receivers are ready to receive. So at some point the first receiver is done handling the previous byte it may have received and signals that it is ready for data by releasing NRFD. The wire NRFD is still pulled by the other receiver though, so its value is still 1.

2: All receivers are now ready to receive data


Whenever the other receiver is ready to receive the next byte, it will also release NRFD, so it will now read back as 0: All receivers are ready to receive data.

3: Sender puts data on the bus


Triggered by NRFD being 0, the sender now puts the byte value onto DIO.

4: Data on bus is now valid


After that, the sender pulls DAV, signaling that the data in DIO is valid.

5: A is now busy again


Triggered by DAV being 1, the receivers first have to signal that they are busy, so that after accepting the data, the sender won’t think the receivers are immediately ready for the next byte. So now, the first receiver pulls NRFD, so NRFD is 1.

6: A has accepted the data


Then, the first receiver reads the data from DIO and releases NDAC, signaling that it has accepted the data. But the other receiver is still pulling it, so NDAC is still 1.

7: B is now busy


Before it can accept the data, the other receiver also has to signal that it is busy by pulling NRFD. The line was already 1 and will stay at 1.

8: All receivers have accepted the data


Then the other receiver reads the data from DIO and releases NDAC. NDAC is now 0, meaning all receivers have accepted the data.

9: Data on the bus is no longer valid


Triggered by NDAC being 0, the sender sets DAV back to 0, meaning the data in DIO is no longer valid.

10: Sender clears data on the bus


In order to revert to the initial state, the sender then clears the byte from the DIO lines and sets them back to 0.

11: A resets data accepted


Triggered by DAV being 0, the first receiver pulls NDAC.

12: B resets data accepted


Likewise, the other receiver pulls NDAC. All wires are now in the initial state again. All steps are repeated as long as there is more data to be sent.

Note that the protocol only specifies the triggers: For example, the receivers are to read the data from DIO once DAV = 1, so it would be just as legal for the sender to put the data on DIO as early as step 1 (the PET does this, Commodore disk drives don’t), or to combine the DIO and DAV writes (3/4 and 9/10) into one step.

Also, there is no ordering on which receiver pulls or releases its line first. The receivers don’t care about the other receivers, they only follow the protocol with the sender. The open collector property of the signal lines automatically combines the outputs of the different receivers.
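The 3-wire handshake walked through above can be made concrete with a toy, tick-based Python model of one sender and several receivers that share open-collector lines. It is a sketch under simplifying assumptions: every participant acts once per tick, there is no real timing, and the class names `Bus`, `Sender` and `Receiver` as well as the `busy_ticks` parameter are hypothetical. The comments refer to the step numbers above.

```python
class Bus:
    """Open-collector lines: a line reads 1 while at least one participant pulls it."""
    def __init__(self):
        self.pulls = {}  # participant name -> set of control lines it is pulling
        self.dio = {}    # participant name -> byte it drives onto DIO1-DIO8

    def pull(self, who, line):
        self.pulls.setdefault(who, set()).add(line)

    def release(self, who, line):
        self.pulls.setdefault(who, set()).discard(line)

    def read(self, line):
        return any(line in lines for lines in self.pulls.values())

    def read_dio(self):
        value = 0
        for byte in self.dio.values():
            value |= byte  # every DIO line is wired-OR as well
        return value


class Sender:
    def __init__(self, bus, data):
        self.bus, self.data, self.pos = bus, bytes(data), 0
        self.waiting_for_accept = False

    def done(self):
        return self.pos == len(self.data)

    def step(self):
        bus = self.bus
        if not self.waiting_for_accept:
            # Steps 3/4: NRFD = 0 and NDAC = 1, so put the byte on DIO, then pull DAV
            if not self.done() and not bus.read("NRFD") and bus.read("NDAC"):
                bus.dio["sender"] = self.data[self.pos]
                if self.pos == len(self.data) - 1:
                    bus.pull("sender", "EOI")  # last byte: signal end of stream
                bus.pull("sender", "DAV")
                self.waiting_for_accept = True
        elif not bus.read("NDAC"):
            # Steps 9/10: everyone accepted, so release DAV (and EOI) and clear DIO
            bus.release("sender", "DAV")
            bus.release("sender", "EOI")
            bus.dio["sender"] = 0
            self.pos += 1
            self.waiting_for_accept = False


class Receiver:
    def __init__(self, bus, name, busy_ticks):
        self.bus, self.name, self.busy_ticks = bus, name, busy_ticks
        self.received, self.eoi = [], False
        self.state, self.countdown = "busy", busy_ticks
        bus.pull(name, "NDAC")  # step 0: no data accepted yet
        bus.pull(name, "NRFD")  # step 0: busy with other work

    def step(self):
        bus = self.bus
        if self.state == "busy":
            if self.countdown:
                self.countdown -= 1
            else:
                bus.release(self.name, "NRFD")  # steps 1/2: ready for data
                self.state = "ready"
        elif self.state == "ready" and bus.read("DAV"):
            bus.pull(self.name, "NRFD")         # steps 5/7: busy again
            self.received.append(bus.read_dio())
            self.eoi = bus.read("EOI")
            bus.release(self.name, "NDAC")      # steps 6/8: data accepted
            self.state = "accepted"
        elif self.state == "accepted" and not bus.read("DAV"):
            bus.pull(self.name, "NDAC")         # steps 11/12: back to the initial state
            self.state, self.countdown = "busy", self.busy_ticks


def transfer(data, busy_ticks=(1, 3), max_ticks=10_000):
    """Send `data` to one receiver per entry in `busy_ticks`; return what each got."""
    bus = Bus()
    receivers = [Receiver(bus, f"R{i}", n) for i, n in enumerate(busy_ticks)]
    sender = Sender(bus, data)
    for _ in range(max_ticks):
        if sender.done():
            break
        sender.step()
        for receiver in receivers:
            receiver.step()
    return [bytes(r.received) for r in receivers], [r.eoi for r in receivers]


received, eoi = transfer(b"HELLO")
print(received, eoi)  # [b'HELLO', b'HELLO'] [True, True]
```

Each receiver only reacts to the sender's lines and never looks at the other receivers. The same code therefore works for any number of receivers, and the slowest one (the largest `busy_ticks`) sets the pace, which is what the open-collector wiring implies.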

End of Stream

If there is no more data to be transmitted, the sequence stops at step 12 (which is the same as step 0). It is the sender that decides whether it wants to transmit more data, but it is the receivers taking the first step for the next byte (NRFD = 0). Therefore, the sender already signals the end of the stream to the receivers while transmitting the last byte. It does this by pulling the EOI (“End Or Identify”) line to 1 while it is pulling DAV to 1.

As a side effect of this, IEEE-488 does not allow empty streams – they would have to be at least one byte long. Commodore’s version of the bus uses timeouts to signal this condition (see below).

Sending Commands

The assignment of senders and receivers to transmissions is the job of layer 3 (Bus Arbitration), described in the next article in this series.

But there are also command transmissions, where one participant can start a transmission to all other participants at any time.

Only so-called “controllers” may perform a command transmission, and on Commodore busses, there is always only one controller: the computer. All bus participants that are not controllers are called “devices”.

When the controller wants to send a command, it pulls the ATN (“Attention”) line. All devices on the bus have to immediately (i.e. within less than a clock cycle – Commodore drives do this using a hardware circuit) respond by pulling NDAC (“ATN Response Timing”), and participate in the 3-wire-handshake to receive the command byte stream.

The controller sends the command data like any other transmission, and releases ATN afterwards. It does not pull EOI during the transmission of the last byte, since the release of ATN signals the end of the stream already.

The encoding of commands is part of the layer 3 bus arbitration protocol (see Part 2).

Errors

When the sender wants to start the transmission of a byte, the bus must be in one of these two states:

  • NRFD = 1, NDAC = X: At least one receiver is still busy, so wait until everyone is ready for data.
  • NRFD = 0, NDAC = 1: All receivers are ready for data.

If NRFD = 0 and NDAC = 0, this means that there are no receivers present:

  • There are either no other participants connected at all, so nobody can pull NRFD or NDAC,
  • or there are participants connected, but they don’t want to receive, so they aren’t pulling NRFD or NDAC.
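From the sender's point of view, the check before each byte comes down to a small decision table. Here is a sketch with a hypothetical helper name, where `True` means that the line reads as 1 (pulled):

```python
def sender_view(nrfd, ndac):
    """Classify the handshake lines before the sender starts a byte."""
    if nrfd:
        return "wait"          # NRFD = 1, NDAC = X: at least one receiver is still busy
    if ndac:
        return "ready"         # NRFD = 0, NDAC = 1: all receivers are ready for data
    return "no receivers"      # NRFD = 0, NDAC = 0: nobody is pulling either line


print(sender_view(False, True))   # ready
print(sender_view(False, False))  # no receivers
```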

Just from the signals, however, receivers cannot detect whether there is a sender. When trying to receive the first byte, the receivers signal their readiness with NRFD = 0. Then, as long as DAV = 0, the sender doesn’t have new data yet, and as soon as DAV = 1, the sender has data. The single line owned by the sender cannot communicate that there is nobody willing to send. A timing side channel (i.e. timeouts) can.

Timeouts

On the PET side, the Commodore version of IEEE-488 implements timeouts. By default, there are two steps that can time out:

  • receiver timeout: After the PET as the sender signals that the data is valid by pulling DAV, the receivers have to release NDAC, i.e. accept the data within 64 µs.
  • sender timeout: After the PET as the receiver signals that it is ready for data by releasing NRFD, the sender has to pull DAV, i.e. provide data within 64 µs.

The receiver timing requirement can easily be met if the receiver is waiting for the next byte in a tight loop and doesn’t have interrupts enabled. All it has to do is take the byte, buffer it, and release NDAC. It then has time to process it.

The sender timeout is problematic for two reasons:

  • Since it is the receiver that starts the transmission of a byte, and the sender that has to react within 64 µs, there might not be enough time for the sender to actually retrieve the next byte, e.g. from disk. Therefore, the sender is required to retrieve the next byte while the previous one is still being transmitted, i.e. during the DAV = 1 phase, which cannot time out.
  • Also, the timer starts running as soon as the PET releases NRFD. In theory, there could be more receivers besides the PET that have to release NRFD as well before the sender is even allowed to send the data, so this timeout can easily break this case. (This is fixable and could be considered an implementation bug.)

Mostly for compatibility with non-Commodore IEEE-488 devices (and also fixing the latter problem), timeouts can be disabled globally using the KERNAL call “SETTMO” ($FFA2) with A = $80.

In addition to being able to recover from malfunctioning devices, timeouts are also an implicit communication channel: If a sender is asked to perform a transmission but doesn’t actually have any data, it will signal this by timing out. Therefore, turning off the timeout will break the “FILE NOT FOUND” case for Commodore drives on layer 3 (see the next article in this series).

Timing

Apart from ATN Response Timing and the two steps that have timeouts, IEEE-488 on the PET has no timing constraints. A sender can stall for an arbitrary amount of time (in the DAV = 1 phase, step 4), e.g. in the case of a disk drive fetching data from disk, and a receiver can stall for an arbitrary amount of time (in the NRFD = 1 phase, step 0), e.g. in the case of a printer that is out of paper.

Additionally, the open collector property of the handshake lines allows the slowest receiver on the bus to control the transmission speed.

Standards-Compliance

The PET version of IEEE-488 is very close to the original specification of the bus and can therefore support most standard equipment.

On the hardware side, all signals are accessible, with the exception of REN (“Remote Enable”), which is connected to ground (logically true), and would otherwise have allowed devices to work standalone.

On the software side, there is no support for the SRQ (“Service Request”) line, which is supposed to allow a device to signal that it wants to be serviced. This is basically an IRQ line. While the signal is accessible from the PET, there is no software support, and no Commodore devices make use of it.

And as mentioned above, the PET uses timeouts by default, which can be disabled.

Next Up

Part 2 of the series of articles on the Commodore Peripheral Bus family will cover Layer 3: Bus Arbitration.

This article series is an Open Source project. Corrections, clarifications and additions are highly appreciated. I will regularly update the articles from the repository at https://github.com/mist64/cbmbus_doc.

References

  • Fisher, E. R., & Jensen, C. W.: PET and the IEEE 488 Bus (GPIB). Berkeley, Calif: OSBORNE/McGraw-Hill, 1982. ISBN 0-931988-31-4.
  • Keller, R., & Hurling, H.: IEC-Bus – im Labor bewährt. In: c’t Magazin für Computer und Technik, 9/87, pp. 187-192. ISSN 0724-8679.
  • PETdoc.txt
  • cbmsrc – Original source code of various Commodore computers and peripherals

Commodore Peripheral Bus: Overview

The well-known Serial Bus (aka Serial “IEC” Bus) of the Commodore 64 that connects to disk drives such as the 1541 is just one variant of a whole family of busses and protocols used by the line of 8 bit Commodore machines from the PET to the C65. This is the first article of a multi-part series on the Commodore Peripheral Bus family.

The following figure compares the different protocol stacks. There are three different connectors with their unique byte transfer protocols: IEEE-488, Serial, and TCBM. Fast Serial and JiffyDOS are optimized, but backwards-compatible protocols for existing cables, and CBDOS integrates the drive directly into the computer. The higher level protocols are the same across all Commodore 8 bit machines, including the “KERNAL” operating system APIs.

These are the properties and tradeoffs of the different variants of the protocol stack[1]:

Variant      Data Wires  Speed (KB/sec)  Controller Code (bytes)  Comments
IEEE-488     13          2.1             334                      compatible with industry standard
Serial       3           0.4             434                      very slow
Fast Serial  4           2.1             708                      requires dedicated bit shifting hardware
JiffyDOS     3           2.1             739
TCBM         12          2.4             262                      point-to-point
CBDOS        –           –               0                        drive integrated into computer

All variants are based on the IEEE-488 standard and therefore share (mostly) the same basic architecture:
  • All participants are daisy-chained.
  • One dedicated controller (the computer) does bus arbitration of up to 31 devices.
  • One-to-many: Any participant can send data to any set of participants.
  • A device has multiple channels for different functions.
  • Data transmission is byte stream based.

The different variants and layers will be described in multiple articles.


NOTE: I am releasing one part every week, at which time links will be added to the bullet points below. The articles will also be announced on my Twitter account @pagetable and my Mastodon account @pagetable@mastodon.social.


  • Part 0: Overview and Introduction
    That’s this part.
  • Part 1: IEEE-488 [PET/CBM Series; 1977]
    This part covers layers 1 (electrical) and 2 (byte transfer) of IEEE-488, an 8-bit parallel bus with three handshake lines, an ATN line for bus arbitration and very relaxed timing requirements.
  • Part 2: The TALK/LISTEN Layer
    This part talks about layer 3 (TALK/LISTEN), which is shared between all bus variants.
  • Part 3: The Commodore DOS Layer
    This part describes layer 4 (Commodore DOS), which is shared between all bus variants.
  • Part 4: Standard Serial (IEC) [VIC-20, C64; 1981] (coming soon)
    The VIC-20 introduced a serial version of layers 1 and 2 with one clock and one data line for serial data transmission, and an ATN line for bus arbitration. It has some strict timing requirements. This bus is supported by all members of the home computer line: VIC-20, C64, Plus/4 Series, C128 and C65.
  • Part 5: TCBM [C16, C116, Plus/4; 1984] (coming soon)
    The Plus/4 Series introduced a 1-to-1 bus between the computer and one drive, with 8 bit parallel data, two handshake lines, and two status lines from the drive to the computer. It was the short-lived planned successor of the Standard Serial bus, but was then replaced by Fast Serial.
  • Part 6: JiffyDOS [1985] (coming soon)
    JiffyDOS, a 3rd party ROM patch for computers and drives, replaces layer 2 byte transmission of Standard Serial by using the clock and data lines in a more efficient way. Bus arbitration is unchanged. The controller detects a device’s JiffyDOS support and can fall back to the Standard Serial protocol.
  • Part 7: Fast Serial [C128; 1986] (coming soon)
    The C128 introduced Fast Serial, which replaces layer 2 byte transmission of Standard Serial by using a previously unused wire in the Serial connector as a third line for data transmission. Bus arbitration is unchanged. The controller detects a device’s Fast Serial support and can fall back to the Standard Serial protocol.
  • Part 8: CBDOS [C65; 1991] (coming soon)
    The unreleased C65 added CBDOS (“computer-based DOS”) by integrating one or more drive controllers into the computer. There are no layers 1 and 2, and layer 3 sits directly on top of function calls that call into the DOS code running on the same CPU.

This article series is an Open Source project. Corrections, clarifications and additions are highly appreciated. I will regularly update the articles from the repository at https://github.com/mist64/cbmbus_doc.


  1. The speeds have been measured by repeatedly reading the status channel of a disk drive. IEEE-488, Serial and JiffyDOS were measured on a 1 MHz C64 and Fast Serial on a C128, which executes all (Fast) Serial code in 1 MHz mode. TCBM was measured on a 1.77 MHz Plus/4 with the screen on, which makes the effective CPU speed similar to the C64. This is the code: a9 00 20 bd ff a9 01 a2 08 a0 0f 20 ba ff 20 c0 ff a9 08 20 b4 ff a9 6f 20 96 ff a2 00 20 a5 ff 9d 00 04 e8 d0 f7 60. Both Fast Serial and JiffyDOS can reach higher speeds in the special case of loading files using custom protocols. Controller code size was measured on CBM2 for IEEE-488, on C64 for Serial and JiffyDOS, and on C128 for Fast Serial. Code sizes are approximate and do not include the LOAD and SAVE code.
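For readers who don't read 6502 machine code, the benchmark bytes above can be decoded with a tiny table-driven disassembler. This Python sketch only knows the eight opcodes that occur in this snippet, labels calls into the standard KERNAL jump table, and assumes a hypothetical load address of $C000, since the footnote doesn't say where the code was run from. Decoded, the routine calls SETNAM with an empty name, then SETLFS with file 1, device 8 and secondary address 15, then OPEN. It then sends TALK to device 8 and TKSA $6F (secondary address 15 ORed with $60), and calls ACPTR 256 times, storing the bytes at $0400 (the C64's default screen memory).

```python
# Only the opcodes that occur in the benchmark: opcode -> (mnemonic, addressing mode)
OPCODES = {
    0xA9: ("LDA", "imm"), 0xA2: ("LDX", "imm"), 0xA0: ("LDY", "imm"),
    0x20: ("JSR", "abs"), 0x9D: ("STA", "abs,X"), 0xE8: ("INX", "impl"),
    0xD0: ("BNE", "rel"), 0x60: ("RTS", "impl"),
}
SIZES = {"imm": 2, "abs": 3, "abs,X": 3, "impl": 1, "rel": 2}
# KERNAL jump table entries called by the benchmark
KERNAL = {0xFFBD: "SETNAM", 0xFFBA: "SETLFS", 0xFFC0: "OPEN",
          0xFFB4: "TALK", 0xFF96: "TKSA", 0xFFA5: "ACPTR"}


def disassemble(code, origin):
    """Return one "ADDR  INSTRUCTION" line per instruction; origin is the load address."""
    lines, pc = [], 0
    while pc < len(code):
        mnemonic, mode = OPCODES[code[pc]]  # KeyError for opcodes not in the table
        size = SIZES[mode]
        operand = code[pc + 1:pc + size]
        addr = origin + pc
        if mode == "imm":
            text = f"{mnemonic} #${operand[0]:02X}"
        elif mode in ("abs", "abs,X"):
            target = operand[0] | operand[1] << 8  # little-endian address
            text = f"{mnemonic} ${target:04X}" + (",X" if mode == "abs,X" else "")
            if target in KERNAL:
                text += f"  ; {KERNAL[target]}"
        elif mode == "rel":
            offset = operand[0] - 256 if operand[0] >= 0x80 else operand[0]
            text = f"{mnemonic} ${addr + size + offset:04X}"  # relative to the next instruction
        else:
            text = mnemonic
        lines.append(f"{addr:04X}  {text}")
        pc += size
    return lines


code = bytes.fromhex("a9 00 20 bd ff a9 01 a2 08 a0 0f 20 ba ff 20 c0 ff a9 08 20 b4 ff "
                     "a9 6f 20 96 ff a2 00 20 a5 ff 9d 00 04 e8 d0 f7 60")
print("\n".join(disassemble(code, 0xC000)))
```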

Archiving C64 Tapes Correctly

It’s pretty simple to archive Commodore 64 tapes, but it’s hard if you want to do it right. Creating the complete archive of the German “INPUT 64” magazine was not as easy as getting one copy of each of the 32 tapes and reading them. The tapes are over 30 years old by now, and many of them are hardly readable any more.

Good Tapes, Bad Tapes

Here is the overview of the tapes I have of each issue, how many I could read correctly, and what percentage that is:

Issue  # Copies  # Read OK  % Read OK
8501   8         2          25%
8502   6         1          16.7%
8503   8         1          12.5%
8504   6         0          0%
8505   6         1          16.7%
8506   6         1          16.7%
8507   6         0          0%
8508   6         1          16.7%
8509   6         5          83.3%
8510   6         0          0%
8511   6         3          50%
8512   10        3          30%
8601   4         3          75%
8602   4         2          50%
8603   4         3          75%
8604   4         4          100%
8605   6         3          50%
8606   4         2          50%
8607   4         4          100%
8608   4         3          75%
8609   6         4          66.7%
8610   6         5          83.3%
8611   2         1          50%
8612   2         1          50%
8701   2         1          50%
8702   2         2          100%
8703   2         2          100%
8704   2         2          100%
8705   2         1          50%
8706   4         1          25%
8707   6         0          0%
8708   4         2          50%

As you can see, some tapes are pretty much unreadable these days, while others are okay. For issues that were problematic, I kept buying more copies on eBay, hoping to eventually find one that reads correctly. (Interestingly, the distribution suggests that readability correlates more with the brand of tape (8707 seems to be a very bad one) than with age.)

By the way, all numbers of copies are divisible by two, because all INPUT 64 tapes have an identical copy of the data on the reverse side. So for issue 8702, for example, I only had a single tape, and both sides read correctly.

How to Dump Tapes

There are many ways to dump C64 tapes, and I’ll describe two.

If you have a C64 and an actual Datasette reader, you can use the 1541 Ultimate-II+ cartridge. The included dongle allows you to connect the Datasette to the cartridge, so you can directly record a tape onto a .TAP file on a USB storage device also connected to the cartridge.

If you are doing this, make sure that your Datasette has a clean head and is correctly aligned. HeadAlign by enthusi is very useful for this. If you have multiple Datasette recorders, try them all and start out with the best one before changing the alignment.

Another way is to record the tape into a WAV file using a regular tape recorder. Online shops are full of very cheap (but good) Walkman-like devices with a built-in analog-to-digital converter that connect directly to USB and show up as an audio-in device. Using a tool like Audacity and one of the many WAV-to-TAP tools (my favorite is the ancient “tape64”, whose Windows binary works nicely with Wine), you can then convert it into a TAP file.

The advantage of this method is that you can still massage dumps of tapes that you have trouble with. In Audacity, you can use filters or split the two channels, and in conversion tools, you can adjust the threshold or the speed.

Interestingly, some tapes read well on a tape player, others read well on a Datasette. If you have trouble getting a correct dump and few copies, you might want to try different methods of reading the tapes.

Checksums

How do I know whether a tape has in fact been read correctly? Practically all recording formats use checksums, so a tool like the excellent tapclean can help:

$ tapclean -t 8702a.tap 

----------------------------------------------------------------------
TAPClean 0.34 - (C) 2006-2017 TC Team [Built Jun 12 2017 by ldf]
Based on Final TAP 2.76 Console - (C) 2001-2006 Subchrist Software
----------------------------------------------------------------------

Read tolerance = 10

Computer type: C64 PAL (985248 Hz)


Loaded: 8702a.tap
Testing...

Scanning...  Pauses  C64 ROM tape

TAPClean version: 0.34

GENERAL INFO AND TEST RESULTS

TAP Name    : 8702a.tap
TAP Size    : 1221451 bytes (1192 kB)
TAP Version : 1
Recognized  : 94%
Data Files  : 28
Pauses      : 186
Gaps        : 204
Magic CRC32 : A9F18B1C
TAP Time    : 8:44.16
Bootable    : YES (1 part, name: INPUT 64)
Loader ID   : n/a

Overall Result    : FAIL

Header test       : PASS [Sig: OK] [Ver: OK] [Siz: OK]
Recognition test  : FAIL [1153201 of 1221431 bytes accounted for] [94%]
Checksum test     : PASS [28 of 28 checksummed files OK]
Read test         : PASS [0 Errors]
Optimization test : FAIL [0 of 28 files OK]

Saved: tcreport.txt
Operation completed in 00:00:09.

The tool recognized 28 pieces of data on the tape, and all of them had correct checksums. That’s a good sign, but not good enough. And that’s not just because single-byte checksums might be too weak to rely on.

Completeness

It is possible that some pieces of data did not get recognized. If a header is unreadable, tapclean will treat it as an unrecognized area and warn about it. In the printout above, the tape failed the “recognition test”, because it only understood 94% of the tape (which includes silence). What are the other 6%?

Tapes often contain some garbage, and many INPUT 64 tapes have a minute of beeps at the end that don’t seem to encode any data. So these 6% may not be a problem.

Another way to check whether all data objects were recognized is to extract everything and look at the listing:

$ tapclean -t 8702a.tap -doprg
$ ls prg/
007 (033C-03FB) [INPUT_64].prg
008 (033C-03FB) [INPUT_64].prg
010 (0318-09FF).prg
011 (0318-09FF).prg
013 (033C-0354) [CTEXTE].prg
017 (3000-3FBA).prg
021 (033C-0354) [RAHMEN_______COM].prg
025 (C440-CFF4).prg
029 (033C-0354) [TITELBILD].prg
033 (0801-1891).prg
037 (033C-0354) [l_O_H_N_S_T_E_U].prg
041 (0801-6BCE).prg
045 (033C-0354) [j_u_l_i_a].prg
049 (0801-4B99).prg
052 (033C-0354) [i_d___w_E_R_K_S].prg
056 (0801-3F16).prg
060 (033C-0354) [6_4_E_R__t_I_P_S].prg
064 (0801-59CE).prg
068 (033C-0354) [i_n_p_u_t___c_a].prg
072 (0801-2E7C).prg
076 (033C-0354) [l_A_B_E_L___t_O].prg
080 (0801-2AEC).prg
084 (033C-0354) [d_R_E_I___M_A_L].prg
088 (0801-5538).prg
091 (033C-0354) [eNGLISCHE_gramMA].prg
095 (0801-5B22).prg
098 (033C-0354) [v_O_R_S_C_H_A_U].prg
102 (0801-2A94).prg

Commercial Commodore 64 tapes usually don’t use the original encoding as supported by the C64’s operating system. More optimized schemes are often 5x-10x more efficient. Nevertheless, the first program on tape has to use the original encoding, so that the tape is bootable.

This tape contains a bootable program named “INPUT 64” at the beginning. The Commodore encoding saves the 192-byte header (file name, type etc.) twice (007, 008), followed by the data, which is also saved twice (010, 011). On this tape, everything after this is in “SUPERTAPE” format. Every SUPERTAPE file consists of a single header (e.g. 013 for “CTEXTE”) and a single copy of the data (e.g. 017).

The printout above shows that after the boot program (2x header, 2x payload), there are 12 additional files. For each file, there is a header and there is payload. So this looks okay. Here is an example of a missing object:

$ tapclean -t 8707a.tap -doprg
[...]
Checksum test     : PASS [29 of 29 checksummed files OK]
[...]
$ ls prg/
066 (033C-03FB) [INPUT_64].prg
067 (033C-03FB) [INPUT_64].prg
069 (0318-09FF).prg
070 (0318-09FF).prg
072 (033C-0354) [CTEXTE].prg
076 (3000-4093).prg
080 (033C-0354) [RAHMEN_______COM].prg
084 (C440-CFF2).prg
088 (033C-0354) [TITELBILD].prg
092 (0801-1891).prg
096 (033C-0354) [i_n_p_u_t___w_I].prg
100 (0801-4A1D).prg
103 (033C-0354) [eNGLISCHE_gramMA].prg
107 (0801-5E09).prg
111 (033C-0354) [s_P_I_D_E_R].prg
115 (0801-471B).prg
118 (033C-0354) [a_S_S_E_M_B_L_E].prg
128 (033C-0354) [i_d___w_E_R_K_S].prg
132 (0801-3E4D).prg
135 (033C-0354) [i_c_i].prg
139 (0801-2DB0).prg
143 (033C-0354) [6_4_E_R__t_I_P_S].prg
147 (0801-4911).prg
151 (033C-0354) [p_I_N_G___p_O_N].prg
155 (0801-247F).prg
159 (033C-0354) [r___T_S_E_L_E_C].prg
163 (0801-22A5).prg
167 (033C-0354) [v_O_R_S_C_H_A_U].prg
171 (0801-0E89).prg

The checksum test passed, but the payload for the file with header 118 (“a_S_S_E_M_B_L_E”) is missing. In this case, it’s clear, but if two entries in sequence had been missing, it would have been tricky to detect this.
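The header/payload pairing that makes the gap visible in this listing can also be checked automatically. Below is a Python sketch. It assumes tapclean's `-doprg` naming as shown above: headers carry a `[name]` and payloads don't. It collapses the duplicate copies written by the standard ROM encoding. The function name `check_pairs` is hypothetical, and names that don't match the pattern are simply skipped.

```python
import re

# "118 (033C-0354) [a_S_S_E_M_B_L_E].prg" -> sequence number, start, end, optional header name
ITEM = re.compile(r"^(\d+) \(([0-9A-F]{4})-([0-9A-F]{4})\)(?: \[(.*)\])?\.prg$")


def check_pairs(filenames):
    """Return (problem, sequence number) for unpaired headers and payloads.

    `filenames` must be in tape order, e.g. sorted(os.listdir("prg")).
    """
    items = []
    for name in filenames:
        m = ITEM.match(name)
        if not m:
            continue
        seq, start, end, header = m.groups()
        key = (start, end, header)
        if items and items[-1][1] == key:
            continue  # second copy written by the standard ROM encoding
        items.append((int(seq), key))
    problems = []
    for i, (seq, (_, _, header)) in enumerate(items):
        prev_is_header = i > 0 and items[i - 1][1][2] is not None
        next_is_header = i + 1 < len(items) and items[i + 1][1][2] is not None
        if header is not None and (i + 1 == len(items) or next_is_header):
            problems.append(("header without payload", seq))
        if header is None and not prev_is_header:
            problems.append(("payload without header", seq))
    return problems
```

As noted above, a check like this cannot notice a payload and the following header going missing together, because the header/payload alternation stays intact.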

Comparing Multiple Copies

With just a single copy, we cannot really know whether the data is correct. Checksums are not reliable enough, and it’s hard to detect if a file was just not recognized. If we have two copies of the tape and they read the same, we can be very confident that the dumps are correct. tapclean’s “magic CRC32” that it prints after analyzing a tape is a strong checksum of all recognized data concatenated. So if two dumped copies have the same CRC32, we can assume that the dumps were correct.

It is quite unlikely that two dumps have the same file(s) missing, and it is extremely unlikely that they have the same bit flips. This assumes that the tapes did not have mastering errors (or weaknesses); verifying the checksums and visually inspecting the contents should rule these out, too.

The following bash script shows the CRC32 values (e.g. “3D3C8936”), the number of correctly checksummed files and the number of total files (e.g. 30-31) for each tape:

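# Analyze all TAP files in parallel, writing each report to <name>.tap.log
for i in $(find . -name '*.tap'); do
  echo "$i"
  tapclean -t "$i" > "$i.log" &
done
wait   # make sure every tapclean run has finished before parsing the logs

# Print: issue, magic CRC32, "<good>-<total>" checksummed files, TAP file
for i in $(find . -name '*.tap.log'); do
  issue=$(basename "$i" | cut -c 1-4)
  numfiles=$(grep "^Checksum test" "$i" | cut -d "[" -f 2 | cut -d " " -f 1-3 | sed -e "s/ of /-/")
  crc32=$(grep "^Magic CRC32" "$i" | cut -d ":" -f 2)
  # $crc32 is left unquoted on purpose so that its leading space is dropped
  echo "$issue" $crc32 "$numfiles" "${i%.log}"
done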

Here’s example output:

8708 00000000 0-0   ./8708a2.tap
8708 16979EE4 32-32 ./8708b2.tap
8708 16979EE4 32-32 ./8708b.tap
8708 3D3C8936 30-31 ./8708a.tap

8708b.tap and 8708b2.tap (the back sides of two different copies) are correct dumps: They have the same CRC32, they have all correct checksums, and the number of recognized items is even (header, data, header, data, …).
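The decision rule applied to this output can be sketched in Python. The sketch groups dumps of the same issue by magic CRC32, skips dumps with no recognized data or with failed checksums, and keeps only groups of two or more agreeing dumps. The function name is hypothetical. It deliberately leaves out the even-item-count rule, since tapclean's count is not always reliable.

```python
from collections import defaultdict


def verified_groups(report_lines):
    """Map (issue, CRC32) to the sorted dumps that agree on it, if there are at least two."""
    groups = defaultdict(list)
    for line in report_lines:
        fields = line.split()
        if len(fields) != 4:
            continue                   # skip blank or malformed lines
        issue, crc32, counts, path = fields
        good, total = (int(n) for n in counts.split("-"))
        if total == 0 or good != total:
            continue                   # nothing recognized, or a checksum failed
        groups[(issue, crc32)].append(path)
    return {key: sorted(paths) for key, paths in groups.items() if len(paths) >= 2}
```

Fed the four 8708 lines above, it reports only the `16979EE4` group, with `./8708b.tap` and `./8708b2.tap`.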

More Copies

Using this strategy, I was able to confirm correct dumps of 18 of the 32 tapes. 14 more to go.

The obvious way is to buy more copies on eBay until I have two that produce the same data. But luckily, there was another source: The C64 TOSEC Collection contains dumps of 20 of the tapes. Their time stamp says 1996, when the tapes were 10 years old instead of 30, so they should have been in much better shape. By looking at the checksums and the contents, we can get an idea of whether they are likely correct dumps. This is what the script from before says:

8508 8F9A4357 34-34 ./tosec/8508.tap
8511 F75CC32E 34-34 ./tosec/8511.tap
8605 200E286D 30-30 ./tosec/8605.tap
8510 D0534C58 34-34 ./tosec/8510.tap
8509 74B093F1 32-32 ./tosec/8509.tap
8606 DA4AF054 29-29 ./tosec/8606.tap
8502 8BDAF783 40-40 ./tosec/8502.tap
8512 764F2F59 31-31 ./tosec/8512.tap
8705 A103816A 30-30 ./tosec/8705.tap
8704 E84CC753 30-30 ./tosec/8704.tap
8602 B4F9D173 32-32 ./tosec/8602.tap
8612 547A7371 32-32 ./tosec/8612.tap
8506 78BC0D59 35-35 ./tosec/8506.tap
8603 E70B3E72 30-30 ./tosec/8603.tap
8507 22A807D1 32-32 ./tosec/8507.tap
8608 F9BD6BC9 34-34 ./tosec/8608.tap
8505 9C7FC3F2 35-35 ./tosec/8505.tap
8601 3F749A69 30-30 ./tosec/8601.tap
8611 48C4C35F 28-28 ./tosec/8611.tap
8703 C6757D2F 29-30 ./tosec/8703.tap

At least 8703 has an incorrect checksum. Several tapes have an odd number of recognized items, but tapclean sometimes doesn’t seem to count correctly, so extracting all items (“-doprg”) and counting them is the reliable way to establish the right number.

If we add these 20 copies to our collection and run the script again, we will be able to verify several more correct dumps.

8502 B76D97FE 35-39 ./1/8502a.tap
8502 8BDAF783 40-40 ./1/8502b.tap
8502 00000000 0-0   ./3/8502a.tap
8502 11D3CFC5 10-20 ./3/8502b.tap
8502 A7A10C6E 0-3   ./4/8502a.tap
8502 FC899939 0-0   ./4/8502b.tap
8502 8BDAF783 40-40 ./tosec/8502.tap

In this example, 1/8502b.tap and tosec/8502.tap contain the same data, so this verifies that these dumps are correct. Overall, this allowed me to verify the correctness of dumps of 8502, 8505, 8506, 8508, 8611, 8612 and 8705. Seven down, seven more to go!

Splice Verify

There are many issues where one dump looks okay, but we do not have an identical one to verify it. But it is not strictly necessary to have another identical dump. If we are certain that no files are missing, and that for every file, there is another dump with the identical file, we can be very certain that the dump is correct. I call this “splice verify”.

Let’s look at 8706:

8706 E6E7FCDA 28-28 ./1/8706a.tap
8706 8F259FA2 9-18  ./1/8706b.tap
8706 3BDC28F3 23-28 ./2/8706a.tap
8706 1B471853 8-17  ./2/8706b.tap

1/8706a.tap is a candidate for a correct dump. We can test whether there are identical copies of each item on the tape on other copies by hashing all items on the candidate and looking for them on the other copies. The following script will extract all tapes into subdirectories of “dir”:

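# Extract every TAP file into dir/<name>.tap-<subdir>.dir, one tapclean run per tape
# in parallel (assumes each TAP file sits in a subdirectory such as 1/ or tosec/)
rm -rf dir
mkdir dir
for i in $(find . -name '*.tap' | cut -c 3-); do
  (
    d=$(dirname "$i")
    b=$(basename "$i")
    cd "$d" || exit 1
    rm -rf "$b-$d.dir"
    mkdir "$b-$d.dir"
    cd "$b-$d.dir" || exit 1
    tapclean -t "../$b" -doprg
    mv prg/* .
    rmdir prg
    cd ..
    mv "$b-$d.dir" ../dir/
  ) &
done
wait   # block until all background extractions have finished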

Then the following script will search for extra copies of every item on the candidate tape:

cd dir
for md5 in $(md5 -q 8706a.tap-1.dir/???\ *); do
  echo
  echo $md5
  md5 -r */* | grep $md5
done

And this script makes sure that every correctly read file on the other tapes also exists on the candidate:

for md5 in $(md5 -r */???\ * | grep -v 8706a.tap-1.dir | grep -v BAD.prg$ | cut -d " " -f 1 | sort | uniq); do
  echo
  echo $md5
  md5 -r 8706a.tap-1.dir/* | grep $md5
done
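The two shell loops above check the two halves of the splice-verify condition separately. Both halves can be expressed as set differences over file hashes. Below is a Python sketch operating on the extracted directories. Like the scripts, it only considers files named `NNN …` and skips files ending in `BAD.prg`. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def item_hashes(directory):
    """MD5 digests of the extracted items ("NNN (...).prg"), skipping files ending in BAD.prg."""
    digests = set()
    for path in Path(directory).iterdir():
        name = path.name
        if path.is_file() and name[:3].isdigit() and name[3:4] == " " and not name.endswith("BAD.prg"):
            digests.add(hashlib.md5(path.read_bytes()).hexdigest())
    return digests


def splice_verify(candidate_dir, other_dirs):
    """Return (unconfirmed, missing); two empty sets mean the candidate passes."""
    candidate = item_hashes(candidate_dir)
    others = set().union(*(item_hashes(d) for d in other_dirs))
    unconfirmed = candidate - others  # items without an identical copy on another dump
    missing = others - candidate      # good items on other dumps that the candidate lacks
    return unconfirmed, missing
```

For 8706, the call would be along the lines of `splice_verify("dir/8706a.tap-1.dir", [...])`, with the other 8706 directories as the second argument.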

Using this method, it is possible to verify 8503, 8507, 8510, 8701 and 8706. Only two more tapes need to be verified!

Splicing

We could keep buying more copies of 8504 and 8707 on eBay, but they don’t actually appear all that often. So let’s look at how bad our dumps are. Let’s take 8504:

12C4A3EA 30-30 ./8504a-1.tap
7C6D8541 34-35 ./8504b-1.tap
1337D0C8 20-25 ./8504a-2.tap
868DA7B6 38-38 ./8504b-2.tap
B80FD685 29-30 ./8504a-3.tap
8948C0B7 33-34 ./8504b-3.tap
C909C30F 26-27 ./8504a-4.tap
E3F7B698 8-24  ./8504b-4.tap

None of the dumps seems correct. 8504b-2.tap is the best, but the file count as printed by tapclean is wrong, and there is actually an item missing: The payload of “h_i_r_e_s_s_p_e”.

012 (033C-03FB) [INPUT_64].prg
013 (033C-03FB) [INPUT_64].prg
019 (0300-0303).prg
020 (0300-0303).prg
025 (033C-03FB) [GUTEN_TAG].prg
026 (033C-03FB) [GUTEN_TAG].prg
031 (0801-0C96).prg
032 (0801-0C96).prg
039 (033C-0354) [TEXTE].prg
046 (3000-3B76).prg
053 (033C-0354) [RAHMEN_______COM].prg
059 (C440-CF78).prg
066 (033C-0354) [TITELBILD].prg
073 (0A00-0AFF).prg
080 (033C-0354) [h_i_r_e_s_s_p_e].prg
093 (033C-0354) [k_a_l_e_n_d_e_r].prg
100 (0801-38A2).prg
106 (033C-0354) [r_e_v_e_r_s_i].prg
112 (0801-267D).prg
119 (033C-0354) [s_h_o_r_t_____s].prg
125 (0801-2075).prg
131 (033C-0354) [k_o_n_t_a_k_t_e].prg
137 (0801-57D4).prg
144 (033C-0354) [bits___bytes_im].prg
152 (0801-6A70).prg
159 (033C-0354) [einkommensteuert].prg
166 (0801-28CB).prg
172 (033C-0354) [h_i_l_f_s_p_r_o].prg
179 (0801-1BEC).prg
185 (033C-0354) [n_e_w_s].prg
191 (0801-2873).prg
198 (033C-0354) [6_4_E_R_____t_i].prg
205 (0801-4D89).prg
211 (033C-0354) [a_r_t_e_m___s].prg
217 (0801-2E30).prg
223 (033C-0354) [s_u_p_e_r_t_a_p].prg
230 (0801-1A4A).prg
236 (033C-0354) [l_a_s_t__n_o_t].prg
243 (0801-2106).prg

If it weren’t for the missing data, we could splice-verify this dump. Using the scripts from earlier, we can show that all files can be found on other copies, and all correct files from other copies are on this dump – except for the “h_i_r_e_s_s_p_e” payload.

Then why not splice in the correct object from a correct dump? 8504b-1.tap is one of the dumps that contains a correct copy. tapclean’s tcreport.txt file for the tape with the correct data shows us some information about the “h_i_r_e_s_s_p_e” payload:

---------------------------------
Seq. no.: 113
File Type: SUPERTAPE DATA
Location: $3DFA2 -> $3E13B -> $4E25E -> $4E269
LA: $0801  EA: $3000  SZ: 10239
Pilot/Trailer Size: 62/0
Checkbyte Actual/Expected: $7DA9/$7DA9, PASS
Read Errors: 0
Unoptimized Pulses: 12338
CRC32: CBF1C874

It is located between offsets $3DFA2 and $4E269 inside the file. Here’s the part from the hexdump:

$ hexdump -C 8504a-1.tap
0003df70  21 43 21 21 21 21 21 21  1e 21 21 21 43 fb 00 6c
0003df80  09 00 00 0b 19 00 00 ff  47 00 00 e5 2b 00 00 d9
0003df90  12 00 00 fe 3d 04 32 32  32 21 1e 21 37 32 2f 21
0003dfa0  21 21 21 43 1e 32 21 21  1e 32 32 2f 21 21 21 21
0003dfb0  40 21 32 1e 21 21 32 32  32 21 1e 21 21 43 21 32

Every byte in a TAP file (after the 20-byte header) represents the length of a pulse in units of 8 CPU cycles, i.e. about 8 microseconds. In version 1 of the format, zero is an escape code followed by a three-byte little-endian value that gives the length of a longer pulse directly in CPU cycles. So the zeroes here, just before what tapclean identified as the beginning of the payload, are areas of silence on the tape. Grouped correctly, these are the 6 very long pulses, which represent a pause of about a third of a second:

00 6c 09 00
00 0b 19 00
00 ff 47 00
00 e5 2b 00
00 d9 12 00
00 fe 3d 04

Just after this is where we want to cut.
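Decoding the pulse bytes is straightforward. This sketch handles version 1 TAP data, i.e. the bytes after the 20-byte header. The function name is hypothetical. Version 0 files, where a zero byte has no length attached, are not handled.

```python
def decode_pulses(data):
    """Return pulse lengths in CPU cycles for version 1 TAP pulse data (header stripped)."""
    pulses = []
    i = 0
    while i < len(data):
        if data[i] != 0:
            pulses.append(data[i] * 8)  # regular pulse, stored in units of 8 cycles
            i += 1
        else:
            if i + 4 > len(data):
                raise ValueError(f"truncated long pulse at offset {i}")
            # escape: the next three bytes are the exact length in cycles, little endian
            pulses.append(int.from_bytes(data[i + 1:i + 4], "little"))
            i += 4
    return pulses


pause = bytes.fromhex("00 6c 09 00 00 0b 19 00 00 ff 47 00 00 e5 2b 00 00 d9 12 00 00 fe 3d 04")
print(decode_pulses(pause))
```

For the six escape sequences above, it returns pulses of 2412, 6411, 18431, 11237, 4825 and 278014 cycles. That is 321330 cycles in total, or about 0.33 seconds at the PAL clock of 985248 Hz that tapclean reports.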

The cutting position of the end can be found similarly, and by looking at the tcreport.txt of the tape with the broken file, we can find out what part to cut and replace with the good version.

At the end, we need to make sure the number of data bytes in the TAP file’s header at offset 16 (32 bit little endian) is correct.
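The splice itself, including the header fix, can be sketched in a few lines of Python. The sketch makes several assumptions. Both files are C64 TAP files (signature `C64-TAPE-RAW`). The ranges are half-open file offsets that you have picked by hand, as described above. The function name `splice_tap` is hypothetical.

```python
import struct

TAP_HEADER_SIZE = 20  # "C64-TAPE-RAW", version, 3 reserved bytes, 32-bit data size


def splice_tap(bad, bad_range, good, good_range):
    """Replace bad[start:end] with good[start:end] (file offsets) and fix the size field."""
    for tap in (bad, good):
        if tap[:12] != b"C64-TAPE-RAW":
            raise ValueError("not a C64 TAP file")
    (bad_start, bad_end), (good_start, good_end) = bad_range, good_range
    if min(bad_start, good_start) < TAP_HEADER_SIZE:
        raise ValueError("ranges must not touch the header")
    out = bytearray(bad[:bad_start] + good[good_start:good_end] + bad[bad_end:])
    # Offset 16: number of pulse-data bytes after the header, 32-bit little endian
    struct.pack_into("<I", out, 16, len(out) - TAP_HEADER_SIZE)
    return bytes(out)
```

Rewriting the size field from the actual output length, rather than adjusting the old value, keeps it correct even when the replacement is longer or shorter than the part it replaces.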

By doing this, we can create correct versions of the two remaining issues, so now we have a verified correct dump of each of the 32 issues!

The Moral of the Story

Read your tapes early. If you have any tapes, read them now! It doesn’t matter if they have been dumped before: Those dumps might be incorrect, because they might not have been verified against a second copy.