Reading the Raw Bits of a C64/1541 Disk without a Parallel Cable

An unmodified Commodore 1541 disk drive cannot transfer the raw bits of a whole track to the computer it is attached to: The Commodore Serial Bus is too slow to transmit the data in real time as it arrives from the read head, and the drive only has 2 KB of RAM, which is not enough to buffer the 8 KB of a whole track.

But reading the raw bits as opposed to having the 1541 firmware interpret and decode the on-disk data structures is necessary in some cases: Disks of copy-protected games do not use the standard encoding and are therefore either not readable by the original firmware, or they contain extra metadata that will be checked by the game. When archiving data disks, it is also advisable to read the raw data: If a sector header is missing, for example, reading the disk with the original firmware will just skip the sector, but the sector data would be part of a raw dump, so it may be possible to reconstruct it later.

In order to read the raw data of a full track, it is necessary to use a solution like the Kryoflux, a dedicated disk reading device; a Commodore 1571, which can transfer the data in real time thanks to its higher clock speed and its extra “Fast Serial” hardware; or, if you want to use a Commodore 1541, a drive modified with either a parallel connection to the computer or extra RAM.

This article shows a proof-of-concept implementation of a pure software way of having a Commodore 1541 disk drive transfer a raw track to the computer – with just a few caveats.

The Idea

The basic idea is to have the 1541 repeatedly read as much raw data as fits into RAM and transfer it to the computer, and have the computer splice the slices together using overlapping areas. But the details are tricky.

The 1541 has 2 KB of RAM, 8 pages, from $0000 to $07FF. If we set aside one page for the code to read the data from disk, we can read in up to 7 pages, or 1.75 KB, at a time. The raw data of one track is just below 8 KB, so cutting up the track into 5 slices of 1.6 KB each gives us an overlap of about 150 bytes between consecutive slices. But there are two potential problems with this approach.

Timing

First, if we want to read a particular slice of the track, the only way to do this is by timing. The 1541 does not have an index hole sensor, and there is nothing on disk we can rely on. The disk rotates 5 times per second, with a variation of usually below 0.2%. If it takes one revolution to read a slice and transfer it, we need to be able to measure where on the track we are across 5 revolutions, so we may be off by up to 1%, or 80 bytes. This reduces the overlap between slices to about half.

If we divide up the track into 6 slices, this is less of an issue: The overlap is now 430 bytes, or 350 bytes if we account for the speed variation. And by measuring the motor speed, we might be doing even better than that.
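The arithmetic above can be sketched in a few lines of Python; the 8 KB track size and the 1% timing error are taken from the text, the names are made up for this illustration:

```python
TRACK_BYTES = 8192                 # a raw track is just below 8 KB
READ_BYTES = 7 * 256               # 7 pages of drive RAM available per read
JITTER = int(TRACK_BYTES * 0.01)   # ~1% positioning error across 5 revolutions

def overlap(slices):
    """Bytes shared between adjacent slices if the track is cut evenly."""
    return READ_BYTES - TRACK_BYTES // slices

for n in (5, 6):
    # 5 slices: 154 bytes of overlap, only 73 after the timing error;
    # 6 slices: 427 bytes of overlap, a comfortable 346 after the error.
    print(n, overlap(n), overlap(n) - JITTER)
```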

Overlap

But the bigger problem is the size of the overlap. There is no guarantee the data on disk has enough variation for overlap detection to be meaningful. In the worst case, a disk could consist entirely of the byte 0x55, with a single 0xFF at one point, and another 0xFF at the opposite side of the track. Most reads would return just strings of 0x55, so any offset would work for the overlap detector.

So we know that it will be impossible to read pathological-case disks with this method, but how about disks in practice? A standard-formatted 1541 disk consists of sector headers, sector data and the tail gap.

The (small) sector header is unique, because it contains the number of the sector. The sector data is 256 bytes of user-defined data, GCR-encoded (4 data bytes become 5 bytes on disk) into 320 encoded bytes (plus some overhead). The data within a sector can repeat itself, and two sectors can even contain the same data, so the overlap has to be at least 320 bytes.
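The 4-to-5 GCR conversion can be sketched like this. The code table is the standard 1541 GCR table; `gcr_encode` is a helper name made up for this illustration:

```python
# Standard 1541 GCR table: each 4-bit nibble maps to a 5-bit code.
GCR = [0x0A, 0x0B, 0x12, 0x13, 0x0E, 0x0F, 0x16, 0x17,
       0x09, 0x19, 0x1A, 0x1B, 0x0D, 0x1D, 0x1E, 0x15]

def gcr_encode(block):
    """Encode 4 data bytes into 5 GCR bytes (hypothetical helper)."""
    bits = 0
    for b in block:
        bits = (bits << 5) | GCR[b >> 4]    # high nibble -> 5 bits
        bits = (bits << 5) | GCR[b & 0x0F]  # low nibble -> 5 bits
    return bytes((bits >> (8 * i)) & 0xFF for i in range(4, -1, -1))

# 256 data bytes become 256 // 4 * 5 == 320 bytes on disk.
print(gcr_encode(b"\x00\x00\x00\x00").hex())  # 5294a5294a
```

Note that the table never produces more than two consecutive 0-bits or eight consecutive 1-bits, which is why a run of ten or more 1-bits can safely serve as a SYNC mark.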

Since the length of a track is not evenly divisible by its number of sectors, there is some wasted space between the last sector and the first sector: the tail gap. On a standard-formatted 1541 disk, this area can be up to a bit over 1 KB in size on some tracks. The tail gap usually contains all identical bytes, so the overlap even for standard disks has to be at least 1 KB.

With an overlap of more than 1 KB, it will be necessary to read at least 12 slices, one every 30 degrees of the disk, taking about 12 revolutions. The motor variation will become more of a problem, so we need more overlap, and even more revolutions. But there should be a sweet spot there at 15-20 slices…

The Solution

But I decided against measuring time and reading specific parts of the track. I’ll just keep reading random slices until I have enough of them with enough overlapping areas.

My proof-of-concept code is located at $0146 in RAM and only reads 5 pages (1.25 KB, $0300-$07FF) at a time, so it can still coexist with the 1541 operating system in ROM and I don’t have to reinvent the wheel in terms of data transfer.

1.25 KB is enough for 1 KB overlaps, and in practice, the code on the computer side can splice together a track after reading about 25 slices.

Reading Raw Data into Memory

Having the 1541 read the raw data from the read head into RAM sounds easy: The drive hardware sets the CPU’s V flag (through the 6502’s SO pin) whenever a new byte is ready, so waiting for a byte is as easy as:

loop:   bvc loop

Then the V flag has to be cleared again, and the data can be read and stored:

lx:     lda $1c01  
        clv  
        sta ($fe),y  
        iny  
        bne loop  
        inc $ff  
        dex  
        bne loop

A byte arrives every ~26-32 clock cycles, while this loop takes at most 21 (255 out of 256 times) or 31 (1 out of 256 times) clocks, which is okay.

But this will not read SYNC marks correctly: A SYNC is 10 or more 1-bits in sequence. It is used to mark headers and sectors, and all other data cannot contain 10 or more consecutive 1-bits. When a SYNC is detected by the hardware, bit #7 of $1C00 reads as 0, and no matter how long the SYNC mark is, there will be no “byte ready” signals until the SYNC has ended and the next data byte has arrived. So the loop above will read SYNC marks as single 0xFF bytes – which is ambiguous, since user data can also contain single 0xFF bytes.
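In a decoded bitstream on the computer side, a SYNC is simply any run of ten or more set bits. A minimal sketch (`find_syncs` is a made-up helper, not part of the drive code):

```python
def find_syncs(bits):
    """Yield (start, length) of every run of 10 or more 1-bits in a bit string."""
    run = start = 0
    for i, b in enumerate(bits + "0"):   # the "0" sentinel terminates a trailing run
        if b == "1":
            if run == 0:
                start = i
            run += 1
        else:
            if run >= 10:
                yield (start, run)
            run = 0

print(list(find_syncs("000" + "1" * 11 + "0")))  # [(3, 11)]
```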

One way to special-case SYNC is to check the hardware’s SYNC signal (bit #7 of $1C00) whenever an 0xFF is encountered, or to measure the time until the next data byte arrives after an 0xFF. My code does the latter:

        ldx #5      ; number of pages  
        ldy #0

l1:     bvc l1      ; wait for byte  
lx:     lda $1c01   ; read byte  
        clv  
        cmp #$ff    ; special-case $ff  
        beq ff  
ly:     sta ($fe),y ; store  
        iny  
        bne l1      ; page loop  
        inc $ff  
        dex  
        bne l1      ; loop  
end:    cli  
        rts

ff: ; $ff detected  
        iny         ; skip one byte  
        bne la  
        inc $ff  
        dex  
        beq end  
la:  
        bvs ss      ; next  
        bvs ss      ; byte  
        bvs ss      ; arrived  
        bvs ss      ; already?  
        bvs ss      ; then  
        bvs ss      ; it's  
        bvs ss      ; a  
        bvs ss      ; regular $ff  
        bvs ss      ; and  
        bvs ss      ; not  
        bvs ss      ; a  
        bvs ss      ; SYNC  
; SYNC detected  
lo:     bvc lo      ; wait for next data byte  
        lda $1c01   ; read it  
        clv  
        iny         ; skip one byte  
        bne ly  
        inc $ff  
        dex  
        bne ly  
        beq end

ss:     lda $1c01   ; read byte  
        clv  
        jmp ly      ; skip the $ff comparison

To speed things up a little, 0xFF bytes are not stored into the output stream; instead, the output stream is just advanced. Therefore, the whole buffer must be filled with 0xFF beforehand.

I did not thoroughly measure the code to guarantee that the slowest cases (crossing a page boundary in the SYNC case) can’t miss a byte – because I can rely on the statistical rareness of two reads having the same alignment. The splicing code will just ignore the incorrect read.

Transferring the Data

For the proof-of-concept, I am just scripting the opencbm command line utilities. Uploading the code is done like this:

cbmctrl upload 8 0x0146 read.bin

It is then executed using the “M-E” DOS command:

cbmctrl -p command -e 8 m-e 0x47 0x01

And the 1.25 KB of raw data are received by just downloading the memory contents:

cbmctrl download 8 0x0300 0x0500 /tmp/tmp.bin

Splicing the Data Together

The computer-side implementation of the proof-of-concept is written in Python. It reads a certain number of slices into memory, and then tries to combine them into a track. If it does not succeed, it will read another slice, try again and so on.

The algorithm starts out with an overlap of the slice length minus one, and compares the tail of each slice with the head of each other slice. If they match, they are combined. If no overlap of this length is found, the overlap size is decremented, and the search is repeated, down to the minimum overlap.

Once the head of a (combined) slice is identical to its own tail, we have a complete track. We will cut it at the first SYNC mark and remove the duplicate data at the end.

For the best results on custom disks, the minimum overlap should be as large as possible. Assuming the disk format is based on the 1541 standard format, a minimum overlap of about 1100 is reasonable. To speed things up, I am using a minimum overlap of 350: more than an encoded sector, but less than the tail gap. Then I am special-casing tail gap detection by excluding matches of all-equal bytes.
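Stripped of all bookkeeping, the core matching step amounts to something like this; `splice` is a simplified illustration, not the exact code of the script below:

```python
def splice(a, b, min_overlap=350):
    """Merge slice b onto slice a if a's tail matches b's head, longest match first."""
    for n in range(min(len(a), len(b)) - 1, min_overlap - 1, -1):
        tail = a[-n:]
        if len(set(tail)) == 1:   # all-equal bytes: probably the tail gap, skip
            continue
        if b[:n] == tail:
            return a + b[n:]
    return None                   # no overlap of at least min_overlap bytes

# Example: b starts with the last 400 bytes of a, so they can be combined.
a = bytes(range(256)) * 3             # 768 bytes with plenty of variation
b = a[-400:] + bytes(range(100))
print(len(splice(a, b)))              # 868 (768 + 100 new bytes)
```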

When the script terminates, it will write the track into the file “result.bin”. If you want to analyze it, you can paste the hex bytes after the bytes keyword in the following .TXT file:

no-tracks 84  
track-size 7928  
track 1  
   speed 3  
   begin-at 0  
   bytes [...]  
end-track

and convert it to a .G64 file using g64conv. Then, convert it back to .TXT with the same tool to see the decoded data.

The Code

import sys  
import os  
import copy

def getslice():  
    os.system("cbmctrl -p command -e 8 m-e 0x47 0x01")  
    os.system("cbmctrl download 8 0x0300 0x0500 /tmp/tmp.bin")  
    return bytearray(open("/tmp/tmp.bin", 'rb').read())

track = 35

chunk_len = 5*256  
min_overlap = 350 # > GCR sector + n  
initial_reads = 20

drive_code = bytearray(chr(track) +  
'\xad\x46\x01\x85\x0e\xa9\x00\x85'  
'\x0f\xa9\xb0\x85\x04\xa5\x04\x30'  
'\xfc\x78\xad\x00\x1c\x09\x0c\x8d'  
'\x00\x1c\xa9\xee\x8d\x0c\x1c\xa2'  
'\x00\xa9\xff\x9d\x00\x03\x9d\x00'  
'\x04\x9d\x00\x05\x9d\x00\x06\x9d'  
'\x00\x07\xe8\xd0\xee\xa9\x00\x85'  
'\xfe\xa9\x03\x85\xff\xb8\x50\xfe'  
'\xb8\xad\x01\x1c\xa2\x05\xa0\x00'  
'\x50\xfe\xad\x01\x1c\xb8\xc9\xff'  
'\xf0\x0c\x91\xfe\xc8\xd0\xf1\xe6'  
'\xff\xca\xd0\xec\x58\x60\xc8\xd0'  
'\x05\xe6\xff\xca\xf0\xf6\x70\x26'  
'\x70\x24\x70\x22\x70\x20\x70\x1e'  
'\x70\x1c\x70\x1a\x70\x18\x70\x16'  
'\x70\x14\x70\x12\x70\x10\x50\xfe'  
'\xad\x01\x1c\xb8\xc8\xd0\xcb\xe6'  
'\xff\xca\xd0\xc6\xf0\xce\xad\x01'  
'\x1c\xb8\x4c\x99\x01')

open("/tmp/read.bin", "wb").write(drive_code)

os.system("cbmctrl reset")  
os.system("cbmctrl command 8 I")  
os.system("cbmctrl upload 8 0x0146 /tmp/read.bin")

slices = []

for i in range(0, initial_reads):  
    print("\b{}/{}".format(i + 1, initial_reads))  
    slices.append(getslice())

while True:  
    print("\bnumber of slices: {}".format(len(slices)))

    max_overlap = chunk_len - 1

    data = copy.deepcopy(slices)

    while max_overlap > min_overlap:  
        found = False
        for overlap in range(max_overlap, min_overlap, -1):  
            for i in range(0, len(data)):  
                if len(data[i]) == 0:  
                    continue  
                test = data[i][-overlap:]  
                if test == test[:1] * len(test):  
                    print("skipping constant area for matching.")  
                    continue  
                found = False  
                for j in range(0, len(data)):  
                    if i != j and data[j].startswith(test):  
                        print("i: {}, j: {}, overlap: {}, leni: {}, lenj: {}, lennew: {}".format(i, j, overlap, len(data[i]), len(data[j]), len(data[i] + data[j][overlap:])))  
                        data[i] = data[i] + data[j][overlap:]  
                        data[j] = bytearray()

                        candidate = data[i]

                        pos = candidate.find(bytearray('\xff\xff'))  
                        candidate = candidate[pos:]

                        if len(candidate) >= 6000:  
                            best_pos = -1  
                            for k in range(min_overlap, len(candidate) / 2):  
                                pos = candidate[1:].find(candidate[:k])  
                                if pos >= 0:  
                                    best_pos = pos  
                                else:  
                                    break  
                            if best_pos >= 0:  
                                result = candidate[:best_pos]  
                                open("result.bin", "wb").write(result)  
                                print("track complete!")  
                                sys.exit()

                        max_overlap = overlap  
                        found = True  
                        break  
        if not found:  
            max_overlap -= 1

    numchunks = 0  
    for slice in slices:  
        if len(slice) > 0:  
            numchunks += 1  
    print("number of chunks: {}".format(numchunks))  
    print("reading another slice...")  
    slices.append(getslice())

Limitations

It will always be possible to construct a 1541 disk that is impossible to read with this method, but most disks – especially all data disks – can be read. Still, here are its limitations:

  • Areas longer than 1279 bytes that don’t contain enough data for proper overlap detection (in practice, these are only custom areas on copy-protected disks) can’t be read correctly. This number could be increased to up to about 1.75 KB with custom 1541 code that optimizes memory usage.
  • And with the “min_overlap = 350” optimization in place: Areas larger than 350 bytes that don’t contain enough data for proper overlap detection (i.e. tail gaps and copy protection) and aren’t just the same byte repeated all over can’t be read correctly.

In addition, there are some missing features that could be added:

  • The code does not detect killer tracks or empty tracks.
  • The current implementation stores all SYNC marks as two 0xFF bytes, because it doesn’t measure the length of SYNC marks.

And of course, reading a whole track is very slow because the original Commodore Serial Bus protocol (~400 bytes/sec) is used to transfer the data.

Next Steps

The solution could be integrated into nibtools, so the standard tool set meant for a 1541 with a parallel cable as well as the 1571 also works on a stock 1541. Changing to a fast serial data transfer routine and optimizing the memory layout should allow reading a complete track in about 20 revolutions, which is about 2 minutes per disk.
