I happened to come across 50 original German GEOS 2.0 disks that were broken and had been sent in for replacement. In the first part, I covered the disks that were probably broken due to user error. Now let’s look at the read errors on the remaining disks. As it turns out, there might be a bug in GEOS that caused the boot disks to break!
On a 1541 disk, every track consists of sectors, each encoded as a SYNC mark, a header, a gap, another SYNC mark, the sector data, and another gap.
The SYNC marks are used to find the start of a header and the start of data. The header contains the number of the sector. Before and after the sector data, there are gaps to account for uncertainty in timing when overwriting a sector.
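A rough byte budget for one zone 3 track (tracks 1 to 17) shows roughly where the proportions quoted below come from. This is a sketch under stated assumptions, not a measurement: 300 rpm, 26 µs per raw byte, the standard DOS block layout (an 8-byte header and a 260-byte data block, both GCR-encoded at 5 disk bytes per 4 raw bytes), and 24-bit SYNC marks as on GEOS boot disks. Whatever is left over counts as gaps.

```python
# Rough byte budget of one speed zone 3 track (tracks 1-17) on a 1541 disk.
# Assumptions: 300 rpm, 26 us per raw byte, standard DOS block layout,
# 24-bit SYNC marks as on GEOS boot disks. Whatever is left over is gaps.

ROTATION_US = 60_000_000 / 300  # one revolution at 300 rpm = 200,000 us
US_PER_BYTE = 26                # speed zone 3
SECTORS = 21                    # sectors per track in zone 3


def gcr_len(raw_bytes: int) -> int:
    """GCR encoding stores every 4 raw bytes as 5 bytes on disk."""
    return raw_bytes * 5 // 4


HEADER = gcr_len(8)   # $08, checksum, sector, track, ID2, ID1, $0F, $0F
DATA = gcr_len(260)   # $07, 256 data bytes, checksum, 2 padding bytes
SYNC = 24 // 8        # one 24-bit SYNC mark, in bytes

track_bytes = ROTATION_US / US_PER_BYTE  # about 7692 raw bytes per revolution
shares = {
    "data": SECTORS * DATA / track_bytes,
    "header": SECTORS * HEADER / track_bytes,
    "sync": SECTORS * 2 * SYNC / track_bytes,  # one SYNC before header and data
}

for name, share in shares.items():
    print(f"{name:>6}: {share:5.1%}")
```

This lands at roughly 89% data, 2.7% header and 1.6% SYNC, close to the rounded figures in the text; the exact split depends on what is counted as "data".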
About 90% of a track is actual sector data, 3% is header data, and 1.5% is SYNC marks. So if a disk degrades through wear and tear, physical damage or demagnetization, we expect mostly data checksum errors, and maybe a few header checksum errors and missing SYNC marks. But this is the actual statistic:
| Missing Header SYNC | Missing Data SYNC | Header Checksum Error | Data Checksum Error |
|---|---|---|---|
The ratio between missing data SYNCs and checksum errors is about what we would expect, but what about the huge number of missing header SYNCs?
In addition, three of the disks that don’t boot fail because of this very issue: DeskTop fails when trying to read track 1, sector 9, because its header SYNC is missing.
A missing header can have different causes:
- The SYNCs that mark the header might have degraded. Commodore DOS writes 40-bit SYNC marks, but on GEOS boot disks, they are only 24 bits long. But this would affect header SYNCs and data SYNCs equally, and we are not seeing an unusual number of missing data SYNCs.
- The byte after the SYNC, which distinguishes a header ($08) from sector data ($07), might have degraded. This is very specific and quite unlikely.
- The SYNC and possibly more of the header might have been overwritten.
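To make the header/data distinction concrete, here is a minimal sketch of how a decoded block following a SYNC mark could be classified. It assumes standard 1541 DOS blocks: a header starts with $08 and carries the XOR of sector, track and both ID bytes as its checksum; a data block starts with $07 and is followed by the XOR of its 256 bytes. The function name `classify_block` is hypothetical, and this is not the code of any particular analysis tool.

```python
def classify_block(block: bytes) -> str:
    """Classify a GCR-decoded block that follows a SYNC mark (1541 DOS layout)."""
    if not block:
        return "empty"
    block_id = block[0]
    if block_id == 0x08:  # header: $08, checksum, sector, track, ID2, ID1, $0F, $0F
        if len(block) < 6:
            return "truncated header"
        checksum, sector, track, id2, id1 = block[1:6]
        if checksum != sector ^ track ^ id2 ^ id1:
            return "header checksum error"
        return f"header: track {track}, sector {sector}"
    if block_id == 0x07:  # data: $07, 256 data bytes, checksum
        if len(block) < 258:
            return "truncated data block"
        checksum = 0
        for byte in block[1:257]:
            checksum ^= byte
        if checksum != block[257]:
            return "data checksum error"
        return "data block"
    # Any other byte after a SYNC is neither a header nor a data block.
    return f"unknown block id ${block_id:02X}"


# Header of track 1, sector 9 with disk ID bytes $30/$41 (made-up ID).
header = bytes([0x08, 9 ^ 1 ^ 0x30 ^ 0x41, 9, 1, 0x30, 0x41, 0x0F, 0x0F])
print(classify_block(header))  # header: track 1, sector 9
```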
Checking several dozen cases showed exactly the same picture:
- The preceding sector was written by the user. (A side effect of the copy protection allows detecting which sectors are unchanged from the original disk contents and which sectors got overwritten.)
- The preceding sector data is intact.
- The preceding sector was written 7-8% more slowly – too much for the gap to cover. Its last bytes overwrote the next sector header, but not the next sector data.
Track 1 Sector 9
The reason why track 1 sector 9 is broken on many disks is simple: The preceding sector on disk is track 1 sector 8, which happens to be the first free block of the GEOS boot disk according to the GEOS filesystem logic. Whenever GEOS searches for a free block for writing on the boot disk, it picks track 1 sector 8. And this happens a lot: Whenever the user starts a desk accessory, like the alarm clock or preferences, the part of memory required by the desk accessory gets written to disk into a “Swap File” and later loaded back in. So every start of a desk accessory while the boot disk is in the drive will write track 1 sector 8. And sometimes, writing a sector will break the next one.
Track 1 sector 9 is the info sector (the sector that contains the icon and some other metadata) of the joystick driver, which is on the first page of the disk and will be shown by DeskTop after booting, unless the user switched to, e.g., a mouse for input, in which case a different input driver would be shown on the first page.
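The effect of always picking the same first free block can be illustrated with a deliberately simplified model. This is a hypothetical first-fit allocator over a made-up free list, not GEOS's actual allocation code, and it assumes the swap file is deleted again when the desk accessory exits. It only shows that under these assumptions the first free block is rewritten on every desk accessory start, and that the header of the physically following sector (sectors are laid out in order around the track) is exposed to every one of those writes.

```python
from collections import Counter

SECTORS_ON_TRACK = 21  # tracks 1-17 (speed zone 3)


def first_free(free_blocks: set) -> tuple:
    """Hypothetical first-fit allocation: lowest free (track, sector)."""
    return min(free_blocks)


def run_desk_accessories(free_blocks, starts: int, swap_blocks: int = 1) -> Counter:
    """Simulate desk accessory launches that each write, then delete, a swap file."""
    writes = Counter()
    free = set(free_blocks)
    for _ in range(starts):
        allocated = []
        for _ in range(swap_blocks):
            block = first_free(free)
            free.remove(block)
            allocated.append(block)
            writes[block] += 1
        free.update(allocated)  # assumed: the swap file is deleted afterwards
    return writes


def next_physical(block: tuple) -> tuple:
    """Sector that physically follows `block` on a zone 3 track."""
    track, sector = block
    return track, (sector + 1) % SECTORS_ON_TRACK


# Made-up free list in which track 1, sector 8 is the lowest free block.
free_list = [(1, 8), (1, 10), (1, 16), (2, 3)]
writes = run_desk_accessories(free_list, starts=100, swap_blocks=2)
print(writes[(1, 8)], "writes to track 1, sector 8")  # 100 writes to track 1, sector 8
print("header exposed to every write:", next_physical((1, 8)))  # (1, 9)
```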
So we have established that sectors are sometimes written too slowly, so that they run over the next sector header and make the next sector unreadable. A problem that is not too uncommon with 1541-style disk drives is that the motor speed might be off: because of the sector gaps, the on-disk format can tolerate a motor that is about 2.5% too slow, but we’re seeing 7-8% here. But let’s gather more data before jumping to conclusions.
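The tolerance can be estimated on the back of an envelope: if a rewritten sector ends up a fraction p longer on disk than its slot, it runs p × L bytes past its end, where L is the length of the write. Assuming the 330-byte write and the 8-byte gap used in the speed zone discussion later in this article, the limit is 8/330, or about 2.4%, in line with the "about 2.5%" above. The function names are made up for this sketch.

```python
WRITTEN_BYTES = 330  # bytes written for one sector (figure used later in the text)
GAP_BYTES = 8        # gap between a sector's end and the next header (likewise)


def overrun(stretch: float, written: int = WRITTEN_BYTES) -> float:
    """Bytes by which a write that is `stretch` too long (0.02 = 2%) overruns its slot."""
    return written * stretch


def max_stretch(gap: int = GAP_BYTES, written: int = WRITTEN_BYTES) -> float:
    """Largest stretch that the gap can still absorb."""
    return gap / written


print(f"maximum tolerable stretch: {max_stretch():.1%}")  # 2.4%
for stretch in (0.02, 0.05, 0.077):
    status = "fits in the gap" if overrun(stretch) <= GAP_BYTES else "reaches the next header"
    print(f"{stretch:5.1%} too long: overrun {overrun(stretch):4.1f} bytes, {status}")
```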
This graph shows that the overwritten headers only ever happen on tracks 1 to 17:
Tracks 1 to 17 have one thing in common: they are all in speed zone 3. A 1541 disk has four speed zones to account for the different lengths of the tracks:
| Track | # Sectors | Speed Zone | µs/Byte | Raw Kbit/Track |
|---|---|---|---|---|
| 1–17 | 21 | 3 | 26 | 60.1 |
| 18–24 | 19 | 2 | 28 | 55.8 |
| 25–30 | 18 | 1 | 30 | 52.1 |
| 31–35 | 17 | 0 | 32 | 48.8 |
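The µs/Byte column determines how many raw bytes pass under the head in one revolution. A quick check, assuming the nominal 300 rpm (200 ms per revolution), shows how the zones keep the space available per sector roughly constant:

```python
ROTATION_US = 200_000  # one revolution at the nominal 300 rpm

# (tracks, sectors per track, speed zone, microseconds per byte), from the table
ZONES = [
    ("1-17", 21, 3, 26),
    ("18-24", 19, 2, 28),
    ("25-30", 18, 1, 30),
    ("31-35", 17, 0, 32),
]


def raw_bytes_per_track(us_per_byte: int) -> float:
    """Raw bytes that pass under the head during one revolution."""
    return ROTATION_US / us_per_byte


for tracks, sectors, zone, us_per_byte in ZONES:
    raw = raw_bytes_per_track(us_per_byte)
    print(f"zone {zone} (tracks {tracks}): {raw:6.0f} raw bytes per track, "
          f"{raw / sectors:5.1f} per sector")
```

In every zone, each sector gets roughly 366 to 376 raw bytes of track.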
In speed zone 3, one byte is written every 26 microseconds. If a sector on a zone 3 track is written with the speed zone incorrectly set to 2, one byte is written every 28 microseconds instead, so each byte takes 7.7% longer. The 330 bytes written with the zone 2 setting then cover the length of 355 zone 3 bytes, so the write runs about 25 bytes past its intended end. This overshoots the 8-byte gap and completely destroys the next header.
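Carrying the same arithmetic one step further shows which parts of the following sector such a write reaches. This is a sketch, and the layout of the next sector's beginning is an assumption rather than a measurement: a 24-bit header SYNC as on GEOS boot disks, the 10-byte GCR-encoded header, and a 9-byte gap after the header (the standard DOS value), all in zone 3 byte lengths. The helper names are made up.

```python
ZONE3_US_PER_BYTE = 26
ZONE2_US_PER_BYTE = 28
WRITTEN_BYTES = 330  # bytes written per sector, from the text
TAIL_GAP = 8         # gap after a sector, from the text

# Assumed layout of the start of the following sector, in zone 3 byte lengths.
NEXT_SECTOR = [
    ("header SYNC", 3),   # 24-bit SYNC as on GEOS boot disks
    ("header", 10),       # 8 raw header bytes, GCR-encoded
    ("header gap", 9),    # assumed standard DOS header gap
    ("data SYNC", 3),
]


def overshoot(written: int, us_written: int, us_expected: int) -> float:
    """How far (in expected-zone bytes) a write at the wrong speed runs past its slot."""
    return written * us_written / us_expected - written


def damaged(excess: float, gap: int, layout) -> list:
    """Names of the regions after the gap that the overshoot reaches."""
    remaining = excess - gap
    hit = []
    for name, length in layout:
        if remaining <= 0:
            break
        hit.append(name)
        remaining -= length
    return hit


excess = overshoot(WRITTEN_BYTES, ZONE2_US_PER_BYTE, ZONE3_US_PER_BYTE)
print(f"overshoot: {excess:.1f} bytes")        # overshoot: 25.4 bytes
print(damaged(excess, TAIL_GAP, NEXT_SECTOR))  # ['header SYNC', 'header', 'header gap']
```

Under these assumptions the write wipes out the header SYNC and the header and ends inside the header gap, short of the data SYNC. That is consistent with the damaged disks: the next header is gone, but the next sector's data is not.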
Two facts are very strong indications that sectors in speed zone 3 sometimes get incorrectly written with the speed zone 2 setting: the 28/26 ratio matches the measured data perfectly, and the errors only ever occur in speed zone 3.
First, it is important to find out whether this has anything to do with GEOS at all. Maybe it happens on all 1541 disks, but since no other task on a C64 is as disk-bound as running GEOS, the error might rarely show up anywhere else. This is a distribution of missing sector header errors across about 300 random disks from my collection:
The errors are spread evenly, except for the lowest tracks, which are written with a lower density, and track 18, which holds the file directory and therefore usually consists mostly of empty sectors.
So it is not an inherent property of the 1541 hardware or its firmware. It is GEOS-specific.
In order to find out whether it is specific to GEOS boot disks, I would need a large number of GEOS work disks. After all, the boot disks in my collection have been hand-picked to have many errors. I don’t have access to such a collection, but I don’t see why boot disks would be special in this regard, so I am assuming this happens to all disks used with GEOS.
The GEOS 1541 Driver
This makes the GEOS 1541 driver the prime suspect. The driver takes over the drive from its firmware and uses its own code for reading and writing sectors and for bus communication. It does reuse some ROM code, though, for tasks like GCR encoding and decoding.
This is the reverse-engineered GEOS driver code:
The speed setting is stored in bits 5 and 6 of VIA #2 port B (register `$1C00`). The regular speed setup code is this:
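```
Drv_NewDisk_6:
        jsr $f24b           ; in: A = track; out: A = sectors, X = speed zone
        sta $43             ; store number of sectors
Drv_NewDisk_7:
        lda $1c00
        and #$9f            ; clear speed bits 5 and 6
        ora DTrackTab,x     ; set speed bits for zone X
Drv_NewDisk_8:
        sta $1c00
        rts

DTrackTab:
        .byte $00, $20, $40, $60   ; speed bits for zones 0-3
```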
The ROM code at `$F24B` looks up the number of sectors for the track number in `A` and, as a side effect, returns the speed zone (0-3) in `X`. The code there is identical on all versions of the 1541, the 1571 and all common clones.
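The regular path can be modelled in a few lines to check that it always produces the zone 3 bit pattern ($60) for tracks 1 to 17, regardless of the other bits in port B. This is a hedged sketch: `speed_zone` uses the zone boundaries from the table above as a stand-in for the ROM routine at `$F24B`, and both function names are hypothetical.

```python
D_TRACK_TAB = [0x00, 0x20, 0x40, 0x60]  # DTrackTab: speed bits for zones 0-3
SPEED_MASK = 0x60                        # bits 5 and 6 of VIA #2 port B ($1C00)
OTHER_BITS = 0xFF & ~SPEED_MASK          # 0x9F, the mask used by "and #$9f"


def speed_zone(track: int) -> int:
    """Speed zone per track, taken from the table (stand-in for the ROM at $F24B)."""
    if track <= 17:
        return 3
    if track <= 24:
        return 2
    if track <= 30:
        return 1
    return 0


def set_speed(port_b: int, track: int) -> int:
    """Mirror of Drv_NewDisk_7/8: clear bits 5-6, then OR in the zone's pattern."""
    return (port_b & OTHER_BITS) | D_TRACK_TAB[speed_zone(track)]


# Every possible port B value, every zone 3 track: the speed bits are always $60.
assert all(
    (set_speed(port_b, track) & SPEED_MASK) == 0x60
    for port_b in range(256)
    for track in range(1, 18)
)
# The remaining bits (stepper phases, motor, LED, ...) pass through unchanged.
assert all(
    (set_speed(port_b, track) & OTHER_BITS) == (port_b & OTHER_BITS)
    for port_b in range(256)
    for track in range(1, 36)
)
print("speed setup path OK")
```

Both checks pass, so in this model the table lookup itself cannot produce the zone 2 pattern for a zone 3 track, as long as `X` really holds the zone returned by `$F24B`.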
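There is one other place that writes to `$1C00`: the code to move the head (direction in `X`, number of steps in `A`):

```
D_DUNK6:
        stx $4a             ; save step direction
        asl                 ; number of steps * 2
        tay                 ; Y = loop counter
        lda $1c00
        and #$fe            ; clear bit 0
        sta $70             ; keep a copy of port B in $70
        lda #$1e
        sta $71
D_DUNK6_1:
        lda $70
        add $4a             ; add direction ($70 + $4a)
        eor $70             ; merge: bits 0-1 from the sum,
        and #%00000011      ;   bits 2-7 from $70
        eor $70
        sta $70
        sta $1c00           ; update stepper phase bits
        ; (rest of the stepping loop not shown)
```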
Bits 0 and 1 of `$1C00` control movement of the head. The remaining bits are saved in `$70`. I don’t see how this code would be able to change bits 5 and 6.
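That the stepping code leaves bits 5 and 6 alone can be checked exhaustively with a small model of the EOR/AND/EOR sequence in `D_DUNK6_1`. The model assumes that `add` clears the carry before adding (a CLC/ADC pair) and that the direction in `$4a` is either $01 or $FF; it only covers the part of the loop listed above, and the function names are made up.

```python
def merge_step(port_b: int, direction: int) -> int:
    """Model of one pass through D_DUNK6_1, returning the value stored to $1C00."""
    a = (port_b + direction) & 0xFF  # lda $70 / add $4a (carry assumed clear)
    a ^= port_b                      # eor $70
    a &= 0b00000011                  # and #%00000011
    a ^= port_b                      # eor $70
    return a                         # sta $70 / sta $1c00


def keeps_speed_bits(port_b: int, direction: int, steps: int = 8) -> bool:
    """Repeat the step and check that bits 5 and 6 never change."""
    value = port_b
    for _ in range(steps):
        value = merge_step(value, direction)
        if (value & 0x60) != (port_b & 0x60):
            return False
    return True


for direction in (0x01, 0xFF):  # the two step directions (assumed encoding)
    assert all(keeps_speed_bits(p, direction) for p in range(256))
print("bits 5 and 6 are never modified")
```

The EOR/AND/EOR idiom computes `(sum & 3) | (port_b & 0xFC)`, so only the two stepper bits can ever change.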
This is what we know:
- Most GEOS boot disks fail because a sector of speed zone 3 (tracks 1 to 17) has been written with an incorrect speed setting of 2, overwriting the following sector header and thus making that sector inaccessible.
- This pattern cannot be found on disks used outside of GEOS. It has not been shown whether it also occurs on users’ data disks used with GEOS, but I don’t see why boot and data disks would differ in this regard.
- No bug in the GEOS 1541 disk driver could be found.
This is an unresolved mystery. I would be very grateful for:
- collections of data/work disks (`.G64` format or physical disks) to confirm this is not specific to boot disks
- a way to reproduce this
- an explanation of why this is happening!