Inside geoWrite – 8: Copy & Paste

In the series about the internals of the geoWrite WYSIWYG text editor for the C64, this article discusses its efficient cross-application cut/copy/paste implementation.

Article Series

  1. The Overlay System
  2. Screen Recovery
  3. Font Management
  4. Zero Page
  5. Copy Protection
  6. Localization
  7. File Format and Pagination
  8. Copy & Paste ← this article
  9. Keyboard Handling

GEOS Scrap Architecture

Just like modern operating systems, GEOS supports cut, copy and paste, within one app as well as between multiple apps. But this is about where the similarities end.

First of all, there is no single clipboard/pasteboard: There is one for every type of data, and they are called “scraps”. There can be a “text scrap” and a “photo scrap” (image data) at the same time, for example. When the user selects “paste” in the “edit” menu, geoWrite asks what type should be pasted:

The GEOS KERNAL has no concept of scraps – they are purely a convention between apps. Text and photo scraps are specified by the GEOS reference manual and apps like geoWrite, geoPaint and geoPublish implement this specification.

Since scraps can be rather large, they are stored as files on disk. Copying text between two apps basically means that one app writes a file with a defined file name in a standardized format, and the other app reads it. As a side effect, scraps are persistent even across reboots of the operating system. Here is a text scrap on disk:

Text and photo scraps are the two types specified by GEOS, but since there is nothing special about scrap files, any application can use its own scrap format: geoCalc uses calc scraps, for example, which allows spreadsheet cells to be copied between documents.

By convention, scraps are sequential (non-VLIR) files with a name of “* Scrap” and a type of “* Scrap Vn.n”. “*” represents the type of scrap, space-padded to 5 characters:

App           Filename        Type String
geoWrite 2.1  “Text  Scrap”   “Text  Scrap V2.0”
geoPaint 2.0  “Photo Scrap”   “Photo Scrap V1.1”
geoCalc 1.0   “Calc  Scrap”   “Calc  Scrap V1.1”

As you can see from the type string, scraps are versioned, just like apps and documents. Apps should contain conversion code to accept older versions, and refuse to load versions that are newer than supported.
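
The naming and versioning convention can be sketched in a few lines of Python. This is only an illustration of the convention described above, not code from any GEOS app; the function names (`scrap_names`, `parse_version`, `can_load`) are made up for this example, and single-digit version components are an assumption based on the table.

```python
import re

def scrap_names(kind: str, version: str) -> tuple[str, str]:
    """Return (file name, type string) for a scrap of the given kind."""
    padded = kind.ljust(5)            # "Text" -> "Text ", "Photo" stays as is
    name = f"{padded} Scrap"
    return name, f"{name} V{version}"

def parse_version(type_string: str) -> tuple[int, int]:
    """Extract (major, minor) from a type string ending in 'Vn.n'."""
    match = re.search(r"V(\d)\.(\d)$", type_string)
    if match is None:
        raise ValueError(f"no version in {type_string!r}")
    return int(match.group(1)), int(match.group(2))

def can_load(type_string: str, newest_supported: tuple[int, int]) -> bool:
    """Accept older or equal versions, refuse newer ones."""
    return parse_version(type_string) <= newest_supported
```

With these helpers, `scrap_names("Text", "2.0")` yields the name and type string from the first row of the table, including the double space.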

GEOS comes with the Text Manager and Photo Manager desk accessories, which are simple databases (called “albums”) for text and photo scraps, respectively. The following screenshot is Text Manager showing a plain-text preview of one text scrap in its album:

Text Scraps

Format

A text scrap is a sequential file on disk with a name of “Text  Scrap” and a type of “Text  Scrap V2.0”. It supports all geoWrite features (fonts, styles, rulers) except embedded images.

The first two bytes of the file hold the size of the data that follows as a 16-bit little-endian value, so the data can be up to 65535 bytes. geoWrite can paste scraps of any size, but cannot create scraps larger than a page.

The remainder is regular geoWrite page data. It must start with a NewCardSet escape sequence to define the font and style of the text that follows. Here is an example:

00000000  10 00 17 8c 00 40 48 65  6c 6c 6f 20 57 6f 72 6c  |.....@Hello Worl|  
00000010  64 21                                             |d!|
  • 10 00 ($0010) is the scrap’s length, 16 bytes.
  • 17 is the ESC_NEWCARDSET escape code.
  • 8c 00 ($008C) specifies the California font at 12 pt.
  • 40 is the style byte, and means bold face.
  • The remainder is the ASCII text “Hello World!”
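
To make the layout concrete, here is a small Python sketch that parses the example above. The function name is hypothetical, and splitting the font word into a font ID (upper 10 bits) and a point size (lower 6 bits) is my assumption; it is consistent with $008C meaning "12 pt", but this article does not spell it out. The sketch also assumes the scrap contains no further escape sequences after the initial NewCardSet.

```python
import struct

ESC_NEWCARDSET = 0x17
STYLE_BOLD = 0x40

def parse_text_scrap(raw: bytes) -> dict:
    """Parse a text scrap that consists of one NewCardSet plus plain text."""
    (length,) = struct.unpack_from("<H", raw, 0)   # little-endian size word
    payload = raw[2:2 + length]
    if len(payload) != length:
        raise ValueError("scrap is shorter than its length field claims")
    if length < 4 or payload[0] != ESC_NEWCARDSET:
        raise ValueError("scrap must start with a NewCardSet escape")
    font_word, style = struct.unpack_from("<HB", payload, 1)
    return {
        "length": length,
        "font_id": font_word >> 6,        # assumption: upper 10 bits
        "point_size": font_word & 0x3F,   # assumption: lower 6 bits
        "bold": bool(style & STYLE_BOLD),
        "text": payload[4:].decode("ascii"),
    }

example = bytes.fromhex("1000178c0040") + b"Hello World!"
info = parse_text_scrap(example)
```

For the example bytes, `info["length"]` is 16, `info["point_size"]` is 12, `info["bold"]` is true and `info["text"]` is "Hello World!".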

The text can contain the following control codes:

Code Description
$09 Tab
$0C Page Break
$0D Line Break
$11 Ruler Escape
$17 NewCardSet Escape

See part 7 for more information on the geoWrite escape sequences. The background discussed in that article is also helpful for the sections that follow.

geoWrite Implementation

By convention, the text scrap needs to be a file on disk. But disk is slow, so geoWrite uses a 300 byte cache in memory. When copying and pasting within the app, and if the text is small enough to fit in the cache, the scrap is not written to disk until necessary.

Copy

If the user selects some text and then clicks on “copy” in the “edit” menu, the current selection gets copied into a text scrap.

First, geoWrite looks through the selected range of geoWrite data to see whether it contains an embedded image (ESC_GRAPHICS). If this is the case, it shows an error, since text scraps cannot contain images.

Then it trims the selection so that it doesn’t contain unnecessary data: If the very end of the selection is a ruler data structure, it gets removed, since all its properties (margins, tab stops, alignment and spacing) only apply to the paragraph following it, which is not part of the selection.

The same would be true for a NewCardSet (font and style) structure at the end of the selection, but the text selection logic has already stripped it from the end.

All text scraps have to start with the NewCardSet structure to set font and style. geoWrite does not have the space to work on a copy of the selected text and needs to create the scrap in-place: Therefore it saves the four bytes preceding the text range and overwrites them with the NewCardSet structure. It will restore these bytes after saving the text scrap.

If the resulting scrap data is small enough, it will be copied into the buffer in memory. Otherwise, it will be saved to disk. If there is already a text scrap on disk, it will be overwritten.
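
The in-place trick with the four saved bytes can be sketched as follows. This is a Python illustration under simplifying assumptions, not a transcription of geoWrite's 6502 code: `build_scrap_in_place` is a hypothetical name, and the `bytes(...)` copy merely stands in for "write this range to the cache or to disk", which is the step geoWrite performs while the buffer is temporarily modified.

```python
def build_scrap_in_place(page: bytearray, start: int, end: int,
                         cardset: bytes) -> bytes:
    """Produce scrap data for page[start:end], prefixed by a NewCardSet.

    The four bytes in front of the selection are temporarily overwritten
    and restored afterwards, so no second buffer for the text is needed.
    """
    if len(cardset) != 4:
        raise ValueError("a NewCardSet escape sequence is 4 bytes long")
    if start < 4:
        raise ValueError("need 4 bytes in front of the selection")
    saved = bytes(page[start - 4:start])      # remember the original bytes
    page[start - 4:start] = cardset           # overwrite them in place
    try:
        scrap = bytes(page[start - 4:end])    # stands in for saving the scrap
    finally:
        page[start - 4:start] = saved         # always restore the page
    return scrap

page = bytearray(b"xxxxHello World!yyyy")
cardset = bytes([0x17, 0x8C, 0x00, 0x40])
scrap = build_scrap_in_place(page, 10, 16, cardset)
```

After the call, `scrap` is the NewCardSet followed by "World!", and `page` is byte for byte what it was before.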

Paste

The “paste/text” function in the “edit” menu inserts the text scrap at the cursor position. If there is currently text selected, it gets deleted before inserting the scrap, effectively replacing the selection with the scrap.

To insert text into a page, the text between the cursor position and the end of the buffer is moved up in memory to make space for the data to be inserted. If there is a text scrap in the memory buffer, geoWrite just copies the scrap data verbatim into the page.

Disk

If the text scrap is on disk, it is more complicated. geoWrite cannot just make space in the buffer and read the scrap into it: It might not fit. The app’s memory manager generally keeps about one page in the buffer, and pages data from and to disk to work with bigger amounts of data.

geoWrite therefore reads and inserts the text scrap block by block, every time moving the page data up by a block. If this would overflow the buffer, the memory manager will do a repagination run to move some data to disk and therefore reduce the size of the data in the buffer.

But there is now another problem: Inserting the text scrap is no longer an atomic operation. geoWrite must be able to abort the insertion process after any block and still have a consistent document at the end. There are several reasons the insertion might be aborted:

  • There is an I/O error when reading the scrap.
  • The document exceeds 61 pages.
  • The disk is too low on space to continue.

Ruler (27 bytes) and NewCardSet (4 bytes) escape sequences may cross a block boundary, so if insertion is aborted, it could happen that an incomplete sequence is added to the document, which would leave the document in an illegal state.

Therefore, geoWrite interprets the scrap data and stops before escape sequences that span two blocks. It then only inserts the data before this incomplete escape sequence. Then, when the next block is loaded, it inserts the whole escape sequence in one go.

With this strategy, if an insertion has to be aborted, the text up to the error will be cleanly added to the document.
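
Here is a minimal Python sketch of that strategy. The escape codes and lengths come from this article; the function names and the `insert` callback are hypothetical, and the real code of course works on disk blocks and the page buffer rather than on Python byte strings.

```python
ESC_RULER = 0x11        # 27-byte escape sequence
ESC_NEWCARDSET = 0x17   # 4-byte escape sequence
ESCAPE_LENGTHS = {ESC_RULER: 27, ESC_NEWCARDSET: 4}

def split_complete(data: bytes) -> tuple[bytes, bytes]:
    """Split data into (complete prefix, incomplete trailing escape)."""
    pos = 0
    while pos < len(data):
        step = ESCAPE_LENGTHS.get(data[pos], 1)   # plain characters are 1 byte
        if pos + step > len(data):
            break                # escape sequence continues in the next block
        pos += step
    return data[:pos], data[pos:]

def insert_scrap(blocks, insert) -> None:
    """Feed blocks to `insert`, never passing a partial escape sequence."""
    carry = b""
    for block in blocks:
        complete, carry = split_complete(carry + block)
        if complete:
            insert(complete)     # the document is consistent after each call

inserted = []
insert_scrap([b"AB\x17\x8c", b"\x00\x40CD"], inserted.append)
```

In this usage example the NewCardSet straddles the two blocks, so the first call to `insert` receives only "AB", and the second one receives the complete escape sequence followed by "CD".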

Style

All text scraps define the font and style at the beginning, and they can contain a ruler definition. By just concatenating the scrap with the document’s existing text after the insertion point, these font/style/ruler changes would continue to apply to existing text after the inserted scrap. Therefore, geoWrite inserts after the scrap a copy of the ruler or cardset that was active before the insertion point, in order to keep the style of the original text the same.

There is no need for this though:

  • if there is already a ruler/NewCardSet escape at this position.
  • if the paste happens at the end of the document.
  • if the paste happens at the end of a page. The next page always starts with explicit paragraph and font/style escapes.

Lazy Logic

Copying and pasting text that is less than 300 bytes within geoWrite happens without disk access, but for interoperability with other apps, the buffer in memory will have to be written to disk in two cases:

  • If geoWrite runs a desk accessory. Desk accessories are like apps, but they are smaller, launched from a regular app and they return to the app when they quit. Examples would be the calculator, note pad and the text and photo managers. These desk accessories may want to access the text scrap.
  • If geoWrite quits. If geoWrite is launched again, or any other app is run, this app may want to read the text scrap.
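
The lazy write-back could be modeled like this. It is a sketch under assumptions: the class, its method names and the `write_to_disk` callback are invented for illustration, and whether geoWrite keeps the cached copy after flushing it is a guess on my part.

```python
class TextScrapCache:
    """In-memory text scrap that is written to disk only when needed."""

    LIMIT = 300   # scraps of less than 300 bytes stay in memory

    def __init__(self, write_to_disk):
        self._write_to_disk = write_to_disk   # hypothetical callback
        self._cached = None
        self._dirty = False

    def copy(self, scrap: bytes) -> None:
        if len(scrap) < self.LIMIT:
            self._cached = scrap              # no disk access for now
            self._dirty = True
        else:
            self._cached = None
            self._dirty = False
            self._write_to_disk(scrap)        # too big, goes to disk directly

    def paste_source(self):
        """Return the cached scrap, or None if it must be read from disk."""
        return self._cached

    def flush(self) -> None:
        """Call before running a desk accessory and before quitting."""
        if self._dirty:
            self._write_to_disk(self._cached)
            self._dirty = False

writes = []
cache = TextScrapCache(writes.append)
cache.copy(b"small scrap")
```

At this point `writes` is still empty; only a call to `cache.flush()` makes the scrap visible to other apps.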

Photo Scraps

Format

A photo scrap is a sequential file on disk with a name of “Photo Scrap” and a type of “Photo Scrap V1.1”. It is generally a rectangular monochrome image.

The first three bytes of the file are the dimensions of the image. The first byte is the width, measured in units of 8 pixels. The next two bytes are the height in pixels. The theoretical maximum dimensions of a photo scrap are therefore 2040 (255 × 8) by 65535 pixels, with the width always a multiple of 8.

The bitmap data uses 8 horizontal pixels per byte (the leftmost pixel is bit 7, 1-bits are black). Consecutive bytes describe a pixel line from left to right. These lines are stored row by row starting from the top.
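
As a quick illustration of the header and of the pixel layout of the uncompressed bitmap, here is a Python sketch. The function names are made up, and I am assuming the height word is stored little-endian, as is usual on the 6502.

```python
import struct

def parse_photo_header(raw: bytes) -> tuple[int, int, int]:
    """Return (width in pixels, height in pixels, uncompressed size in bytes)."""
    width_bytes, height = struct.unpack_from("<BH", raw, 0)
    return width_bytes * 8, height, width_bytes * height

def pixel(bitmap: bytes, width_bytes: int, x: int, y: int) -> int:
    """Return 1 for a black pixel in uncompressed bitmap data, else 0."""
    byte = bitmap[y * width_bytes + x // 8]   # rows are stored top to bottom
    return (byte >> (7 - x % 8)) & 1          # bit 7 is the leftmost pixel

header = bytes([2, 16, 0])                    # 2 * 8 = 16 pixels wide, 16 high
row = bytes([0b10000000, 0b00000001])         # leftmost and rightmost pixel set
```

For this header, `parse_photo_header(header)` reports a 16 by 16 pixel image that takes 32 bytes uncompressed.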

This bitmap data is stored compressed using a format based on run-length encoding (RLE). The GEOS KERNAL’s “BitmapUp” can decode this format, so applications can just pass the data to the KERNAL’s APIs without having to care about the specific encoding.

BitmapUp-compressed data is a sequence of packets. Every packet starts with a “count” byte:

count Value  Description
$00          Reserved
$01-$7F      Repeat: Repeat the following byte count times
$80          Reserved
$81-$DB      Unique: Use the next count – $80 bytes literally
$DC          Reserved
$DD-$FF      Bigcount: Followed by a bigcount byte. Repeat the following count – $DC bytes bigcount times, interpreting the resulting bytes using the Repeat and Unique rules.

As an example, let’s look at a 16×16 rectangle:

****************  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
*              *  
****************

It can be compressed into 9 bytes:

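.byte   2,%11111111                     ; Repeat: 2 x %11111111 (top row)
.byte   $DC+3,14                        ; Bigcount: repeat the next 3 bytes 14 times
    .byte   $80+2,%10000000,%00000001   ; Unique: 2 literal bytes (one middle row)
.byte   2,%11111111                     ; Repeat: 2 x %11111111 (bottom row)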
  • The first line instructs the decoder to repeat the bit pattern %11111111 2 times, producing 16 black pixels – the top line of the rectangle.
  • The second line will cause the next 3 bytes to be repeated 14 times, once for every line of the rectangle except the first and the last one.
  • These next three bytes are themselves encoded and tell the decoder to take the next 2 bytes verbatim: %10000000 and %00000001 describe one line of the rectangle with the leftmost and the rightmost pixel set.
  • The last line is the same as the first one; it creates the bottom row.
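
The packet rules from the table are compact enough to write down as a decoder. The following Python sketch implements them as described in this article; it is not the KERNAL’s BitmapUp code. It decodes until the input is exhausted, whereas a real decoder would stop after width × height bytes, and it uses recursion for the Bigcount payload, which would also accept nested Bigcount packets (I don't know whether the original allows that).

```python
def decode_packets(data: bytes) -> bytes:
    """Decode a sequence of Repeat / Unique / Bigcount packets."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        count = data[pos]
        if 0x01 <= count <= 0x7F:              # Repeat
            out += bytes([data[pos + 1]]) * count
            pos += 2
        elif 0x81 <= count <= 0xDB:            # Unique
            n = count - 0x80
            out += data[pos + 1:pos + 1 + n]
            pos += 1 + n
        elif 0xDD <= count <= 0xFF:            # Bigcount
            n = count - 0xDC
            bigcount = data[pos + 1]
            pattern = data[pos + 2:pos + 2 + n]
            out += decode_packets(pattern) * bigcount
            pos += 2 + n
        else:                                  # $00, $80, $DC
            raise ValueError(f"reserved count byte ${count:02X} at offset {pos}")
    return bytes(out)

def render(bitmap: bytes, width_bytes: int) -> list[str]:
    """Turn uncompressed bitmap data into rows of '*' and ' '."""
    rows = []
    for start in range(0, len(bitmap), width_bytes):
        bits = "".join(f"{b:08b}" for b in bitmap[start:start + width_bytes])
        rows.append(bits.replace("1", "*").replace("0", " "))
    return rows

rectangle = bytes([2, 0xFF, 0xDC + 3, 14, 0x80 + 2, 0x80, 0x01, 2, 0xFF])
rows = render(decode_packets(rectangle), 2)
```

Decoding the 9 bytes of the example gives 32 bytes of bitmap data, and `rows` contains the 16 lines of the rectangle shown above.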

After the monochrome bitmap data, a photo scrap can optionally contain data on how to colorize 8×8 pixel squares. geoWrite does not support color and just ignores this part.

geoWrite Implementation

geoWrite can paste photo scraps with heights up to 144 pixels. Images wider than the page’s dimensions will effectively be cropped when rendering.

In geoWrite documents, image data is not stored inline with the text of the document. Instead, the text contains a 5 byte graphics escape sequence pointing to the image:

Offset  Type  Contents       Description
0       Byte  Escape Code    Constant $10 (ESC_GRAPHICS)
1       Byte  Image Width    Width of image divided by 8
2-3     Word  Image Height   Height of image
4       Byte  Record Number  Number of record containing image data

The data of each image is stored in a separate VLIR record. The format of image records is that of photo scraps, so when pasting an image, all geoWrite has to do is make a copy of the photo scrap file into a new VLIR record in the document.
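
The 5-byte graphics escape sequence maps directly onto a packed structure. The Python sketch below builds and parses one; the function names are hypothetical, and the little-endian byte order of the height word is my assumption.

```python
import struct

ESC_GRAPHICS = 0x10
LAYOUT = "<BBHB"   # escape code, width / 8, height (word), record number

def pack_graphics_escape(width_px: int, height_px: int, record: int) -> bytes:
    """Build the 5-byte escape sequence that refers to an image record."""
    if width_px % 8:
        raise ValueError("image width must be a multiple of 8 pixels")
    return struct.pack(LAYOUT, ESC_GRAPHICS, width_px // 8, height_px, record)

def unpack_graphics_escape(seq: bytes) -> tuple[int, int, int]:
    """Return (width in pixels, height in pixels, record number)."""
    code, width_bytes, height, record = struct.unpack(LAYOUT, seq)
    if code != ESC_GRAPHICS:
        raise ValueError("not a graphics escape sequence")
    return width_bytes * 8, height, record

escape = pack_graphics_escape(16, 16, 64)
```

Here `escape` is the byte sequence 10 02 10 00 40, and `unpack_graphics_escape(escape)` returns the original values.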

Before this can be done, a few checks have to be made though:

  • A photo scrap file must exist.
  • The file version must not be too new. (There is a bug in geoWrite 2.1: It checks for a maximum version of V2.1, but the highest version of the photo scrap file format at the time geoWrite was written was V1.1 – and still is today. The version checking code was meant for geoWrite documents and text scraps, whose versioning follows the geoWrite version. It should have been special-cased for photo scraps.)
  • The height must not be more than 144 pixels.
  • There must be space in the document. A geoWrite document can hold up to 64 images.
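
Put together, the checks could look like this in Python. It is only a sketch of the list above: the function name, the error strings and the way the inputs are passed in are invented for illustration.

```python
MAX_HEIGHT = 144   # maximum height of a pasted image in pixels
MAX_IMAGES = 64    # maximum number of images per document

def paste_photo_error(scrap_version, newest_supported, height, images_in_document):
    """Return None if the paste may proceed, else a short reason.

    scrap_version is a (major, minor) tuple, or None if no photo scrap exists.
    """
    if scrap_version is None:
        return "no photo scrap"
    if scrap_version > newest_supported:
        return "photo scrap version too new"
    if height > MAX_HEIGHT:
        return "image too tall"
    if images_in_document >= MAX_IMAGES:
        return "no room for another image"
    return None

# The geoWrite 2.1 bug: the limit is (2, 1) instead of (1, 1), so a
# hypothetical V2.0 photo scrap would not be rejected.
buggy = paste_photo_error((2, 0), (2, 1), 100, 0)
fixed = paste_photo_error((2, 0), (1, 1), 100, 0)
```

With the buggy limit, `buggy` is `None` and the paste goes ahead; with the limit that matches the photo scrap format, `fixed` reports that the version is too new.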

Nomenclature

One last thought about GEOS and naming things. What I called “image” throughout this article, GEOS calls:

  • graphics in the KERNAL API.
  • picture in the geoWrite UI.
  • photo in all references to photo scraps.
