Inside geoWrite – 6: Localization

In the series about the internals of the geoWrite WYSIWYG text editor for the C64, this article discusses what was required for the German localization.

Article Series

  1. The Overlay System
  2. Screen Recovery
  3. Font Management
  4. Zero Page
  5. Copy Protection
  6. Localization ← this article
  7. File Format and Pagination
  8. Copy & Paste
  9. Keyboard Handling

Overview

Localizing an app doesn’t mean just translating all text. Language is just one part of it. Here are all concepts that require changes to geoWrite:

  • Language
  • Date/time format
  • Number format
  • Character set

Let’s go through them.

Language

Translating the app is not as easy as just translating all strings.

  • Some strings must not be translated.
  • Not all text is part of the UI.
  • Not all text is in string form.

Do Not Translate

First, let’s look at what must not be translated:

fn_textscrap:  
        .byte   "Text  Scrap",0  
fn_photoscrap:  
        .byte   "Photo Scrap",0

These are magic filenames that contain the current clipboard/pasteboard contents. Translating them would break interoperability between apps in different languages.

Keywords

Then, there are strings where it is up for debate whether they should be translated: geoWrite supports three keywords that get replaced with dynamic contents when used in a page header or footer:

  • DATE: inserts the current date
  • TIME: inserts the current time
  • PAGE: inserts the current page number

For the German version, these keywords were in fact translated: DATUM, ZEIT, SEITE.
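The effect of translating the keywords can be sketched in a few lines of Python. This is only an illustration of the concept: the table layout, the function name `expand_header` and the whole-word matching are assumptions, and only the keyword spellings come from the text above.

```python
# Hypothetical keyword tables; only the spellings are taken from the article.
KEYWORDS = {
    "en": {"DATE": "date", "TIME": "time", "PAGE": "page"},
    "de": {"DATUM": "date", "ZEIT": "time", "SEITE": "page"},
}

def expand_header(text, language, values):
    """Replace each whole-word keyword of the given language with its value."""
    table = KEYWORDS[language]
    words = []
    for word in text.split(" "):
        field = table.get(word)
        words.append(values[field] if field else word)
    return " ".join(words)

values = {"date": "31. Dezember 1999", "time": "23:59", "page": "7"}
print(expand_header("Seite SEITE vom DATUM", "de", values))
# The English keyword is plain text to the German version:
print(expand_header("Seite PAGE vom DATUM", "de", values))
```

The second call shows the consequence of the translation: a header written with the English keywords stays unexpanded in the German version.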

Strings

Of the strings that should be translated, let’s start with the straightforward ones: Here is a table of strings that are used in menus:

Care has to be taken that the translated version fits the available space: The translations of “edit” and “options” were abbreviated (“Edit”/“Editieren”, “Opt”/“Optionen”), because the whole menu bar would have been too wide otherwise:

For submenus, GEOS programmers have to explicitly state the location and size in pixels, so the definition of a submenu has to change as well:


The German version is wider, so the value for the right border of the menu was updated.

Additionally, all menus in the German version are moved up by one pixel. If you look closely, you can see that the English version has a double line between “file” and “close”, while the German version has a single line. The symbol MENU_HEIGHT in the previous code block is 15 for the English version and 14 for the German version. It is unknown what the purpose of this is.

In the case of dialogs, the translated text might not fit into the same number of lines and might require a re-layout:


So while the English version just consists of one line of text, the German version adds a GOTOXY control code to move the cursor to the second line:

Because of word order differences, the startup dialog needed a complete redesign…


…which required changing the locations of all text and icons in the dialog’s definition:

Images

The startup dialog contains buttons that say “Create”, “Open” and “Quit”. GEOS only provides a limited set of predefined buttons (“OK”, “Cancel”, “Open”, …), so the pixel images of “Create” and “Quit” are supplied by the app and need to be translated as well.

The translated words are longer, so the buttons have to be bigger as well.

Screen Recovery Rectangles

As discussed in part 2 of this series, GEOS uses a custom system to save and recover screen contents that get overwritten by menus and dialogs. Since the sizes and positions of the menus are different, the rectangles that need to be recovered are changed in their table as well:

Date/Time Format

Different cultures/languages use different conventions for the date and time format. The DATE and TIME keywords stamp the current date and time into a page’s header or footer. The English version uses the US format for dates and times:

December 31, 1999  11:59 PM

The German version uses the German format, with German month names:

31. Dezember 1999  23:59

This is the core function to create the date string:

        ; date  
        LoadW   r0, dateString  
.if DATE_FORMAT=DATE_FORMAT_US  
        jsr     getMonthName  
        jsr     getDay  
.elseif DATE_FORMAT=DATE_FORMAT_DE  
        jsr     getDay  
        jsr     getMonthName  
.endif  
        jsr     getYear

The month and day are reversed in the two different formats. The “.” vs. the “,” after the day gets handled by the function getDay:

getDay:  
        MoveB   day, r3L  
        LoadB   r3H, 0  
        jsr     byteToDecimal  
        ldy     #0  
.if DATE_FORMAT=DATE_FORMAT_US  
        lda     #','  
.elseif DATE_FORMAT=DATE_FORMAT_DE  
        lda     #'.'  
.endif  
        sta     (r0),y  
        IncW    r0  
        lda     #' '  
        sta     (r0),y  
        IncW    r0  
        rts
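For readers who don't speak 6502 assembly, here is a rough Python model of the date logic above. It mirrors the order of the calls and the separator chosen in getDay; that getMonthName appends a trailing space, and the month name tables themselves, are assumptions made for this sketch.

```python
US_MONTHS = ["January", "February", "March", "April", "May", "June",
             "July", "August", "September", "October", "November", "December"]
DE_MONTHS = ["Januar", "Februar", "März", "April", "Mai", "Juni",
             "Juli", "August", "September", "Oktober", "November", "Dezember"]

def get_day(day, date_format):
    # Like getDay: the number, then ',' (US) or '.' (DE), then a space.
    separator = "," if date_format == "US" else "."
    return f"{day}{separator} "

def get_month_name(month, date_format):
    # Assumption: the month name is followed by a single space.
    names = US_MONTHS if date_format == "US" else DE_MONTHS
    return names[month - 1] + " "

def format_date(day, month, year, date_format):
    if date_format == "US":
        text = get_month_name(month, date_format) + get_day(day, date_format)
    else:
        text = get_day(day, date_format) + get_month_name(month, date_format)
    return text + str(year)

print(format_date(31, 12, 1999, "US"))  # December 31, 1999
print(format_date(31, 12, 1999, "DE"))  # 31. Dezember 1999
```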

The function to create the time string has some extra logic to convert the hour (0-23) to the range (1-12):

        lda     hour  
.if DATE_FORMAT=DATE_FORMAT_US          ; AM/PM  
        cmp     #12  
        bcc     :+                      ; >= 12?  
        sub     #12                     ; then subtract 12  
:       cmp     #0  
        bne     :+                      ; == 0  
        lda     #12                     ; then it's 12  
:
.endif  
        sta     r3L  
        LoadB   r3H, 0

        jsr     byteToDecimal           ; hours  
        ldy     #0  
        lda     #':'  
        sta     (r0),y                  ; ':'  
        IncW    r0

        lda     minutes  
        sta     r3L  
        LoadB   r3H, 0  
        lda     #1  
        jsr     byteToDecimal           ; minutes

And at the end, there is extra code in the US version to add “AM” or “PM”:

        ldy     #0  
        lda     #' '  
        sta     (r0),y                  ; space  
        IncW    r0

.if DATE_FORMAT=DATE_FORMAT_US          ; AM/PM  
        lda     #'A'  
        ldx     hour  
        cpx     #12  
        bcc     :+  
        lda     #'P'  
:       sta     (r0),y  
        IncW    r0  
        lda     #'M'  
        sta     (r0),y  
        IncW    r0  
.endif
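The same logic, modeled in Python as a sketch. The function name and the `twelve_hour` flag are made up for the illustration, and zero-padding the minutes is an assumption (the listing doesn't show what byteToDecimal does with the value in the accumulator). As in the assembly, the space after the minutes is written in both versions.

```python
def format_time(hour, minutes, twelve_hour):
    shown = hour
    if twelve_hour:
        if shown >= 12:      # >= 12? then subtract 12
            shown -= 12
        if shown == 0:       # 0 o'clock and 12 o'clock both show as 12
            shown = 12
    text = f"{shown}:{minutes:02d} "   # the space is appended unconditionally
    if twelve_hour:
        text += ("P" if hour >= 12 else "A") + "M"
    return text

print(format_time(23, 59, True))    # 11:59 PM
print(format_time(0, 5, True))      # 12:05 AM
print(format_time(23, 59, False))   # 23:59 followed by a space
```

Note that the AM/PM decision uses the original 24-hour value, not the converted one, just like the `ldx hour` in the listing.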

Number Format

The character used for the decimal separator may differ between languages – “3.14” in an English text would be “3,14” in a German text. Since geoWrite supports “decimal” tab stops that align numbers around the decimal separator, it needs to scan for this character: The English version checks for “.”, while the German version checks for “,”.
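A minimal sketch of what a decimal tab stop has to do, assuming a fixed-width layout measured in character cells (geoWrite works in pixels with proportional fonts). The function name and the behavior for text without a separator are assumptions.

```python
def align_at_decimal_tab(text, tab_column, separator):
    """Pad text so that its decimal separator lands in tab_column."""
    position = text.find(separator)
    if position < 0:
        # Assumption: without a separator, the end of the text is aligned.
        position = len(text)
    return " " * max(tab_column - position, 0) + text

for number in ["3.14", "42.5", "1000.25"]:
    print(align_at_decimal_tab(number, 6, "."))

# Scanning for ',' doesn't find the '.', so the number is placed differently:
print(align_at_decimal_tab("3.14", 6, ","))
```

The last call illustrates why the separator matters: a version that scans for the wrong character no longer lines the numbers up at the decimal point.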

Character Set & Encoding

The German language has four extra letters: the three umlauts “ä”/“Ä”, “ö”/“Ö” and “ü”/“Ü”, plus “ß”.

GEOS Character Encoding

Until the advent of Unicode, operating systems used different character encodings for different languages or scripts.

The English version of GEOS uses the 7-bit ASCII encoding, which contains the 26 letters A through Z, but no umlauts. The GEOS KERNAL has no concept of a character encoding, it just blindly draws glyphs that are stored at an index in a font – as long as the index is between 32 and 127, the 7-bit ASCII printable range. The only difference between the English and the German operating system in terms of character encoding is the fonts: Just like the regular fonts, the fonts that come with the German version have 96 characters, but some characters have been replaced by the extra letters and the ‘§’ character (important for legal documents). These are the variants of the system font “BSW/9”:


ASCII   German GEOS
@       §
[       Ä
\       Ö
]       Ü
{       ä
|       ö
}       ü
~       ß
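Since the encoding lives entirely in the font, the same bytes simply look different depending on which font set is installed. The following Python sketch models that with a lookup table built from the eight substitutions listed above; the `render` function is made up for the illustration.

```python
# The eight codes whose glyphs differ in the German fonts.
GERMAN_GLYPHS = {
    "@": "§", "[": "Ä", "\\": "Ö", "]": "Ü",
    "{": "ä", "|": "ö", "}": "ü", "~": "ß",
}

def render(codes, german_font):
    """Return what the given character codes look like on screen."""
    if not german_font:
        return codes
    return "".join(GERMAN_GLYPHS.get(code, code) for code in codes)

print(render("Gr}~e", german_font=True))    # Grüße
print(render("Gr}~e", german_font=False))   # Gr}~e
```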

geoWrite doesn’t generally have to care about the encoding either: With the German font set, any version of geoWrite will display German umlauts.

There are two cases where it does have to care though: searching and printing.

Searching

The function to search text has the option of searching for whole words only. For this, geoWrite needs to know which code points are letters or numbers. In English, that’s A through Z and 0 through 9. In German, this must include the umlauts. This is the function that decides on what’s an alphanumeric character:

isAlphanumeric:  
        cmp     #'0'  
        bcc     @1  
        cmp     #'9'+1  
        bcc     @yes  
@1:  
.if CHAR_ENCODING=CHAR_ENCODING_ASCII  
        cmp     #'A'  
.elseif CHAR_ENCODING=CHAR_ENCODING_DE  
        cmp     #'@'  
.endif  
        bcc     @2  
.if CHAR_ENCODING=CHAR_ENCODING_ASCII  
        cmp     #'Z'+1  
.elseif CHAR_ENCODING=CHAR_ENCODING_DE  
        cmp     #']'+1  
.endif  
        bcc     @yes  
@2:     cmp     #'a'  
        bcc     @3  
.if CHAR_ENCODING=CHAR_ENCODING_ASCII  
        cmp     #'z'+1  
.elseif CHAR_ENCODING=CHAR_ENCODING_DE  
        cmp     #'~'+1  
.endif  
        bcc     @yes  
@3:     cmp     #'_'  
        beq     @yes  
        clc  
        rts

@yes:   sec  
        rts

With the German encoding, it includes the three characters after the uppercase ‘Z’ and the four characters after the lowercase ‘z’ (see the table above). There is a bug in this code though: The German version considers “§” (“@” in the code above) an alphanumeric character, which it isn’t.
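Here is the same range check as a Python sketch, which makes the bug easy to reproduce. The `german` flag stands in for the CHAR_ENCODING compile-time switch; the `fix_section_sign` parameter is hypothetical and shows one possible correction (starting the uppercase range at ‘A’ in both versions), not something the original code does.

```python
def is_alphanumeric(char, german, fix_section_sign=False):
    code = ord(char)
    if ord("0") <= code <= ord("9"):
        return True
    # Uppercase range: 'A'..'Z' for ASCII, '@'..']' for German GEOS.
    upper_first = ord("@") if german and not fix_section_sign else ord("A")
    upper_last = ord("]") if german else ord("Z")
    if upper_first <= code <= upper_last:
        return True
    # Lowercase range: 'a'..'z' for ASCII, 'a'..'~' for German GEOS.
    lower_last = ord("~") if german else ord("z")
    if ord("a") <= code <= lower_last:
        return True
    return char == "_"

print(is_alphanumeric("{", german=True))    # True: 'ä' in German GEOS
print(is_alphanumeric("{", german=False))   # False: a brace in ASCII
print(is_alphanumeric("@", german=True))    # True: the '§' bug
print(is_alphanumeric("@", german=True, fix_section_sign=True))  # False
```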

Printing

The default is for geoWrite to print pixel images of the pages of a document. But there is also ASCII mode, which sends the plain text to the printer, so the printer can use its built-in fonts. In this mode, the English GEOS sends ASCII-encoded text, that is, its internal representation without any conversion, to the printer driver. If the printer uses a different encoding, the driver has to do the conversion.

German GEOS can’t just send the codes for “§ÄÖÜäöüß” – they would print as “@[\]{|}~”. It has to convert them, so that printer drivers can be universal and independent of the system’s language.

This is the code that does the conversion – it is missing from the English version:

convertToCp437:  
        ldy     #8  
@loop:  cmp     @from-1,y  
        beq     @found  
        dey  
        bne     @loop  
        rts  
@found: lda     @to-1,y  
        rts

@from:  .byte '@','[','\',']','{','|','}','~'   ; source: GEOS_de  
@to:    .byte $EB,$8E,$99,$9A,$84,$94,$81,$E1   ; target: CP437  
;             'δ','Ä','Ö','Ü','ä','ö','ü','ß'

The eight character codes in German GEOS that differ from ASCII (line @from) are converted to eight codes above 127 (line @to).

The destination encoding is Codepage 437, the standard (and now obsolete) encoding used by the IBM PC and MS-DOS. The one exception is ‘§’: its CP437 equivalent would be $15, which is a non-printable control code in ASCII-based encodings, so the table maps it to $EB (‘δ’ in CP437) instead.

The authors of GEOS were free to choose any encoding – it’s really just a convention between applications and the printer drivers. But with CP437, drivers for PC printers of the time can just pipe the data through as is.
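The conversion table can be checked against a real CP437 implementation. This Python sketch rebuilds the @from/@to tables with `bytes.maketrans` and decodes the result with Python's built-in `cp437` codec, which plays the role of the PC printer here; the function name is the only thing borrowed from the listing.

```python
FROM = b"@[\\]{|}~"                                             # GEOS_de
TO = bytes([0xEB, 0x8E, 0x99, 0x9A, 0x84, 0x94, 0x81, 0xE1])    # CP437
TABLE = bytes.maketrans(FROM, TO)

def convert_to_cp437(data):
    """Translate German GEOS text to the codes sent to the printer driver."""
    return data.translate(TABLE)

# "Gr}~e" is how German GEOS stores "Grüße".
print(convert_to_cp437(b"Gr}~e").decode("cp437"))       # Grüße
print(convert_to_cp437(b"[\\]{|}~").decode("cp437"))    # ÄÖÜäöüß
print(convert_to_cp437(b"@").decode("cp437"))           # δ, not §
```

All other codes pass through unchanged, which matches the early `rts` in the assembly when no table entry is found.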

Discussion

Modern software usually comes as a single application binary that supports multiple languages. With support from the operating system, it can use different conventions for date/time and numbers, and it uses Unicode to express and work with any character in any script.

geoWrite is running on a 64 KB system and doesn’t have the luxury of spending code on any of these features – all localization differences are compile-time options. This means that in a multi-lingual environment, there are many limitations:

  • The English version of geoWrite on a German version of GEOS will support umlauts, but can’t correctly search for words with umlauts or print umlauts in ASCII mode. Besides, some buttons in the UI will be in German.
  • The German version of geoWrite on an English version of GEOS will not support umlauts, and the characters “@[\]{|}~” won’t print correctly in ASCII mode.
  • Writing an English document in the German version of geoWrite will use the German date/time format, use German month names, and can’t use decimal tab stops with a ‘.’ as the decimal point.
  • Writing a German document in the English version of geoWrite (on a German GEOS) has the equivalent problems. In addition, searching for words with umlauts won’t work correctly, and neither will printing umlauts.
  • Writing a French or Spanish document with any version of geoWrite works, even with accented letters, as long as only fonts that include the extra letters are used. But the same limitations with date/time, numbers, searching and printing apply.
  • Good luck with CJK and RTL!
  • Opening a document in a different language version of geoWrite than it was saved in will break either the umlauts or the “@[\]{|}~” characters, as well as date, time and page numbers in headers and footers.

Then again, it would be possible to architect a version of geoWrite with more flexibility:

  • The code for date, time, numbers and the encoding only differs minimally between the localizations, so the app could support all variants, based on a system setting.
  • The VLIR architecture of GEOS applications allows dividing code and data into an arbitrary number of records, so every VLIR record of the current geoWrite app could be split into two: one with the code, and one with the strings and UI data structures. Which variant of the UI gets loaded depends on the system language.

The latter point would of course waste space on disk (regular geoWrite is 35 KB, a 1541 disk holds 165 KB) and increase load times of VLIR records.
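The first of the two ideas, selecting the conventions at run time instead of at compile time, could look roughly like this. It is a Python sketch of the data structure only; the field names, the language codes and the fallback to English are all assumptions, and a real implementation on the C64 would be a small table in assembly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LocaleSettings:
    day_before_month: bool    # 31. Dezember vs. December 31
    day_suffix: str           # '.' or ','
    twelve_hour_clock: bool   # AM/PM vs. 24-hour time
    decimal_separator: str    # scanned for by decimal tab stops
    extra_letters: str        # codes beyond A-Z/a-z that count as letters

SETTINGS = {
    "en": LocaleSettings(False, ",", True, ".", ""),
    "de": LocaleSettings(True, ".", False, ",", "[\\]{|}~"),
}

def settings_for(system_language):
    # Assumption: unknown languages fall back to the English conventions.
    return SETTINGS.get(system_language, SETTINGS["en"])

def find_decimal_separator(text, system_language):
    return text.find(settings_for(system_language).decimal_separator)

print(find_decimal_separator("3,14", "de"))   # 1
print(find_decimal_separator("3,14", "en"))   # -1
```

Each of the compile-time `.if` blocks shown earlier would then turn into a lookup in the active record.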

1 thought on “Inside geoWrite – 6: Localization”

  1. What keyboard layout did the German version of GEOS use?

    Afaik there was no German version of the C64, but the C128 had German umlauts and a German keymap (which would be preferable to the US map even though you can type all characters on both maps). So if GEOS had the US map it would be cumbersome but the keycaps would mostly be correct on a C64, while if it had a German map it would be easier to type on but the keycaps on a C64 would be incorrect, while the keycaps on a C128 would be correct.

    You most likely already know this, but other readers might not know: Using the English printer driver would make it possible to print umlauts using older 7-bit printers (using the appropriate language version of ISO 646). There were a bunch of different versions like German, Danish, Norwegian, Swedish/Finnish and so on, and printers sometimes had a bunch of DIP switches to select the language, or on letter-quality printers the physical font (ball on IBM machines and so on) would differ depending on which language the printer supported.
