{"id":1460,"date":"2020-09-09T00:18:51","date_gmt":"2020-09-08T22:18:51","guid":{"rendered":"https:\/\/www.pagetable.com\/?p=1460"},"modified":"2020-09-09T00:18:51","modified_gmt":"2020-09-08T22:18:51","slug":"inside-geowrite-6-localization","status":"publish","type":"post","link":"https:\/\/www.pagetable.com\/?p=1460","title":{"rendered":"Inside geoWrite \u2013 6: Localization"},"content":{"rendered":"<p>In the series about the internals of the geoWrite WYSIWYG text editor for the C64, this article discusses what was required for the German localization.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_localizaion.gif\" height=\"400\" width=\"640\" alt=\"\" \/><\/p>\n<h2 id=\"article-series\">Article Series<\/h2>\n<ol>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1425\">The Overlay System<\/a><\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1428\">Screen Recovery<\/a><\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1436\">Font Management<\/a><\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1442\">Zero Page<\/a><\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1449\">Copy Protection<\/a><\/li>\n<li><strong>Localization<\/strong> \u2190 this article<\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1471\">File Format and Pagination<\/a><\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1481\">Copy &amp; Paste<\/a><\/li>\n<li><a href=\"https:\/\/www.pagetable.com\/?p=1490\">Keyboard Handling<\/a><\/li>\n<\/ol>\n<h2 id=\"overview\">Overview<\/h2>\n<p>Localizing an app doesn&rsquo;t mean just translating all text. Language is just one part of it. 
Here are all the concepts that required changes to geoWrite:<\/p>\n<ul>\n<li>Language<\/li>\n<li>Date\/time format<\/li>\n<li>Number format<\/li>\n<li>Character set<\/li>\n<\/ul>\n<p>Let&rsquo;s go through them.<\/p>\n<h2 id=\"language\">Language<\/h2>\n<p>Translating the app is not as easy as just translating all strings.<\/p>\n<ul>\n<li>Some strings must not be translated.<\/li>\n<li>Not all text is part of the UI.<\/li>\n<li>Not all text is in string form.<\/li>\n<\/ul>\n<h3 id=\"do-not-translate\">Do Not Translate<\/h3>\n<p>First, let&rsquo;s look at what <strong>must not<\/strong> be translated:<\/p>\n<pre><code>fn_textscrap:\n        .byte   \"Text  Scrap\",0\nfn_photoscrap:\n        .byte   \"Photo Scrap\",0\n<\/code><\/pre>\n<p>These are the magic names of the files that hold the current clipboard\/pasteboard contents. Translating them would break interoperability between apps in different languages.<\/p>\n<h3 id=\"keywords\">Keywords<\/h3>\n<p>Then there are strings where it is up for debate whether they should be translated: geoWrite supports three keywords that get replaced with dynamic content when used in a page header or footer:<\/p>\n<ul>\n<li><code>DATE<\/code>: inserts the current date<\/li>\n<li><code>TIME<\/code>: inserts the current time<\/li>\n<li><code>PAGE<\/code>: inserts the current page number<\/li>\n<\/ul>\n<p>For the German version, these keywords were in fact translated: <code>DATUM<\/code>, <code>ZEIT<\/code>, <code>SEITE<\/code>.<\/p>\n<h3 id=\"strings\">Strings<\/h3>\n<p>Of the strings that should be translated, let&rsquo;s start with the straightforward ones. Here is a table of strings that are used in menus:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_diff0.png\" height=\"189\" width=\"671\" alt=\"\" \/><\/p>\n<p>Care has to be taken that the translated version fits the available space: The translations of &ldquo;edit&rdquo; and &ldquo;options&rdquo; were abbreviated 
(&ldquo;Edit&rdquo;\/&ldquo;Editieren&rdquo;, &ldquo;Opt&rdquo;\/&ldquo;Optionen&rdquo;), because the whole menu bar would have been too wide otherwise:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_menubar.png\" height=\"60\" width=\"630\" alt=\"\" \/><\/p>\n<p>For submenus, GEOS programmers have to explicitly state the location and size in pixels, so the definition of a submenu has to change as well:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_menu_en.png\" height=\"400\" width=\"640\" alt=\"\" \/><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_menu_de.png\" height=\"400\" width=\"640\" alt=\"\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_diff0a.png\" height=\"190\" width=\"697\" alt=\"\" \/><\/p>\n<p>The German version is wider, so the value for the right border of the menu was updated.<\/p>\n<p>Additionally, all menus in the German version are moved up by one pixel. If you look closely, you can see that the English version has a double line between &ldquo;file&rdquo; and &ldquo;close&rdquo;, while the German version has a single line. The symbol <code>MENU_HEIGHT<\/code> in the previous code block is 15 for the English version and 14 for the German version. 
The purpose of this change is unknown.<\/p>\n<p>In the case of dialogs, the translated text might not fit into the same number of lines and might require a re-layout:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_dialog1_en.png\" height=\"400\" width=\"640\" alt=\"\" \/><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_dialog1_de.png\" height=\"400\" width=\"640\" alt=\"\" \/><\/p>\n<p>While the English version consists of a single line of text, the German version adds a <code>GOTOXY<\/code> control code to move the cursor to the second line:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_diff1.png\" height=\"111\" width=\"669\" alt=\"\" \/><\/p>\n<p>Because of word order differences, the startup dialog needed a complete redesign&hellip;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_dialog2_en.png\" height=\"400\" width=\"640\" alt=\"\" \/><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_dialog2_de.png\" height=\"400\" width=\"640\" alt=\"\" \/><\/p>\n<p>&hellip;which required changing the locations of all text and icons in the dialog&rsquo;s definition:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_diff2.png\" height=\"399\" width=\"668\" alt=\"\" \/><\/p>\n<h3 id=\"images\">Images<\/h3>\n<p>The startup dialog contains buttons that say &ldquo;Create&rdquo;, &ldquo;Open&rdquo; and &ldquo;Quit&rdquo;. 
GEOS only provides a limited set of predefined buttons (&ldquo;OK&rdquo;, &ldquo;Cancel&rdquo;, &ldquo;Open&rdquo;, &hellip;), so the pixel images of &ldquo;Create&rdquo; and &ldquo;Quit&rdquo; are supplied by the app and need to be translated as well.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_buttons.png\" height=\"66\" width=\"226\" alt=\"\" \/><\/p>\n<p>The translated words are longer, so the buttons have to be bigger as well.<\/p>\n<h3 id=\"screen-recovery-rectangles\">Screen Recovery Rectangles<\/h3>\n<p>As discussed in <a href=\"https:\/\/www.pagetable.com\/?p=1428\">part 2<\/a> of this series, GEOS uses a custom system to save and recover screen contents that get overwritten by menus and dialogs. Since the sizes and positions of the menus are different, the table of rectangles that need to be recovered had to change as well:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/geowrite_6_diff3.png\" height=\"514\" width=\"653\" alt=\"\" \/><\/p>\n<h2 id=\"date\/time-format\">Date\/Time Format<\/h2>\n<p>Different cultures\/languages use different conventions for the date and time format. The <code>DATE<\/code> and <code>TIME<\/code> keywords stamp the current date and time into a page&rsquo;s header or footer. In the English version, they use the US format for dates and times:<\/p>\n<pre><code>December 31, 1999  11:59 PM\n<\/code><\/pre>\n<p>In the German version, they use the German format, with German month names:<\/p>\n<pre><code>31. Dezember 1999  23:59\n<\/code><\/pre>\n<p>This is the core of the code that creates the date string:<\/p>\n<pre><code>        ; date\n        LoadW   r0, dateString\n.if DATE_FORMAT=DATE_FORMAT_US\n        jsr     getMonthName\n        jsr     getDay\n.elseif DATE_FORMAT=DATE_FORMAT_DE\n        jsr     getDay\n        jsr     getMonthName\n.endif\n        jsr     getYear\n<\/code><\/pre>\n<p>The month and day are reversed in the two different formats. 
The &ldquo;.&rdquo; (German) vs. the &ldquo;,&rdquo; (US) after the day is handled by the function <code>getDay<\/code>:<\/p>\n<pre><code>getDay:\n        MoveB   day, r3L\n        LoadB   r3H, 0\n        jsr     byteToDecimal\n        ldy     #0\n.if DATE_FORMAT=DATE_FORMAT_US\n        lda     #','\n.elseif DATE_FORMAT=DATE_FORMAT_DE\n        lda     #'.'\n.endif\n        sta     (r0),y\n        IncW    r0\n        lda     #' '\n        sta     (r0),y\n        IncW    r0\n        rts\n<\/code><\/pre>\n<p>In the US version, the code that creates the time string has some extra logic to convert the hour from the 24-hour range (0-23) to the 12-hour range (1-12):<\/p>\n<pre><code>        lda     hour\n.if DATE_FORMAT=DATE_FORMAT_US          ; AM\/PM\n        cmp     #12\n        bcc     :+                      ; &gt;= 12?\n        sub     #12                     ; then subtract 12\n:       cmp     #0\n        bne     :+                      ; == 0\n        lda     #12                     ; then it's 12\n:\n.endif\n        sta     r3L\n        LoadB   r3H, 0\n\n        jsr     byteToDecimal           ; hours\n        ldy     #0\n        lda     #':'\n        sta     (r0),y                  ; ':'\n        IncW    r0\n\n        lda     minutes\n        sta     r3L\n        LoadB   r3H, 0\n        lda     #1\n        jsr     byteToDecimal           ; minutes\n<\/code><\/pre>\n<p>And at the end, there is extra code in the US version to add &ldquo;AM&rdquo; or &ldquo;PM&rdquo;:<\/p>\n<pre><code>        ldy     #0\n        lda     #' '\n        sta     (r0),y                  ; space\n        IncW    r0\n\n.if DATE_FORMAT=DATE_FORMAT_US          ; AM\/PM\n        lda     #'A'\n        ldx     hour\n        cpx     #12\n        bcc     :+\n        lda     #'P'\n:       sta     (r0),y\n        IncW    r0\n        lda     #'M'\n        sta     (r0),y\n        IncW    r0\n.endif\n<\/code><\/pre>\n<h2 id=\"number-format\">Number Format<\/h2>\n<p>The character used for the decimal separator may differ between languages \u2013 
&ldquo;3.14&rdquo; in an English text would be &ldquo;3,14&rdquo; in a German text. Since geoWrite supports &ldquo;decimal&rdquo; tab stops that align numbers around the decimal separator, it needs to scan for this character: The English version checks for &ldquo;.&rdquo;, while the German version checks for &ldquo;,&rdquo;.<\/p>\n<h2 id=\"character-set-&amp;-encoding\">Character Set &amp; Encoding<\/h2>\n<p>The German language has four extra letters: the umlauts &ldquo;\u00e4&rdquo;\/&ldquo;\u00c4&rdquo;, &ldquo;\u00f6&rdquo;\/&ldquo;\u00d6&rdquo; and &ldquo;\u00fc&rdquo;\/&ldquo;\u00dc&rdquo;, plus the sharp s &ldquo;\u00df&rdquo;.<\/p>\n<h3 id=\"geos-character-encoding\">GEOS Character Encoding<\/h3>\n<p>Until the advent of Unicode, operating systems used different character encodings for different languages or scripts.<\/p>\n<p>The English version of GEOS uses the 7-bit ASCII encoding, which contains the 26 letters A through Z, but no umlauts. The GEOS KERNAL has no concept of a character encoding; it just blindly draws the glyphs that are stored at a given index in a font \u2013 as long as the index is between 32 and 127, the 7-bit ASCII printable range. The only difference between the English and the German operating system in terms of character encoding lies in the fonts: Just like the regular fonts, the fonts that come with the German version have 96 characters, but some characters have been replaced by the extra German letters and the &lsquo;\u00a7&rsquo; character (important for legal documents). 
These are the variants of the system font &ldquo;BSW\/9&rdquo;:<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/bsw9.png\" height=\"20\" width=\"954\" alt=\"\" \/><br \/>\n<img loading=\"lazy\" decoding=\"async\" src=\"docs\/geowrite\/bsw9_de.png\" height=\"20\" width=\"960\" alt=\"\" \/><\/p>\n<table>\n<thead>\n<tr>\n<th> ASCII <\/th>\n<th> German GEOS <\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td> @     <\/td>\n<td> \u00a7           <\/td>\n<\/tr>\n<tr>\n<td> [     <\/td>\n<td> \u00c4           <\/td>\n<\/tr>\n<tr>\n<td> \\     <\/td>\n<td> \u00d6           <\/td>\n<\/tr>\n<tr>\n<td> ]     <\/td>\n<td> \u00dc           <\/td>\n<\/tr>\n<tr>\n<td> {     <\/td>\n<td> \u00e4           <\/td>\n<\/tr>\n<tr>\n<td> |    <\/td>\n<td> \u00f6           <\/td>\n<\/tr>\n<tr>\n<td> }     <\/td>\n<td> \u00fc           <\/td>\n<\/tr>\n<tr>\n<td> ~     <\/td>\n<td> \u00df           <\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>geoWrite doesn&rsquo;t generally have to care about the encoding either: With the German font set, any version of geoWrite will display German umlauts.<\/p>\n<p>There are two cases where it does have to care though: searching and printing.<\/p>\n<h3 id=\"searching\">Searching<\/h3>\n<p>The function to search text has the option of searching for whole words only. For this, geoWrite needs to know which code points are letters or numbers. In English, that&rsquo;s A through Z and 0 through 9. In German, this must include the umlauts. 
This is the function that decides what counts as an alphanumeric character:<\/p>\n<pre><code>isAlphanumeric:\n        cmp     #'0'\n        bcc     @1\n        cmp     #'9'+1\n        bcc     @yes\n@1:\n.if CHAR_ENCODING=CHAR_ENCODING_ASCII\n        cmp     #'A'\n.elseif CHAR_ENCODING=CHAR_ENCODING_DE\n        cmp     #'@'\n.endif\n        bcc     @2\n.if CHAR_ENCODING=CHAR_ENCODING_ASCII\n        cmp     #'Z'+1\n.elseif CHAR_ENCODING=CHAR_ENCODING_DE\n        cmp     #']'+1\n.endif\n        bcc     @yes\n@2:     cmp     #'a'\n        bcc     @3\n.if CHAR_ENCODING=CHAR_ENCODING_ASCII\n        cmp     #'z'+1\n.elseif CHAR_ENCODING=CHAR_ENCODING_DE\n        cmp     #'~'+1\n.endif\n        bcc     @yes\n@3:     cmp     #'_'\n        beq     @yes\n        clc\n        rts\n\n@yes:   sec\n        rts\n<\/code><\/pre>\n<p>With the German encoding, the uppercase range also includes the three characters after &lsquo;Z&rsquo;, and the lowercase range includes the four characters after &lsquo;z&rsquo; (see the image of the font above). There is a bug in this code, though: the German uppercase range starts at &lsquo;@&rsquo; instead of &lsquo;A&rsquo;, so the German version considers &ldquo;\u00a7&rdquo; (&ldquo;@&rdquo; in the code above) an alphanumeric character, which it isn&rsquo;t.<\/p>\n<h3 id=\"printing\">Printing<\/h3>\n<p>By default, geoWrite prints pixel images of the pages of a document. But there is also ASCII mode, which sends the plain text to the printer, so the printer can use its built-in fonts. In this mode, English GEOS sends ASCII-encoded text, that is, its internal representation without any conversion, to the printer driver. If the printer uses a different encoding, the driver has to do the conversion.<\/p>\n<p>German GEOS can&rsquo;t just send the codes for &ldquo;\u00a7\u00c4\u00d6\u00dc\u00e4\u00f6\u00fc\u00df&rdquo; \u2013 they would print as &ldquo;@[\\]{|}~&rdquo;. 
It has to convert them, so that printer drivers can be universal and independent of the system&rsquo;s language.<\/p>\n<p>This is the code that does the conversion \u2013 it is missing from the English version:<\/p>\n<pre><code>convertToCp437:\n        ldy     #8\n@loop:  cmp     @from-1,y\n        beq     @found\n        dey\n        bne     @loop\n        rts\n@found: lda     @to-1,y\n        rts\n\n@from:  .byte '@','[','\\',']','{','|','}','~'   ; source: GEOS_de\n@to:    .byte $EB,$8E,$99,$9A,$84,$94,$81,$E1   ; target: CP437\n;             '\u03b4','\u00c4','\u00d6','\u00dc','\u00e4','\u00f6','\u00fc','\u00df'\n<\/code><\/pre>\n<p>The eight character codes in German GEOS that differ from ASCII (line <code>@from<\/code>) are converted to eight codes above 127 (line <code>@to<\/code>).<\/p>\n<p>The destination encoding is <a href=\"https:\/\/en.wikipedia.org\/wiki\/Code_page_437\">Codepage 437<\/a>, the standard (and now obsolete) encoding used by the IBM PC and MS-DOS. The one exception is &lsquo;\u00a7&rsquo;: its CP437 code would be $15, which is a non-printable control character in ASCII-based encodings, so the table maps it to $EB (&lsquo;\u03b4&rsquo;) instead.<\/p>\n<p>The authors of GEOS were free to choose any encoding \u2013\u00a0it&rsquo;s really just a convention between applications and the printer drivers. But with CP437, drivers for PC printers of the time can just pipe the data through as is.<\/p>\n<h2 id=\"discussion\">Discussion<\/h2>\n<p>Modern software usually ships as a single application binary that supports multiple languages, relies on the operating system to apply the local conventions for date\/time and numbers, and uses Unicode to represent and process any character in any script.<\/p>\n<p>geoWrite runs on a 64 KB system and doesn&rsquo;t have the luxury of spending code on any of these features \u2013 all localization differences are compile-time options. 
This means that in a multilingual environment, there are many limitations:<\/p>\n<ul>\n<li>The English version of geoWrite on a German version of GEOS will support umlauts, but can&rsquo;t correctly search for words with umlauts or print umlauts in ASCII mode. Also, some buttons in the UI will be in German.<\/li>\n<li>The German version of geoWrite on an English version of GEOS will not support umlauts, and the characters &ldquo;@[\\]{|}~&rdquo; won&rsquo;t print correctly in ASCII mode.<\/li>\n<li>Writing an English document in the German version of geoWrite will use the German date\/time format and German month names, and can&rsquo;t use decimal tab stops with a &lsquo;.&rsquo; as the decimal point.<\/li>\n<li>Writing a German document in the English version of geoWrite (on a German GEOS) has the equivalent problems. In addition, searching for words with umlauts won&rsquo;t work correctly, and neither will printing umlauts.<\/li>\n<li>Writing a French or Spanish document with any version of geoWrite works, even with accented letters, as long as only fonts that contain the extra letters are used. 
But the same limitations with date\/time, numbers, searching and printing apply.<\/li>\n<li>Good luck with CJK and RTL!<\/li>\n<li>Opening a document in a different language version of geoWrite than the one it was saved with will break either the umlauts or the &ldquo;@[\\]{|}~&rdquo; characters, as well as the date, time and page number keywords in headers and footers.<\/li>\n<\/ul>\n<p>Then again, it would be possible to architect a version of geoWrite with more flexibility:<\/p>\n<ul>\n<li>The code for date, time, numbers and the encoding differs only minimally between the localizations, so the app could support all variants, based on a system setting.<\/li>\n<li>The VLIR architecture of GEOS applications allows dividing code and data into many records, so every VLIR record of the current geoWrite app could be split into two: one with the code, and one with the strings and UI data structures. The UI variant to load would then be chosen based on the system language.<\/li>\n<\/ul>\n<p>The latter point would of course waste space on disk (regular geoWrite is 35 KB, a 1541 disk holds 165 KB) and increase load times of VLIR records.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In the series about the internals of the geoWrite WYSIWYG text editor for the C64, this article discusses what was required for the German localization. 
Article Series The Overlay System Screen Recovery Font Management Zero Page Copy Protection Localization \u2190 this article File Format and Pagination Copy &amp; Paste Keyboard Handling Overview Localizing an app &#8230; <a title=\"Inside geoWrite \u2013 6: Localization\" class=\"read-more\" href=\"https:\/\/www.pagetable.com\/?p=1460\" aria-label=\"Read more about Inside geoWrite \u2013 6: Localization\">Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2,5,41,8,15,22],"tags":[],"class_list":["post-1460","post","type-post","status-publish","format-standard","hentry","category-2","category-archeology","category-c64","category-commodore","category-geos","category-operating-systems"],"_links":{"self":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/1460","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1460"}],"version-history":[{"count":0,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=\/wp\/v2\/posts\/1460\/revisions"}],"wp:attachment":[{"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1460"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1460"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.pagetable.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1460"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}