Friday, February 27, 2009

So, Why Not Use Huneybee, Anyway?



One of the things that surprises me as I poke around the World Wide Web looking for Deseret Alphabet materials is the presence of Huneybee and recommendations that it be used. I have nothing against its design—indeed, I am happy to see new designs of the Deseret Alphabet, particularly if they’re not slavish copies of the font used for the four books printed in the 1860’s—but I will confess that Huneybee’s continued use makes me shudder.

The problem can be summarized in one word: mojibake. Computers, after all, don’t represent text qua text; they represent text via numbers. Text is stored internally as a series of numbers, and the software involved has to somehow map these numbers into something the user can see.

In practice, there are actually three sets of numbers associated with text. The first is the keycode, the number associated with the physical key the user is pressing. The second is the character code, the number used internally to represent a particular character. The third is the glyph ID, which is the index of a particular graphic shape within a font.
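A rough sketch of the three number spaces, in Python, may help. The keycode (31) and glyph ID (57) are invented for illustration and do not come from any real keyboard or font; only the character code, U+1041D, is a real Unicode value.

```python
# Hypothetical tables: the keycode and glyph ID values are made up.
# Only the character code (U+1041D) is a real Unicode code point.

# Keyboard layout: (keycode, shift held?) -> character
KEYBOARD_LAYOUT = {(31, True): "\U0001041D"}

# One particular font's character map: character code -> glyph ID
FONT_CMAP = {0x1041D: 57}


def key_to_char(keycode, shift):
    """Step 1: the keyboard layout turns a physical key into a character."""
    return KEYBOARD_LAYOUT[(keycode, shift)]


def char_to_glyph(ch):
    """Step 2: the font turns a character code into one of its own glyph IDs."""
    return FONT_CMAP[ord(ch)]


ch = key_to_char(31, True)    # what gets stored in the document
glyph_id = char_to_glyph(ch)  # what gets drawn, valid for this font only
```

The point of keeping the two tables separate is that either one can be swapped out (a different keyboard layout, a different font) without changing what is stored in the document.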

Naïve users (that is to say, most computer users, especially people whose experience is based on English where text display is obscenely simple) assume that there is a direct mapping between the three. Huneybee is an example of this. You want to see a particular symbol on screen or in print, and you want to generate it by using a particular keystroke. For example, you may want to type shift-S and have 𐐝 show up at the other end. You get this result by using your font-production software to insert the 𐐝 glyph in the slot currently occupied by S. Done!

All this works if you are generating a text for immediate display and you don’t care what happens down the line, either when you transmit the text to someone else or when you come back in five years and try to edit the text. In order for this to work over space or time, you need to guarantee that the person at the other end has the right font installed and is set up to use it. If not, you get garbled nonsense, mojibake. The 2006 New Deseret Reader illustrates this. It works, if you have Huneybee installed. If not, you get illegible nonsense.
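To make the failure concrete, here is a small sketch in Python. The two "fonts" are just hypothetical lookup tables standing in for real font files, and the single remapped slot follows the shift-S example above; it is not meant to reproduce Huneybee's actual layout.

```python
# Stand-ins for fonts: each maps a stored character code to the shape drawn.
# Both tables are hypothetical and cover a single slot.
ORDINARY_LATIN_FONT = {0x53: "S"}
REMAPPED_FONT = {0x53: "\U0001041D"}  # Deseret glyph drawn in the Latin S slot


def render(stored, font):
    """Return what a reader sees when `stored` is displayed with `font`."""
    return "".join(font.get(ord(ch), ch) for ch in stored)


stored = "S"  # what is actually saved in the file: U+0053, a Latin letter

seen_by_author = render(stored, REMAPPED_FONT)        # the Deseret letter
seen_by_reader = render(stored, ORDINARY_LATIN_FONT)  # a plain Latin S
```

Nothing in `stored` records that a Deseret letter was intended. That information lives only in the font, which is exactly what goes missing when the text travels.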

This is actually a serious problem in computer science and is one of the main motivations underlying Unicode. I still have some of the first computer-generated documents I ever made, but I can’t use them anymore. They were written using defunct software with an undocumented internal format on a defunct platform (the Atari ST) using a defunct, proprietary character set. Trajan’s column can still be read effortlessly nearly two thousand years after it was erected, but my own journals from the early 1980’s are illegible.

(I spent a fair chunk of my wasted youth as a secretary in the Molecular Biology department at the university where I did graduate work. We started out with WordPerfect on DOS, which was a very non-WYSIWYG environment. Once, I managed to switch the font to “Greek” to insert some symbols but forgot to switch back, and I didn’t realize it until I printed a draft and the last two-thirds of the paper in question came out as garbage.)

(I should also point out in fairness that I did this kind of thing myself out of laziness when I produced the Deseret Alphabet Triple Combination in 1997. I knew better, but I did it anyway, and I regret it now. I’ve managed to get away with it because the document is a PDF and doesn’t store the text as text, but as glyph IDs for an embedded font, so the data is entirely self-contained. It is, however, impossible for me to take that document and back-convert it to raw text because I don’t have a copy of the font I used anymore. I could probably manage to recreate the encoding, but more likely than not, I’m going to have to do the work all over again.)

The New Deseret Reader, by the way, illustrates another aspect of this problem. Because the Deseret Alphabet has thirty-eight letters in its standard form, and because it uses both upper- and lower-cases, you need room for seventy-six letters, whereas ASCII only has slots for fifty-two. That means that you have to steal slots from punctuation as well as letters, and that means that you can’t use the punctuation yourself. Or Latin letters, for that matter, if you want to intermingle scripts.
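The arithmetic is simple enough to check in a few lines of Python. The figure of thirty-eight letters is taken from the paragraph above; the rest comes from Python's standard `string` module.

```python
import string

deseret_letters = 38                      # standard form, per the text above
needed = deseret_letters * 2              # upper and lower case: 76
latin_slots = len(string.ascii_letters)   # A-Z plus a-z: 52
shortfall = needed - latin_slots          # 24 slots that must come from elsewhere

# The only other printable ASCII slots are digits and punctuation.
spare_slots = len(string.digits) + len(string.punctuation)  # 10 + 32 = 42
```

So a font built this way has to take over 24 of the 42 digit and punctuation slots, which is why the punctuation stops being available for its ordinary use.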

There is a natural solution to this, and it comes in two pieces. The first piece is to decouple the characters from the specific font being used to represent them, and this is what Unicode does. It provides a standard way of representing text for dozens of writing systems and thousands of languages which is not tied to a specific font or platform. You still need a font covering the specific language/script in question, of course, but you don’t need to have a specific version of a specific font. Thus Wikipedia’s article on the Deseret Alphabet can contain Deseret Alphabet text and not require you to download and install a specific font before you can read it. You can use any Unicode-savvy Deseret Alphabet font you want. If you’re on a Mac, of course, you’re in luck because every Mac ships with a Unicode-savvy Deseret Alphabet font. If you’re on Windows, you can use James Kass’s excellent Code2001 font.

And Unicode’s Web site can contain a whole page of Deseret Alphabet text and blithely assume that this page will continue to be legible for decades to come on any computer system with an appropriate font installed. And even if a font is not available, the text will be indisputably Deseret and not badly-spelled Latin.
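One way to see the difference is to ask the Unicode character database what each stored character is. This sketch uses Python's standard `unicodedata` module; the choice of the letter S as the font-hack slot is just the example used earlier in this post.

```python
import unicodedata

legacy = "S"           # font-hack encoding: a Latin letter in costume
proper = "\U0001041D"  # Unicode encoding of the Deseret letter itself

legacy_name = unicodedata.name(legacy)  # 'LATIN CAPITAL LETTER S'
proper_name = unicodedata.name(proper)  # 'DESERET CAPITAL LETTER ES'


def is_deseret(ch):
    """True if the character lies in the Deseret block (U+10400 to U+1044F)."""
    return 0x10400 <= ord(ch) <= 0x1044F
```

Software with no Deseret font at all can still tell that `proper` is a Deseret letter and show a placeholder for it; `legacy` will simply be treated as the Latin letter it is.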

There is a slight trickiness in doing this with somewhat older software which doesn’t support the non-BMP portions of Unicode, but current font editing software and operating systems can do so. Some applications may still be lacking in this area, I’m sorry to say, but that will change over time. (Firefox, for example, doesn’t display Unicode Deseret correctly.)
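The "non-BMP" trickiness can be shown directly. Deseret code points lie above U+FFFF, so in UTF-16 each letter takes two 16-bit units (a surrogate pair) rather than one, and software that assumes one unit per character goes wrong. A short Python sketch, using the same letter as before:

```python
ch = "\U0001041D"  # DESERET CAPITAL LETTER ES

code_point = ord(ch)
outside_bmp = code_point > 0xFFFF  # True: beyond the Basic Multilingual Plane


def utf16_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    offset = cp - 0x10000
    return 0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)


high, low = utf16_surrogates(code_point)  # 0xD801, 0xDC1D

utf16 = ch.encode("utf-16-be")  # 4 bytes: d8 01 dc 1d
utf8 = ch.encode("utf-8")       # 4 bytes: f0 90 90 9d
```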

The other thing you need is a keyboard, that is, a way of mapping particular keystrokes into particular characters. All major operating systems have a way of using custom keyboard mappings, and editors for these mappings are freely available. Now, there are still issues with making a keyboard for the Deseret Alphabet which I’ll go into at some future point. And yes, you do need to have them installed. Making keyboards, however, is trivial and getting them installed isn’t hard.

This, by the way, is what the Deseret Language Kit did for Mac OS 9. It provided a keyboard, font, and other software pieces necessary to get the Deseret Alphabet to work in a semi-standardized way with any Mac software. It hasn’t been as necessary for Mac OS X, because that’s Unicode-based, which is one reason why I haven’t come out with a successor. I do have a keyboard which I use myself when I want to type Deseret text. I have other techniques for converting lots of text at once, however, which are generally easier to use.
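As one illustration of converting text in bulk (not necessarily the technique meant above), font-hack text can be moved to Unicode with a translation table, provided the legacy layout is known. The table below is hypothetical: the S entry echoes the example used earlier in this post, and the other entries are invented to fill out the sketch.

```python
# Hypothetical legacy layout: which Latin slot held which Deseret letter.
# A real table would have to be recovered from the font in question.
LEGACY_TO_UNICODE = {
    "S": "\U0001041D",  # DESERET CAPITAL LETTER ES
    "s": "\U00010445",  # DESERET SMALL LETTER ES
    "Z": "\U0001041E",  # DESERET CAPITAL LETTER ZEE
    "z": "\U00010446",  # DESERET SMALL LETTER ZEE
}
TABLE = str.maketrans(LEGACY_TO_UNICODE)


def legacy_to_unicode(text):
    """Replace each legacy slot with the Deseret character it stood for."""
    return text.translate(TABLE)


converted = legacy_to_unicode("Sz")
```

Characters missing from the table pass through unchanged, which is also the weak point of any such conversion: once punctuation slots have been borrowed for letters, the stored text alone cannot say which marks were meant as punctuation.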

Now, I don’t fault the people who do use Huneybee, because by and large they don’t know better. They haven’t run into the practical problems that made software companies like Apple and Microsoft move towards soft keyboards and Unicode. As such, it’s really a communication problem. It’s the responsibility of people like me who do deal with these issues to educate the public at large.

And one has to allow for the fact that people are people and don’t always do things the right way. After all, I’ve been blithely typing two spaces at the end of every sentence in this blog, even though I know it’s wrong.

But the bottom line is, if you really want to communicate with the Deseret Alphabet, use the standardized techniques which have become available and switch to Unicode. If the owner of Huneybee would like me to create a Unicode-savvy version of it, I’d be happy to oblige, pending the free time to do so.

1 comment:

  1. The old DLK.sea won't open under Leopard. I extracted it via Classic on my trusty Cube and can give you a new Stuffit file if you like.
