Thursday, September 3, 2009

Oh, and Some Good News

Now that Snow Leopard is released for Macs, there is good news on the keyboard front. There’s been a long-standing bug in Mac keyboard support that prevented the creation of decent keyboards for the Deseret Alphabet, keyboards which would let you type 𐐟, not by some obscene and hard-to-remember key chord, but by typing S-h. (Or S-H.)

That bug has finally been fixed, and I have a keyboard I’ve been using which takes advantage of the bug fix. You still have to cure yourself of some old Latin typing habits, but it’s a vast improvement on what went before.

It’s not quite ready for release yet; I need to write up documentation. But as soon as that’s done, I’ll post it somewhere appropriate.

Saturday, August 29, 2009

The John T. Morris Headstone


Well, crap.

I happened to be in Cedar City this past week, and we accidentally drove past the Cedar City cemetery while we were there. This called to mind some adventures my wife and I had last summer in the same area.

When people list the materials available in the Deseret Alphabet, one of the items always listed is "a tombstone in Cedar City." At least, I always list it, and that's the way I've always listed it in the past.

My wife and I attended the Utah Shakespearean Festival last summer, and while we were there, I thought it would be a good opportunity to track down this tombstone. So, armed with my iPhone (complete with GPS), we walked north from our hotel towards the cemetery.

Along the way we ran into a local historical museum and we went inside. After all, we didn't know whose tombstone we would be looking for or where said tombstone is. It shouldn't be surprising that it was rather difficult to get help. Most of the people there didn't know what the Deseret Alphabet was. One of them said she'd heard of a tombstone in the cemetery with funny writing on it, but she didn't know where it was. Her sister-in-law, however, could probably help us, so she pulled out her cell phone and called her sister-in-law, who was fortunately available and told us that it was about halfway across the older section of the cemetery. (At this point I don't remember whether or not she said it was towards the road.) She didn't remember whose tombstone it was.

Fortunately for us, it was practically on the road. We found it, rested a bit, took some pictures, then headed back to our hotel.

One thing I had intended was for the pictures taken on my iPhone to precisely identify the location of the tombstone. It wasn't until just now when I checked that I found out that I hadn't turned on that location stamping for pictures when I took them. (Excuse me a minute while I bang my head against the wall.) So I'm going to have to do it the old-fashioned way: with Google maps.

The tombstone is actually a relatively recent replacement for one originally made for John T. Morris, who died 20 February 1855. The original was made of local sandstone and weathered rather badly. Morris was a Welshman and must have been among the earliest settlers in Cedar City, which was itself only founded in its current location in 1855. (It had originally been established at a somewhat different site only four years earlier.) Morris was 27 at the time of his death and died only four days before his infant son, John Walker Morris, aged five months. One presumes that there was an infectious disease that carried them both away.

It is located between 700 and 800 North Main Street in Cedar City, at approximately 37° 41' 24" N 113° 3' 43" W. Actually, I think it's a little to the south of this, but here's a map, anyway:


It is very close to the eastern edge of the cemetery. As I recall, there are practically no tombstones between it and the road. It's large and upright with the writing facing the road. The writing has some interesting features, including a totally unexpected spelling of "John," and an M-glyph that is practically a bass clef.

I did try to get a glimpse of it from our tour bus as we drove past, but because I hadn't expected to drive past the cemetery, I hadn't refreshed my own memory of the stone's appearance and so managed to miss it.

I've also put a complete set of pictures my wife and I took up on Mobile Me.

Meanwhile my apologies for my lapse of a year ago in not getting my iPhone properly set up before snapping away.

Monday, July 27, 2009

Alice in Deseret

Some time ago, I received email from Thomas Thurman, mentioning a wiki he’s set up for Shavian. (Yes, I should have written about this as soon as the email came in, and yes, I'm frightfully behind on a lot of stuff at the moment.)

The main reason he was writing to me was to let me know that the site supports conversion to the Deseret Alphabet and, as a demonstration, he’s put up Alice in Wonderland. You can find it at http://tinyurl.com/mxaodx.

There is a glitch in the conversion process at the moment. Shavian uses special letters for vowels followed by an -r sound and these are left unconverted when going over to Deseret. (There is also the problem of the two alphabets being intended to accommodate two different dialects of English: English as spoken by George V for Shavian, and what I believe to be New England English for Deseret.)

Still, it's rather cool to see something other than the LDS Scriptures available in Deseret. The site as a whole is worth keeping an eye on.

Friday, May 8, 2009

The Deseret Alphabet Hits the Big Time (Kind Of)!

I was going to blather on a bit about pronunciation issues, since that’s cropped up in my life this past week, but I have something better to talk about instead.  

The Deseret Alphabet has been in Unicode since version 3.1 of the standard (March 2001), so it’s hardly new there.  And it’s been included in Apple’s Apple Symbols font since Mac OS X 10.3 (October 2003), so it’s hardly new there, either.  

Today, the Deseret Alphabet took the next big step forward.  Associated with Unicode is a second project, the Common Locale Data Repository (CLDR).  A locale in computer parlance is a linking of a place with a language, and it refers to all the standard names for things or standard ways of doing things in that place/language combination.  Locales make it possible for me to specify my place (Salt Lake City) and language (English), and, armed with that information, my computer can set the default names for the months and days of the week, the default format to use for dates and times, the default currency, the default units of measurement, and so on.  Of course, I can override these if I choose, but the goal is to make it as unnecessary as possible.
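To make the idea of a locale concrete, here is a minimal sketch in Python. The `LOCALE_DATA` table and the `format_date` helper are hypothetical and hand-written for illustration; real software would take this information from CLDR rather than from a two-entry dictionary.

```python
from datetime import date

MONTHS_EN = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]

# Hypothetical, drastically simplified locale table (an assumption for
# illustration only; CLDR carries far more data per locale than this).
LOCALE_DATA = {
    "en_US": {"months": MONTHS_EN, "date_format": "{month} {day}, {year}"},
    "en_GB": {"months": MONTHS_EN, "date_format": "{day} {month} {year}"},
}

def format_date(d, locale_id):
    """Format a date using the conventions stored for the given locale."""
    data = LOCALE_DATA[locale_id]
    return data["date_format"].format(
        month=data["months"][d.month - 1], day=d.day, year=d.year
    )

print(format_date(date(2009, 5, 8), "en_US"))
print(format_date(date(2009, 5, 8), "en_GB"))
```

A Deseret locale would, in the same spirit, supply month and day names spelled in the Deseret Alphabet, and the formatting code would not need to change.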

Version 1.6 was under development a year or so ago, and I spent a couple of evenings madly typing in Deseret Alphabet (and Shavian) data to make Deseret and Shavian locales possible.  Unfortunately, the rules for inclusion in CLDR 1.6 meant that Deseret and Shavian didn’t make it, because I was the only one who had vetted the data.  The rules were relaxed somewhat for version 1.7, however, and with its release today, the Deseret Alphabet can now be used in conjunction with locale information to provide standard information for the computer to use in all kinds of interesting places.  

Now, I don’t know when CLDR 1.7 will start showing up in shipping products (e.g., Mac OS X Snow Leopard).  It is, however, entirely probable that within a year software you and I and other normal people use will actually be able to use the Deseret Alphabet automatically for things like dates and times.

(I am a normal person, aren’t I?)

Sunday, March 15, 2009

In Which I Answer My Own Question

“Butter” is attested in the Book of Mormon, at 2 Nephi 17:15 and 2 Nephi 17:22.  Unfortunately, the Deseret Third Reader, which I own (the full Book of Mormon being rather too rich for my taste these days), uses the old versification, so it takes a bit of digging.  There it is, though, in ix.6, “𐐺𐐲𐐻𐐯𐑉.”

A similar word, “utter,” is attested much earlier in the Deseret Third Reader, at I Nephi i.16 (in the old versification), just before Nephi’s statement, much beloved by Seminary students the world over, that his father dwelt in a tent.  “Utter” comes out as “𐐲𐐻𐐯𐑉.”

Now the second vowel here is rather interesting, because that’s not at all how I pronounce either word.  “Little” at I Nephi ii.4 (old versification) comes out “𐑊𐐮𐐻𐑊,” with no vowel marked at all for the second syllable.  To my ear, both words have the same vowel.  

This, however, is one aspect of the English language.  The /r/ phoneme can do funny things to vowels, and without training, it can sometimes be difficult to figure out exactly what it is.  Shavian actually has mandatory ligatures for various vowels followed by /r/, although there is some confusion as to what the intended vowels are.  (Check Wikipedia for details.)  If you were to ask me what vowel is used in the second syllable of “butter,” “utter,” or “little,” I would say it was a schwa—which is another problem.  

English uses the schwa a lot; it’s the most common vowel in the language, largely because English tends to reduce vowels in unstressed syllables to schwa.  We tend to hear it, too, for syllabic consonants, consonants which are syllables all to themselves, as in “little.”  Strictly speaking, although “little” has two syllables, the second syllable has no vowel, even though it sounds like it has a schwa in there.  If, however, you actually pronounce it in full with the schwa you can hear the difference.  

Deseret does have a letter for schwa, 𐐲, and one would naturally expect written materials to be littered with it.  One would also expect that people who sound out words in their own mind to spell words in Deseret (like me) would put in a lot of schwas.  Professional phoneticians wouldn’t have quite so many, and neither would people who get their spellings from the works of professional phoneticians, like Orson Pratt.  Hence “𐑊𐐮𐐻𐑊” with a syllabic consonant, and not “𐑊𐐮𐐻𐐲𐑊.”  

Even worse—and this is my real point today—is that some words will change their pronunciation depending on the level of emphasis.  This is one of the big problems with the Deseret Alphabet.  “The” is actually not a good example, because of the convention in the Deseret Alphabet to spell it using a single letter, 𐑄.  The naïve tendency would otherwise be to spell it with a schwa, 𐑄𐐲, under most circumstances because that’s the sound we make when we aren’t stressing the word.  That, however, is only because we’re reducing the vowel because it isn't stressed.  When the word is emphasized, as above, we use the full vowel and “the” rhymes with “thee.”  (Hence the convention in the Deseret Alphabet, which foolishly allows letter names to be spelled with the letter by itself, as in “𐑄” for “the” and “thee,” “𐐺” for “be” or “bee,” and presumably “𐑀” for “gay,” although I haven’t actually seen that attested in the 19th century materials.)

What this means for overall spelling is that we’re left with a dilemma.  If we really want the Deseret Alphabet to be phonemic, we need to spell words with the full vowel even if that isn’t what we usually say.  Orson Pratt derived his spellings largely from Webster’s dictionary; but dictionaries have the luxury of allowing for multiple pronunciations, and text in the Deseret Alphabet does not.  So in this kind of case, what did Orson do?  I’ll have to look up some examples and check.

Monday, March 9, 2009

So How Do You Pronounce “Deseret,” Anyway?

Simple question, should have an easy answer. I’ve lived most of my life in Salt Lake City, fourth- or fifth-generation LDS, and between the book store, and the old gym, and the industries amongst others, I’ve heard the word pronounced [dɛzə'rɛt] with absolute consistency. Indeed, the only time I’ve ever heard it pronounced any other way was at a Unicode meeting where one of the participants, under the mistaken impression that it was a French word, I suppose, pronounced it [dɛzə'eː].

So one of the great mysteries of the Deseret Alphabet is the fact that it is consistently transcribed as 𐐔𐐯𐑅𐐨𐑉𐐯𐐻 by Orson Pratt. But that brings up the fundamental problem of the Deseret Alphabet, which has different ramifications. The problem is determining how to spell words in the Deseret Alphabet, and the first ramification is the problem of phonetic vs. phonemic.

Linguistics has advanced somewhat in the century-and-a-half since the DA was first bruited, and one distinction that we would now make is between phonetic and phonemic. “Phonetic” is the simpler concept, since it has to do with the sounds we actually make. “Phonemic” is a bit more complicated, in that it has to do with the sounds we are theoretically making.

The word “dogs” is a good illustration of the distinction. We spell the plural here with an -s, even though we make a [z] sound when we say the word. The -s reflects the fact that the sound we’re making is theoretically an [s] sound, but the phonetic rules of English don’t allow a pronunciation like [dɔgs] (go ahead, try to say it with an [s]).

On the other hand, there are words like “butter.” Wiktionary gives its pronunciation as /'bʌɾ.ɚ/. Now, maybe you can read IPA and maybe you can’t, but one thing seems pretty clear: there isn’t a “t” in there anywhere. Again, this is a side-effect of English phonetic rules, which turn the /t/ phoneme into an alveolar tap (that’s the ɾ-thingie in the middle) in this particular context.

I’ll freely confess that I’m not a linguist of any stripe, let alone a phoneticist, and so my analysis up there may be wrong. In particular, I’m not personally convinced that we really use an /s/ phoneme when we make the plural of “dog,” largely because everybody knows that it’s a [z] sound that’s showing up in actual speech. The alveolar tap in the middle of “butter” is something else, since most people think they’re saying [t]. If they think about it, they may realize it sounds more like a [d]. Only someone with linguistic training would call it an alveolar tap.

On the whole, while the Deseret Alphabet is generally touted as a phonetic alphabet, it actually tends towards the phonemic. English actually uses a lot more sounds than the thirty-eight the Deseret Alphabet can distinguish (as the alveolar tap attests). On the other hand, it consistently uses 𐑆 as the plural for words like 𐐼𐐱𐑀, but as I say, that one has percolated down to the common consciousness. I’m sure that a Deseret Alphabet spelling for “butter” is somewhere attested; it would be interesting to see it.

Friday, February 27, 2009

So, Why Not Use Huneybee, Anyway?



One of the things that surprises me as I poke around the World Wide Web looking for Deseret Alphabet materials is the presence of Huneybee and recommendations that it be used. I have nothing against its design—indeed, I am happy to see new designs of the Deseret Alphabet, particularly if they’re not slavish copies of the font used for the four books printed in the 1860’s—but I will confess that Huneybee’s continued use makes me shudder.

The problem can be summarized in one word, mojibake. Computers, after all, don’t represent text qua text; they represent text via numbers. Text is stored internally via a series of numbers, and the software involved has to somehow map these numbers into something the user can see.

In practice, there are actually three sets of numbers associated with text. The first is the keycode, the number associated with the physical key the user is pressing. The second is the character code, the number used internally to represent a particular character. The third is the glyph ID, which is the index of a particular graphic shape within a font.
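The three kinds of numbers can be sketched as two separate lookup tables. In the Python below, only the Unicode code point U+1041D (𐐝) is real; the keycode `1`, the glyph ID `812`, and the names `KEYMAP` and `FONT_CMAP` are made up for illustration and do not describe any actual keyboard layout or font.

```python
# Stage 1: the keyboard layout maps (keycode, shift state) to a character.
# The keycode value 1 is a hypothetical stand-in for the physical S key.
KEYMAP = {
    (1, True): "\U0001041D",   # shift-S -> DESERET capital ES
}

# Stage 2: the font maps a character code to a glyph ID.
# The glyph ID 812 is invented; every font assigns its own.
FONT_CMAP = {
    0x1041D: 812,
}

def key_to_char(keycode, shift):
    """Keyboard layout step: physical key -> character."""
    return KEYMAP[(keycode, shift)]

def char_to_glyph(ch):
    """Font step: character code -> index of a shape in the font."""
    return FONT_CMAP[ord(ch)]

ch = key_to_char(1, True)
print(hex(ord(ch)), char_to_glyph(ch))
```

Keeping the two tables separate is the point: the keyboard layout or the font can be swapped out without changing the character code that actually gets stored.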

Naïve users (that is to say, most computer users, especially people whose experience is based on English where text display is obscenely simple) assume that there is a direct mapping between the three. Huneybee is an example of this. You want to see a particular symbol on screen or in print, and you want to generate it by using a particular keystroke. For example, you may want to type shift-S and have 𐐝 show up at the other end. You get this result by using your font-production software to insert the 𐐝 glyph in the slot currently occupied by S. Done!

All this works if you are generating a text for immediate display and you don’t care what happens down the line, either when you transmit the text to someone else or when you come back in five years and try to edit the text. In order for this to work over space or time, you need to guarantee that the person at the other end has the right font installed and is set up to use it. If not, you get garbled nonsense, mojibake. The 2006 New Deseret Reader illustrates this. It works, if you have Huneybee installed. If not, you get illegible nonsense.
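A small Python sketch of why the font-swap approach breaks down. The `HACK_FONT` table is hypothetical: the only slot assignment known from this post is S for 𐐝, and the `render` and `describe` helpers are invented for the demonstration.

```python
import unicodedata

# Hypothetical font-hack table: which Deseret glyph has been drawn into
# which Latin slot. Only the S slot is taken from the text above.
HACK_FONT = {"S": "\U0001041D"}

def render(stored, font_installed):
    """Show what a reader sees for text stored as Latin character codes."""
    if font_installed:
        return "".join(HACK_FONT.get(c, c) for c in stored)
    return stored  # without the special font, the Latin letters show through

def describe(text):
    """Report what the stored character codes actually claim to be."""
    return [unicodedata.name(c) for c in text]

print(render("S", font_installed=True))
print(render("S", font_installed=False))
print(describe("S"))             # the font hack stores a Latin letter
print(describe("\U0001041D"))    # Unicode stores a Deseret letter
```

With the hack, the stored data says "Latin S" no matter what the font draws, so anything downstream (search, sorting, another reader's machine) treats it as Latin. With Unicode, the stored data itself identifies the letter as Deseret.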

This is actually a serious problem in computer science and is one of the main motivations underlying Unicode. I still have some of the first computer-generated documents I ever made, but I can’t use them anymore. They were written using defunct software with an undocumented internal format on a defunct platform (the Atari ST) using a defunct, proprietary character set. Trajan’s column can still be read effortlessly nearly two thousand years after it was erected, but my own journals from the early 1980’s are illegible.

(I spent a fair chunk of my wasted youth as a secretary in the Molecular Biology department at the university where I did graduate work. We started out with WordPerfect on DOS, which was a very non-WYSIWYG environment. Once, I managed to switch the font to “Greek” to insert some symbols but not switch back and didn’t realize it until I printed a draft and the last two-thirds of the paper in question came out as garbage.)

(I should also point out in fairness that I did this kind of thing myself out of laziness when I produced the Deseret Alphabet Triple Combination in 1997. I knew better, but I did it anyway, and I regret it now. I’ve managed to get away with it because the document is a PDF and doesn’t store the text as text, but as glyph IDs for an embedded font, so the data is entirely self-contained. It is, however, impossible for me to take that document and back-convert it to raw text because I don’t have a copy of the font I used anymore. I could probably manage to recreate the encoding, but more likely than not, I’m going to have to do the work all over again.)

The New Deseret Reader, by the way, illustrates another aspect of this problem. Because the Deseret Alphabet has thirty-eight letters in its standard form, and because it uses both upper- and lower-cases, you need room for seventy-six letters, whereas ASCII only has slots for fifty-two. That means that you have to steal slots from punctuation as well as letters, and that means that you can’t use the punctuation yourself. Or Latin letters, for that matter, if you want to intermingle scripts.
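The slot arithmetic can be checked in a few lines of Python. The figure of thirty-eight letters comes from the paragraph above; the variable names are just for illustration.

```python
import string

DESERET_LETTERS = 38                      # letters in the standard alphabet
needed = DESERET_LETTERS * 2              # upper- and lower-case forms
latin_slots = len(string.ascii_letters)   # A-Z plus a-z in ASCII
shortfall = needed - latin_slots          # slots that must come from elsewhere

# The only other printable ASCII slots are digits and punctuation.
spare_slots = len(string.digits) + len(string.punctuation)

print(needed, latin_slots, shortfall, spare_slots)
```

The shortfall of twenty-four has to be taken out of the forty-two digit and punctuation slots, which is why a font built this way cannot also be used to type ordinary punctuation.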

There is a natural solution to this, and it comes in two pieces. The first piece is to decouple the characters from the specific font being used to represent them, and this is what Unicode does. It provides a standard way of representing text for dozens of writing systems and thousands of languages which is not tied to a specific font or platform. You still need a font covering the specific language/script in question, of course, but you don’t need to have a specific version of a specific font. Thus Wikipedia’s article on the Deseret Alphabet can contain Deseret Alphabet text and not require you to download and install a specific font before you can read it. You can use any Unicode-savvy Deseret Alphabet font you want. If you’re on a Mac, of course, you’re in luck because every Mac ships with a Unicode-savvy Deseret Alphabet font. If you’re on Windows, you can use James Kass’s excellent Code2001 font.

And Unicode’s Web site can contain a whole page of Deseret Alphabet text and blithely assume that this page will continue to be legible for decades to come on any computer system with an appropriate font installed. And even if a font is not available, the text will be indisputably Deseret and not badly-spelled Latin.

There is a slight trickiness in doing this with somewhat older software which doesn’t support the non-BMP portions of Unicode, but current font editing software and operating systems can do so. Some applications may still be lacking in this area, I’m sorry to say, but that will change over time. (Firefox, for example, doesn’t display Unicode Deseret correctly.)
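The trickiness comes from how non-BMP characters are stored in UTF-16: each one takes a pair of 16-bit units (a surrogate pair) instead of a single unit. A short Python sketch, where `utf16_units` is a helper name made up for this example:

```python
def utf16_units(ch):
    """Return the 16-bit code units used to store ch in UTF-16."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

# A BMP character fits in one unit; a Deseret letter needs two.
print([hex(u) for u in utf16_units("S")])
print([hex(u) for u in utf16_units("\U0001041D")])
```

Software that assumes one 16-bit unit per character sees two meaningless halves where the Deseret letter should be, which is the usual way older applications fail on this script.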

The other thing you need is a keyboard, that is, a way of mapping particular keystrokes into particular characters. All major operating systems have a way of using custom keyboard mappings and editors for these mappings are freely available. Now, there are still issues with making a keyboard for the Deseret Alphabet which I’ll go into at some future point. And yes, you do need to have them installed. Making keyboards, however, is trivial and getting them installed isn’t hard.
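As a rough sketch of what such a mapping does, here is a toy version in Python that turns typed key sequences into Deseret characters, preferring the longest matching sequence. The `KEY_SEQUENCES` table is hypothetical and covers only a handful of keys; it is not the layout of any actual keyboard, and real keyboard definitions are written in each operating system's own format rather than in Python.

```python
# Hypothetical key-sequence table (an assumption for illustration).
KEY_SEQUENCES = {
    "Sh": "\U0001041F",  # capital ESH
    "SH": "\U0001041F",
    "S":  "\U0001041D",  # capital ES
    "sh": "\U00010447",  # small ESH
    "s":  "\U00010445",  # small ES
}

def type_keys(keys):
    """Convert a string of typed keys, matching the longest sequence first."""
    longest = max(len(k) for k in KEY_SEQUENCES)
    out = []
    i = 0
    while i < len(keys):
        for n in range(longest, 0, -1):
            chunk = keys[i:i + n]
            if chunk in KEY_SEQUENCES:
                out.append(KEY_SEQUENCES[chunk])
                i += len(chunk)
                break
        else:
            out.append(keys[i])  # unmapped keys pass through unchanged
            i += 1
    return "".join(out)

print(type_keys("Sh"))
print(type_keys("S!"))
```

Because punctuation and other unmapped keys pass straight through, a keyboard of this kind leaves the rest of the character set available, unlike the font-slot approach.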

This, by the way, is what the Deseret Language Kit did for Mac OS 9. It provided a keyboard, font, and other software pieces necessary to get the Deseret Alphabet to work in a semi-standardized way with any Mac software. It hasn’t been as necessary for Mac OS X, because that’s Unicode-based, which is one reason why I haven’t come out with a successor. I do have a keyboard which I use myself when I want to type Deseret text. I have other techniques for converting lots of text at once, however, which are generally easier to use.

Now, I don’t fault the people who do use Huneybee, because by and large they don’t know better. They haven’t run into the practical problems that made software companies like Apple and Microsoft move towards soft keyboards and Unicode. As such, it’s really a communication problem. It’s the responsibility of people like me who do deal with these issues to educate the public at large.

And one has to allow for the fact that people are people and don’t always do things the right way. After all, I’ve been blithely typing two spaces at the end of every sentence in this blog, even though I know it’s wrong.

But the bottom line is, if you really want to communicate with the Deseret Alphabet, use the standardized techniques which have become available and switch to Unicode. If the owner of Huneybee would like me to create a Unicode-savvy version of it, I’d be happy to oblige, pending the free time to do so.

Wednesday, February 25, 2009

Whys and Wherefores

I suppose I should start with a word of introduction regarding me and my history with the Deseret Alphabet.  My personal involvement with the Deseret Alphabet goes way back to the one linguistics course I took as an undergraduate at the University of Utah, where the instructor brought it up as an example of an interesting local linguistic oddity.  

This would have been in late 1977.  (Excuse me for a minute while I find a quiet corner in which to have a cry about how long ago this was.)  Somewhat over a decade later, I became involved with the Unicode Standard.  In those early days of Unicode, there was a list of potential scripts for encoding circulating among the various Unicodets, and amongst these was listed the “Mormon Alphabet.”  Knowing something about it—including its proper name—I was quick to point out that it was really not an appropriate candidate for encoding, because it was rather thoroughly dead and not much used when it was alive.  

And yet as the 1990’s drew to a middle, Unicode found itself in an awkward position.  The standard had originally been designed to support some 65,000 different characters, but it became apparent that this would not be sufficient.  An architectural change was added in Unicode 2.0 to deal with this, splitting Unicode into the Basic Multilingual Plane (BMP) and sixteen additional planes, each plane supporting the same count of 65,500-ish characters.
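The plane arithmetic is easy to verify. In this Python sketch the names `PLANE_SIZE`, `PLANE_COUNT`, and `plane_of` are chosen for illustration; the numbers themselves follow from the BMP-plus-sixteen-planes layout described above.

```python
PLANE_SIZE = 0x10000      # 65,536 code points per plane
PLANE_COUNT = 17          # the BMP plus sixteen additional planes

def plane_of(code_point):
    """Return the plane number (0 = BMP) that a code point falls in."""
    return code_point >> 16

print(PLANE_SIZE * PLANE_COUNT)      # total code points available
print(plane_of(ord("S")))            # a Latin letter
print(plane_of(0x10400))             # first code point of the Deseret block
```

Deseret starts at U+10400, which puts it just past the end of the BMP, in plane 1.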

What then happened was a bit of a chicken-and-egg problem.  Unicode support was beginning to appear in applications and system software, but it was of the BMP-only sort.  As a result, nobody wanted their script to end up in the astral planes, as the new planes were often called; it wouldn’t be supported by current software.  And since the astral planes had no actual content, there was little incentive for anybody to even start the process of implementing support.

What was needed was a scapegoat or sacrificial lamb:  a script which was arguably a legitimate candidate for encoding but which could live indefinitely as a second-class citizen until software support caught up with it.  

As a result, I started to put together proposals for the encoding of various scripts which would reasonably end up in the Supplementary Multilingual Plane (SMP) of the standard.  There were six, as I recall, and were I sufficiently ambitious I’d look them up.  They included, if memory serves, Etruscan, Linear B, Gothic, Shavian and Pollard.  The sixth was the Deseret Alphabet.  With the exception of Pollard, these are all now encoded, all in the SMP, and work on Pollard is proceeding slowly.  

(In fairness, none of these are actually driving non-BMP Unicode support.  The characters making non-BMP support a sine qua non are from East Asian character sets such as HKSCS and JIS X 0213.  But that would be another blog.)

Actually, Deseret (as it is called in encoding circles) is not an inappropriate candidate for encoding after all.  There is a limited amount of printed material in the Deseret Alphabet, to be sure, but a fair amount of additional material of historical interest exists in manuscript.  More to the point, there are hobbyists who want to use it even now, despite its serious design flaws.  

I am amongst these hobbyists, I’m sorry to say, and have foisted a fair chunk of Deseret material on the world, including this blog.  Now, you may have noticed that this blog isn’t actually in the Deseret Alphabet.  I may or may not add entries in the DA in the future, depending on software support and the amount of time I’m willing to waste on it.  This is more a spot for me to think aloud, as I say, about the technical problems involved in Deseret support and its significance both in LDS culture and in the broader world.