Wednesday, February 25, 2009

Whys and Wherefores

I suppose I should start with a word of introduction regarding me and my history with the Deseret Alphabet.  My personal involvement with the Deseret Alphabet goes way back to the one linguistics course I took as an undergraduate at the University of Utah, where the instructor brought it up as an example of an interesting local linguistic oddity.  

This would have been in late 1977.  (Excuse me for a minute while I find a quiet corner in which to have a cry about how long ago this was.)  Somewhat over a decade later, I became involved with the Unicode Standard.  In those early days of Unicode, there was a list of potential scripts for encoding circulating among the various Unicodets, and amongst these was listed the “Mormon Alphabet.”  Knowing something about it—including its proper name—I was quick to point out that it was really not an appropriate candidate for encoding, because it was rather thoroughly dead and not much used when it was alive.  

And yet as the 1990s drew to a middle, Unicode found itself in an awkward position.  The standard had originally been designed to support some 65,000 different characters, but it became apparent that this would not be sufficient.  An architectural change was introduced in Unicode 2.0 to deal with this, splitting Unicode's code space into the Basic Multilingual Plane (BMP) and sixteen additional planes, each plane holding the same 65,536 code points.  
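For the curious, here is roughly how software reaches those additional planes in UTF-16: a code point above U+FFFF is split into a "surrogate pair," two 16-bit units drawn from a reserved range in the BMP. Software that assumes one 16-bit unit per character sees such a pair as two unrelated units. What follows is a minimal Python sketch using U+10400 DESERET CAPITAL LETTER LONG I, the first letter of the Deseret block in plane 1. The helper names `plane_of` and `to_surrogates` are made up for illustration; the final line cross-checks the arithmetic against Python's own UTF-16 encoder.

```python
from typing import Tuple


def plane_of(cp: int) -> int:
    """Return the Unicode plane (0-16) that contains code point cp."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    return cp >> 16  # each plane spans 0x10000 code points


def to_surrogates(cp: int) -> Tuple[int, int]:
    """Split a supplementary-plane code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only code points above U+FFFF need surrogates")
    offset = cp - 0x10000               # a 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the low surrogate
    return high, low


if __name__ == "__main__":
    long_i = 0x10400  # DESERET CAPITAL LETTER LONG I
    print(plane_of(long_i))                              # 1
    print([hex(u) for u in to_surrogates(long_i)])       # ['0xd801', '0xdc00']
    print(chr(long_i).encode("utf-16-be").hex())         # d801dc00
```

The same pair, D801 DC00, is what a BMP-only program of the era would have encountered in place of a single Deseret letter.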

What then happened was a bit of a chicken-and-egg problem.  Unicode support was beginning to appear in applications and system software, but it was of the BMP-only sort.  As a result, nobody wanted their script to end up in the astral planes, as the new planes were often called; it wouldn’t be supported by current software.  And since the astral planes had no actual content, there was little incentive for anybody to even start the process of implementing support.  

What was needed was a scapegoat or sacrificial lamb:  a script which was arguably a legitimate candidate for encoding but which could live indefinitely as a second-class citizen until software support caught up with it.  

As a result, I started to put together proposals for the encoding of various scripts which would reasonably end up in the Supplementary Multilingual Plane (SMP) of the standard.  There were six, as I recall, and were I sufficiently ambitious I’d look them up.  They included, if memory serves, Etruscan, Linear B, Gothic, Shavian and Pollard.  The sixth was the Deseret Alphabet.  With the exception of Pollard, these are all now encoded, all in the SMP, and work on Pollard is proceeding slowly.  

(In fairness, none of these is actually driving non-BMP Unicode support.  The characters making non-BMP support a sine qua non are from East Asian character sets such as HKSCS and JIS X 0213.  But that would be another blog.)

Actually, Deseret (as it is called in encoding circles) is not an inappropriate candidate for encoding after all.  There is a limited amount of printed material in the Deseret Alphabet, to be sure, but a fair amount of additional material of historical interest exists in manuscript.  More to the point, there are hobbyists who want to use it even now, despite its serious design flaws.  

I am amongst these hobbyists, I’m sorry to say, and have foisted a fair chunk of Deseret material on the world, including this blog.  Now, you may have noticed that this blog isn’t actually in the Deseret Alphabet.  I may or may not add entries in the DA in the future, depending on software support and the amount of time I’m willing to waste on it.  This is more a spot for me to think aloud, as I say, about the technical problems involved in Deseret support and its significance both in LDS culture and in the broader world. 
