Saturday, January 5, 2019

A Deseret Ee-ay-ah-aw-ary

One of the more minor challenges I faced when writing up the original proposal to add the Deseret Alphabet to the Unicode standard was that every character in Unicode has to have a unique identifier. Identifiers have to be written with upper-case ASCII letters, spaces, digits, hyphens, and nothing else. (It's a bit more complex than that; see http://www.unicode.org/reports/tr31/ for all the gory details.) Identifiers can be descriptive (as in “LATIN CAPITAL LETTER M” for M, or “GREEK CAPITAL LETTER ETA WITH DASIA AND VARIA AND PROSGEGRAMMENI” for ᾛ) or arbitrary (as in “CJK UNIFIED IDEOGRAPH-4E95” for 井). They can be misspelled (as in “PRESENTATION FORM FOR VERTICAL RIGHT WHITE LENTICULAR BRAKCET” for ︘). The two things they absolutely must be are unique and stable.
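These identifiers can be looked up programmatically. The following is a minimal sketch, assuming Python 3 and its standard `unicodedata` module; the `samples` table and the `identifier` helper are illustrative names, not part of any library. It prints the formal name of each example character mentioned above, misspelling included.

```python
import unicodedata

# Example characters from the text, given by code point.
samples = {
    "descriptive": "\u004D",       # Latin capital M
    "long descriptive": "\u1F9B",  # Greek capital eta with three marks
    "arbitrary": "\u4E95",         # a CJK unified ideograph
    "misspelled": "\uFE18",        # the vertical lenticular "BRAKCET"
}


def identifier(ch: str) -> str:
    """Return the formal Unicode name of a single character."""
    return unicodedata.name(ch)


for kind, ch in samples.items():
    print(f"U+{ord(ch):04X} ({kind}): {identifier(ch)}")
```

Because names are stable, the misspelled name is the one that `unicodedata.name` reports.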

This strikes many people as odd. Why can’t something obviously wrong be fixed? The reason is straightforward: character identifiers aren’t meant for end-users. Rather, they’re intended largely for documentation of other standards and specifications. These things can go unchanged for decades—which is practically forever in computer years—and still be in force. As such, it can be more difficult to implement a specification from 2000 if a Unicode character it references changed its name in 2001. The Unicode Technical Committee learned this the hard way.

As for end users and UIs, they can call things whatever they want. Unicode doesn’t care. Much.

By and large, if the people who actually use a particular character have a standard name they like to use for it, that usually provides a good basis for its identifier. 

Now, in the case of the Deseret Alphabet, going by the charts published in the 1860s, there are some wrinkles. 

[Image: Deseret Alphabet Chart]

The charts start off with six letters which are labeled “Long Sounds”: 𐐀 𐐁 𐐂 𐐃 𐐄 𐐅. These are given the names “e (as in eat),” “a (as in ate),” “ah (as in art),” “aw (as in aught),” “o (as in oat),” and “oo (as in ooze),” respectively. With one glaring exception, this isn’t too bad. Standard English doesn’t have a word that sounds like “oot,” but otherwise using common words formed by adding a -t sound to the vowel is pretty good. The one problem (and it is a big one) is “art.”

Note to publishers of English phonetic charts: Never illustrate the pronunciation of a vowel by pairing it up with an R. Seriously. Don’t do it. Not only does the English R sound seriously screw up any vowel it happens to follow, but English dialects vary wildly as to when they pronounce an R and when they drop it. I’m going to go out on a limb here, but I’m willing to bet that the New England English of Noah Webster, Brigham Young, Orson Pratt and pals—and George D. Watt’s Manchester English—were non-rhotic and dropped the R in “art.” My Utah English is rhotic and leaves it in.

The next six letters are marked as “Short Sounds of the above:” 𐐆, 𐐇, 𐐈, 𐐉, 𐐊, and 𐐋. They are not given names, but the sounds are marked as “as in it,” “as in et,” “as in at,” “as in ot,” “as in ut,” and “as in book.” Things are starting to look a little weird here. Not only are we dropping our attempts at using real words to illustrate the sounds, but some of the parallels don’t make sense. In modern English the vowel of “it” is not the short form of the vowel in “eat.”

The villain here is the Great Vowel Shift, which, over a comparatively short period of time, shuffled some of the vowel sounds in English.  Prior to the Great Vowel Shift, the long-I sound was indeed the vowel now in “eat,” as is true in most Indo-European languages other than English (and the IPA). The vowel arrangement of the Deseret Alphabet is based on the work of Sir Isaac Pitman, and it’s Pitman who is responsible for doing something which makes sense in terms of general linguistics instead of modern English phonetics. 

We won’t even touch the nasty issue of what sound 𐐉 is intended to represent.

When naming these twelve letters, I opted to give them names which emphasize the parallel structure, which meant using names that make more sense for the short vowels than the long ones. As a result we have LONG I, LONG E, LONG A, LONG AH, LONG O, LONG OO, SHORT I, SHORT E, SHORT A, SHORT AH, SHORT O, and SHORT OO. Not the best job possible, I freely admit. I would do it differently now.

The remaining twenty-six letters do not have their organization explained, although they do have a logical order and are given names. 

First are two diphthongs (𐐌, 𐐍), named “i (as in ice),” and “ow (as in owl).” They ended up as AY and OW, just to make the sounds a bit clearer, although DESERET CAPITAL LETTER I AS IN ICE would have been a legal Unicode identifier. 

(In a forty-letter version of the Deseret Alphabet, the letters 𐐦 [OI] and 𐐧 [EW], being diphthongs, would presumably be grouped with these two.)

Next are two semivowels, 𐐎, and 𐐏, named “woo” and “ye” respectively without examples. It’s pretty clear what sounds they’re for; however, why I changed them to WU and YEE is not at all clear. I suppose WU is because that’s the standard spelling these days for the Mandarin pronunciation of Chinese surnames like 吳 and 伍. I have weird priorities sometimes. YEE is probably to make it clear that this is pronounced like the pronoun, with which ye are all familiar; but YE would have done that, too. 

Note that YEE originally came before WU. Apparently it was changed at the behest of Brigham Young, who couldn’t conceive of a reason for them not to be in the same order as their Latin counterparts. 

Then comes 𐐐, which is sui generis, sound-wise. The charts say its name is “h,” but does that mean we should pronounce the name “aitch,” as we are wont to do? I’ll get back to that. In the meantime, the Unicode name is H.

Next come eight “stops.” I have to use quotes here, because modern phoneticians do not classify two of them as stops. The name “stop” comes from the fact that pronouncing them involves completely blocking the airflow.

The first four stops (𐐑, 𐐒, 𐐓, and 𐐔) are named “p,” “b,” “t,” and “d,” with no examples given. It’s not unreasonable to assume that the Deseret names for these letters are the same as the English names of their Latin counterparts. There’s corroboration for that assumption, too. Remember, Deseret has the (stupid) spelling convention that if a word is pronounced the same as a particular letter’s name, that letter can be used for the whole word. I know that using 𐐒 for be (as in “to or not to, that is the question”) and bee (as in Apis mellifera) is attested in printed materials. I’m pretty sure that 𐐓 for tea (as in “a drink with jam and”) is attested. There is a non-zero probability that 𐐑 for pea (as in “eating with a knife”) is out there somewhere in 19th century materials, but I won’t hold my breath for pee (as in “I really need to”), if you see what I mean. 

The next two letters are the stops-that-aren’t, 𐐕 and 𐐖, called “che (as in cheese)” and “g.” The latter isn’t given an example, but it’s used for the J in, well, “John,” and the Latin letter G has the English name /ˈdʒiː/, so I think that it’s clear what’s intended. This is, I should point out, also analogous to the names of the preceding four letters.

These six in Unicode are accordingly named PEE, BEE, TEE, DEE, CHEE, and JEE.

The final two stops use a different naming convention. They’re called “k” and “ga (as in gate).” The Latin equivalent of the former has the English name /ˈkeɪ/, which rhymes with /ˈgeɪ/, and that starts out the word “gate.” It would be nice if we had an attestation of 𐐘 for gay (as in “don we now our apparel” or “LGBT”), but we don’t that I know of, and that’s rather a pity.

In Unicode, these are KAY and GAY.

Note that these eight letters are organized into pairs, depending on where the airflow is blocked (by the lips, by the front teeth, and so on), moving from the front of the mouth backwards. Each pair consists of an unvoiced member (pronounced without the vocal cords vibrating) and a voiced member (pronounced with vocal cord vibrations). You can tell the difference between unvoiced and voiced consonants if you hold your fingers on your larynx while you speak.

The eight following letters (𐐙, 𐐚, 𐐛, 𐐜, 𐐝, 𐐞, 𐐟, and 𐐠) are fricatives. They are organized like the stops, by point of articulation (front to back) and in unvoiced/voiced pairs. Fricatives are articulated by forcing the air through a narrow opening rather than stopping it completely. The first and third pairs are given names (“f,” “v,” “s,” “z”). The second and fourth pairs are given examples, too: “eth (as in thigh),” “the (as in thy),” “esh (as in flesh),” and “zhe (as in vision).” I’m aware of only one of these eight letters being used for a word (or, rather, for two words). 𐐜 is used both for the (the definite article) and thee (the archaic second person singular pronoun). The former, of course, can be pronounced the same way as the latter even though it usually isn’t; but remember that in the 19th century, Deseret spellings were for the “full” pronunciations of words, the ones we use when speaking very slowly and clearly.

If we assume that the four with one-letter names have Deseret names like the English names of their Latin counterparts, we get the pattern used by Unicode: EF, VEE, ETH, THEE, ES, ZEE, ESH, ZHEE.

Five letters left. The first two (𐐡 and 𐐢) are termed liquids, and are named “ur (as in burn)” and “l.” It would be nice if there were an attestation of the former being used by itself to write the name of Abraham’s home town; I suppose that might be buried in the Church archives somewhere. I certainly neglected to do it myself when I transcribed the Pearl of Great Price and Old Testament. I have also not used 𐐏 for ye as I should. I hope there aren’t any others.

The two liquids ended up in Unicode as ER and EL. 

Finally, we have three nasals, 𐐣, 𐐤, and 𐐥, consonants pronounced with the unstinted coöperation of the nose, said to be “m,” “n,” and “eng (as in length).” Unicode calls them EM, EN, and ENG.

All in all, then, we have:

Fourteen letters with a one-letter name and no sound exemplar.

Four letters with a one-letter name and sound exemplar. 

Two letters with a multi-letter name and no sound exemplar.

Twelve letters with a multi-letter name and sound exemplars.

Six letters with no name at all but sound exemplars. These are all the short vowels, and it is likely that the intention is that, like the long vowels, their name is the sound they make. 
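The tally above can be checked mechanically. Here is a minimal sketch in Python, assuming the chart entries exactly as described in this post. The `CHART` table and the `classify` helper are illustrative names, and an empty name stands for a short vowel, which the charts label only with an exemplar.

```python
from collections import Counter

# (chart name, has sound exemplar) for each letter, in chart order.
# Transcribed from the description in this post, not from a data file.
CHART = [
    # long vowels
    ("e", True), ("a", True), ("ah", True), ("aw", True), ("o", True), ("oo", True),
    # short vowels: exemplar only, no name
    ("", True), ("", True), ("", True), ("", True), ("", True), ("", True),
    # diphthongs, semivowels, h
    ("i", True), ("ow", True), ("woo", False), ("ye", False), ("h", False),
    # "stops"
    ("p", False), ("b", False), ("t", False), ("d", False),
    ("che", True), ("g", False), ("k", False), ("ga", True),
    # fricatives
    ("f", False), ("v", False), ("eth", True), ("the", True),
    ("s", False), ("z", False), ("esh", True), ("zhe", True),
    # liquids and nasals
    ("ur", True), ("l", False), ("m", False), ("n", False), ("eng", True),
]


def classify(name: str, has_exemplar: bool) -> tuple:
    """Bucket a chart entry by name length and presence of an exemplar."""
    if not name:
        kind = "no name"
    elif len(name) == 1:
        kind = "one-letter name"
    else:
        kind = "multi-letter name"
    return (kind, has_exemplar)


tally = Counter(classify(name, ex) for name, ex in CHART)

for (kind, has_exemplar), count in tally.items():
    suffix = "with exemplar" if has_exemplar else "no exemplar"
    print(f"{count:2d} letters: {kind}, {suffix}")
print(f"{sum(tally.values())} letters in total")
```

The five buckets should add up to the thirty-eight letters of the alphabet.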

For thirteen of the fourteen letters with one-letter names and no sound exemplar, it is possible to infer from other evidence that the name is intended to be the same as the English name of the corresponding Latin letter. The exception is 𐐐, which is given the name “h.” So, what to do here?

If we make an exception to the pattern and assume that its Deseret name is not the English name of its Latin counterpart—what is its name?

If we follow the pattern and call it /ˈeɪtʃ/, then it is the only letter in the Deseret alphabet whose name does not contain the sound it makes. That might seem disastrously bad until one remembers that its Latin counterpart has that property, too, and is also unique in so doing.

The most obvious thing is to call it /ˈeɪtʃ/ and live with the weirdness. (Not that we really have to. We don’t have to be bound by the hoary ways of our long-dead ancestors. We can call it whatever we want: “What-you-may-call-um,” or “What-was-his-name,” or maybe “Thing-um-a-jig.” If we want to be rude, we could even resort to epithets like “Candle-ends,” or even, dare we say it, “Toasted-cheese.” Personally, I like the sound of the name “Huh.”)

There is, fortunately, one other solution. Just as some people call the last letter of the standard English alphabet /ˈziː/ and some call it /ˈzɛd/, there are places where the eighth letter is called /ˈheɪtʃ/, such as Ireland and Newfoundland. Most Australians call it /ˈheɪtʃ/, too. (I don’t know about New Zealand.)

Well, let’s follow the Irish for once, say I. After all, Michael Everson is Irish. Besides, I’m still mulling over the idea of moving to Newfoundland two years later. /ˈheɪtʃ/ it is.

So there you have it: thirty-eight letters, thirty-eight names: 

/ˈiː/, /ˈeɪ/, /ˈɑ/, /ˈɔ/, /ˈoʊ/, /ˈuː/,

/ˈɪ/, /ˈɛ/, /ˈæ/, /ˈɒ/, /ˈʌ/, /ˈʊ/,

/ˈaɪ/, /ˈaʊ/, /ˈwuː/, /ˈjiː/, /ˈheɪtʃ/,

/ˈpiː/, /ˈbiː/, /ˈtiː/, /ˈdiː/, /ˈtʃiː/, /ˈdʒiː/, /ˈkeɪ/, /ˈgeɪ/,

/ˈɛf/, /ˈviː/, /ˈɛθ/, /ˈðiː/, /ˈɛs/, /ˈziː/, /ˈɛʃ/, /ˈʒiː/,

/ˈʌr/, /ˈɛl/, /ˈɛm/, /ˈɛn/, /ˈɛŋ/
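For comparison, the Unicode identifiers behind these thirty-eight names can be listed with Python's standard `unicodedata` module. This is only a sketch, assuming the 38-letter alphabet occupies the capital letters U+10400 through U+10425 (that is, everything before OI and EW); `deseret_names` is a hypothetical helper name.

```python
import unicodedata

PREFIX = "DESERET CAPITAL LETTER "
FIRST, LAST = 0x10400, 0x10425  # assumed range: LONG I through ENG


def deseret_names() -> list:
    """Return the short Unicode names of the 38 capital letters, in order."""
    names = []
    for cp in range(FIRST, LAST + 1):
        full = unicodedata.name(chr(cp))
        # Every name in this range is expected to carry the same prefix.
        assert full.startswith(PREFIX), full
        names.append(full[len(PREFIX):])
    return names


for cp, name in zip(range(FIRST, LAST + 1), deseret_names()):
    print(f"U+{cp:05X}  {chr(cp)}  {name}")
```

The output runs in chart order, so each identifier lines up with the pronunciation in the same position of the list above.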

Oh, and what about an actual abecedary? Well, it depends on your definition there. Some dictionaries say that it’s just a written-out list of the letters of the alphabet. Others say that it can be used for an alphabet book or primer. You know, “‘A’ is for Apple” through “‘Z’ is for Zizzer-Zazzer-Zuzz,” that sort of thing. The former is easy enough. How about the latter?

Well, funny you should ask. It turns out that such a thing isn’t possible. 

First of all, there’s the letter 𐐉. The simple fact of the matter is that it doesn’t occur in modern Utah English at all. It’s just not a sound we make. Ever. I can’t even say it without some effort, and even then, I’m not sure I’m doing it correctly. 

Now, strictly speaking, we don’t distinguish 𐐂 from 𐐃, either, so one or the other of those should go. The difference, though, is that the dictionaries I use for reference do make the distinction between 𐐂 and 𐐃. A number of dialects east of the Mississippi still have not undergone the cot-caught merger. (Or the caught-cot merger, if you prefer.) None of my dictionaries, however (none at all), use 𐐉 anywhere. It’s just that rare nowadays in American English. The father-bother merger is all but complete.

Beyond that, there are three problematic letters: 𐐋, 𐐠, and 𐐥. None of these occur in native English words in initial position. 𐐋 doesn’t seem to occur in final position, either. That rather puts a damper on things. Still, it isn’t an insuperable problem. I have seen alphabet books unwilling to use either “X-ray” or “xylophone” for X; and as for “Xerxes,” that’s a rather obscure reference for preschoolers these days. (“Xerxy, Perxy, Turxy, Xerxy, Linxy, lurxy, Great King Xerxes!” Thank you, Edward Lear.) The solution is to use a word with an X in the middle, something like, “X is for existential crisis.” It’s clumsy, but it works, and it gives us the option of “𐐋 is for book,” “𐐠 is for beige,” and “𐐥 is for sing.”

I may some day write an actual abecedarium for the Deseret Alphabet. Meanwhile, I give you the Deseret Alphabet Song:



The music is Twelve Variations on “Ah vous dirai-je, Maman”, K. 265/300e, by Wolfgang Amadeus Mozart, and we even arranged for his corpse to be playing the harpsichord. (We spared no expense.) The singer is Hatsune Miku, the Japanese superstar. She has some trouble with English vowels, but otherwise we’re pleased with her performance. (For those who find her usual outfit immodest—we’re disappointed in that, too, and wish we could have convinced her to wear something more appropriate, like a longer skirt. She’s a sweet kid, though.)

Tuesday, January 1, 2019

Happy Public Domain Day!

Over the course of my lifetime, copyright law in the United States has undergone a complex evolution. Overall, this has meant that works effectively stopped entering the public domain. In 1979, everything published in the United States in 1921 or earlier was in the public domain. As of 31 December 2018, the public domain had grown by exactly one year.

Large corporations, most notably Disney, fought for the last copyright extension back in 1998. Over the past few years, there have been some fears that they'd try for another extension. They didn't. As of today, therefore, anything published in the United States in 1923 is now in the public domain. From now on, each January 1st, another year of material will be available.

More details on what's been going on and what will happen next are available here.

Naturally, since the Deseret Alphabet Classics series has the public domain as its life-blood, this makes a difference to me. As of yesterday, Sir Arthur Conan Doyle's The Case for Spirit Photography was beyond the pale and unavailable for transliteration without express permission. Today it's not. (I know we have all been anxious to read it in the Deseret Alphabet.)

In practice, I don't know that I'll attempt anything published in 1923 over the course of the coming year. Ultimately that's not the point. The purpose of copyright law, as stated in the US Constitution, is "to promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries" (Article I, Section 8). There has been a widespread feeling that indefinitely long copyrights stifle creativity rather than encouraging it. Creativity has made one small step forward today. It's "Happy Birthday" all over again.