Wednesday, July 25, 2007

Writing codas, from Sylhet to Winnipeg

In Greek-based scripts (like Latin or Cyrillic), unless a consonantal letter is followed by a vowel letter, it is assumed not to be followed by a vowel. This seems natural enough if you're used to it; but if you look at it differently, it's rather wasteful. The commonest sound to follow any given consonant is usually a vowel, not another consonant, so if you allow a single letter to represent a consonant plus a vowel you're saving space and effort.

But if you do that, then how do you represent the fact that a consonant is not followed by a vowel? Different writing systems use different solutions. In alphabets that have stuck more closely to their Canaanite prototype, like Arabic, Hebrew, Syriac, or (traditional) Tifinagh, you normally don't bother: a consonant may be followed by a vowel or may not, and you rely on the reader to figure it out. However, sometimes the reader needs additional cues: maybe the word you're writing is obscure, or two words have the same consonants, or it's very important that the text be read exactly right with no possibility of error. In that case, in Arabic, Hebrew, and Syriac, you mark what follows each consonant with a little sign above or below the letter - one sign for "a", say, another for "i", and another to indicate that nothing follows it. Such a sign is necessary if you're still mainly using the system with no vowel marking, because an unmarked letter would not mean "no vowel": it would mean that the vowel, if any, has to be deduced from context.
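To make the three-way distinction concrete, here is a minimal Python sketch of reading a marked Arabic string. It assumes the Unicode encoding of the Arabic signs (fatha, damma, kasra, sukun), which the discussion above doesn't depend on; the tiny letter table and the name `read_marked` are made up for illustration. A letter with no sign comes out as "?", meaning "vowel, if any, unknown", which is exactly why the no-vowel sign has to exist.

```python
# Vowel signs as encoded in Unicode (an assumption of this sketch).
FATHA, DAMMA, KASRA, SUKUN = "\u064E", "\u064F", "\u0650", "\u0652"
MARKS = {FATHA: "a", DAMMA: "u", KASRA: "i", SUKUN: ""}  # sukun = explicitly nothing

# A deliberately tiny, hypothetical letter table: kaf, teh, beh.
LETTERS = {"\u0643": "k", "\u062A": "t", "\u0628": "b"}


def read_marked(text):
    """Romanize a string of letters, each optionally followed by one sign."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch not in LETTERS:
            raise ValueError(f"letter not in the toy table: {ch!r}")
        out.append(LETTERS[ch])
        nxt = text[i + 1] if i + 1 < len(text) else None
        if nxt in MARKS:
            out.append(MARKS[nxt])  # a marked vowel, or "" for sukun
            i += 2
        else:
            out.append("?")  # unmarked: the reader must guess from context
            i += 1
    return "".join(out)


fully_marked = "\u0643\u064E\u062A\u064E\u0628\u0652"  # k+a, t+a, b+sukun
unmarked = "\u0643\u062A\u0628"                        # k, t, b with no signs
print(read_marked(fully_marked))
print(read_marked(unmarked))
```

The point of the "?" is that leaving a letter bare and marking it with sukun are two different statements, so the reader only knows a consonant is vowelless when the writer says so.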

Typical Indic scripts, such as Devanagari (the script used for Hindi and Nepali), adopt a rather different solution. A consonant letter on its own is to be read with a default vowel, short a ([ʌ]); a consonant followed by a consonant is written as a single "conjunct" letter, formed in any of several ways, but usually by either putting the second letter underneath the first or taking away a line on the right of the first letter and joining it to the second. On the plus side, this yields much of the compactness of a vowel-optional system without any of the ambiguity, and means that each letter is pronounceable on its own; on the minus side, this means fonts have to include a much larger number of letter forms.
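The default-vowel rule can be sketched in a few lines of Python. One assumption to flag loudly: this works on the Unicode encoding, where a conjunct is stored as consonant + virama (U+094D) + consonant and the font draws the joined form; that is an encoding convention, not something visible on the page. The letter tables are a small hand-picked sample, `romanize` is a made-up name, and the sketch reads the script letter by letter, ignoring the fact that spoken Hindi often drops the default vowel.

```python
VIRAMA = "\u094D"  # DEVANAGARI SIGN VIRAMA: cancels the default vowel

# Small illustrative sample, not a full transliteration table.
CONSONANTS = {
    "\u0915": "k",  # KA
    "\u0924": "t",  # TA
    "\u092E": "m",  # MA
    "\u0930": "r",  # RA
    "\u0938": "s",  # SA
}
VOWEL_SIGNS = {
    "\u093E": "aa",  # VOWEL SIGN AA
    "\u093F": "i",   # VOWEL SIGN I
    "\u0941": "u",   # VOWEL SIGN U
}


def romanize(text):
    """Read each consonant with default 'a' unless a sign says otherwise."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = text[i + 1] if i + 1 < len(text) else ""
            if nxt == VIRAMA:
                i += 1  # no vowel: the next consonant joins into a conjunct
            elif nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 1
            else:
                out.append("a")  # the default vowel
        else:
            out.append(ch)  # pass through anything outside the sample
        i += 1
    return "".join(out)


print(romanize("\u0915"))                          # bare KA
print(romanize("\u0915\u094D\u0924"))              # K + virama + TA
print(romanize("\u0938\u094D\u0924\u094D\u0930"))  # S + virama + T + virama + RA
```

Every letter stays pronounceable on its own, and nothing is left to guesswork; the price shows up in the font, which needs a drawn form for each conjunct the encoded sequences can produce.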

Sylheti Nagri is an Indic script formerly (up to the 1950s or so) in use in the district of Sylhet, in northeastern Bangladesh. Like Devanagari, it represents consonant-consonant sequences using conjuncts. However, its users were often also familiar with the Arabic script, where letters could be combined into ligatures whether or not they had vowels between them. This may have inspired them to do something rather unusual for an Indic script: develop vowel-consonant conjuncts, such as a+m, a+l, i+n... and consonant-vowel-consonant conjuncts, like pi+r, mo+t... In fact, judging by the examples in the Unicode proposal, it seems that, for at least some historic users, Sylheti Nagri did not have a conjunct system at all, just a ligature system.

One very nice solution is that adopted in Canadian Syllabics, the family of writing systems used by a number of Native American tribes in Canada. The name is potentially misleading: I prefer to reserve the term "syllabary" for writing systems like hiragana, where different syllables differ from each other unpredictably. In Canadian Syllabics as used for Cree, for example, the shape of a symbol represents the consonant, while its orientation represents the vowel that follows it, and length or labialisation may be represented by dots. If no vowel follows the consonant, then the base shape is simply written small and superscripted, using the a-orientation, or for labialised consonants the u-orientation.
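The shape-plus-orientation idea maps neatly onto a lookup keyed by (consonant, vowel). Here is a rough Python sketch that leans on Unicode character names such as "CANADIAN SYLLABICS PI" and, for the vowelless form, "CANADIAN SYLLABICS P". Treating those names as a consonant/vowel grid is my own simplification; the helper names `syllabic` and `write` are invented; only the P and T series are covered; and the dots for length and labialisation are left out. Note too that Unicode encodes each small final as its own character rather than as a superscripted full-size letter.

```python
import unicodedata

CONSONANTS = ("P", "T")       # two base shapes, chosen for illustration
VOWELS = ("E", "I", "O", "A")  # four orientations of each shape


def syllabic(consonant, vowel=None):
    """Return the character for consonant+vowel, or the final if vowel is None."""
    name = f"CANADIAN SYLLABICS {consonant}{vowel or ''}"
    return unicodedata.lookup(name)  # raises KeyError for an unknown name


def write(units):
    """Spell a sequence of (consonant, vowel-or-None) pairs."""
    return "".join(syllabic(c, v) for c, v in units)


# Print the grid: one row per shape, one column per orientation, then the final.
for c in CONSONANTS:
    row = [syllabic(c, v) for v in VOWELS] + [syllabic(c)]
    print(c, " ".join(row))

# pi + ta + vowelless p
word = write([("P", "I"), ("T", "A"), ("P", None)])
for ch in word:
    print(ch, unicodedata.name(ch))
```

Looking at the printed rows side by side is the quickest way to see that the four full-size forms in a row are one shape turned four ways, with the final as a reduced version.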
