Learn Arabic for Beginner: historical linguistics


Saturday, November 21, 2009

Songhay and Nilo-Saharan

Following up on the preceding post, I've been looking at Greenberg's (1966) Nilo-Saharan comparisons - specifically, the 29 involving Songhay that have reflexes in Kwarandzyey, the Songhay language least likely to be involved in recent contact with Nilo-Saharan. Of these, 20 have comparanda in Saharan (Kanuri/Kanembu + Teda/Daza + Berti + Beria/Zaghawa) and 17 in Eastern Sudanic (Nubian, Nilotic, Surmic, etc.), vs. a maximum of 13 for any other branch. (At least 7 also have plausible Mande comparisons.) Now, Saharan only consists of about 4 languages (9 by Ethnologue standards). For Eastern Sudanic, excluding Kuliak, the Ethnologue counts 103 languages, with a huge amount of internal diversity. If Songhay were equally distant from the whole of Nilo-Saharan, you would expect far more cognates with Eastern Sudanic than with Saharan; the figures suggest that the link (whatever its nature) is primarily with Saharan, and only secondarily, if at all, with the rest of the languages he classified as Nilo-Saharan.
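
To make the arithmetic behind that argument explicit, here is a rough sketch in Python. It is not Greenberg's method or anyone else's published one; it just divides the comparanda counts quoted above by the Ethnologue language counts, as a crude proxy for how much material each branch offers to compare against. The names `branches` and `comparanda_per_language` are made up for illustration.

```python
# Figures quoted in the post: how many of the 29 Songhay items have comparanda
# in each branch, and how many languages Ethnologue counts for that branch.
branches = {
    "Saharan": {"comparanda": 20, "languages": 9},
    "Eastern Sudanic": {"comparanda": 17, "languages": 103},  # excluding Kuliak
}


def comparanda_per_language(branch):
    """Crude yield measure: comparanda found per language available to search."""
    return branch["comparanda"] / branch["languages"]


rates = {name: comparanda_per_language(b) for name, b in branches.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.2f} comparanda per language")

# How many times higher the Saharan yield is than the Eastern Sudanic one.
ratio = rates["Saharan"] / rates["Eastern Sudanic"]
print(f"Saharan / Eastern Sudanic yield ratio: {ratio:.1f}")
```

The measure is deliberately simplistic: counts are capped at 29, and languages within a branch are not independent witnesses, so the ratio should be read as an order-of-magnitude illustration rather than a statistic.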

The grammatical comparisons that Greenberg offers are interesting but not compelling; there are only 10 of them (only 4 with Kwarandzyey reflexes), and they often incorporate misrepresentations (as Lacroix noted, for example, -ma forms verbal nouns, not relatives/adjectives, and 1sg ay < *agay, reducing the similarity to forms like Zaghawa ai.) Some of the lexical ones, however, are rather good; similarities such as Koyraboro Senni kokoši “scale (of fish)” = Manga Kanuri kàskàsí “scale (of fish)” cry out for explanation, and, though quite rare, look sufficiently numerous that chance seems unlikely. But whether they should be explained by common ancestry or by borrowing remains unclear. Either scenario would be historically interesting, since at present rather a large expanse of Tuareg and Hausa-speaking land separates Songhay from even Kanuri, and Saharan originated closer to modern-day Darfur than to Lake Chad.

Monday, October 19, 2009

Arabic loanwords in "proto-Nilo-Saharan"

Ehret 2001 (or see Nostratic.ru) looks at first sight like an astonishingly detailed reconstruction of Nilo-Saharan, with nice binary splits and loads of technology-related words for archeologists and anthropologists to sink their teeth into. Why shouldn't specialists take advantage of this amazing opportunity to correlate historical developments to linguistic ones?

I just found a handy answer to that question. Bender (1997:175ff) gives the 15 cognate sets in Ehret 2001 that are represented in the most sub-families of Nilo-Saharan. 3 of the 15 look distinctly like Arabic loans.

1387 *wàs “to grow large”: Fur wassiye “wide” and Songhay wásà “to be wide” are both from Arabic wāsi`- واسع. The other items cited – Ik “stand”, Kanuri “yawn”, Kunama “increase, augment”, and Uduk “to tassel, of corn” – are scarcely obvious candidates for being related to one another in the first place.

1297 *là:l “to call out (to someone)”: Kanuri làn “to abuse, curse” and Songhay láalí “to curse” are obviously from Arabic la`an- لعن; Kunama lal- “to denigrate” might be from the same source. That only leaves Uduk “to persuade, incite to do something” and Proto-Central-Sudanic “to call out”.

718 *t̪íwm “to finish, complete”: almost certainly Songhay tímmè “to be finished”, very likely Uduk t̪ím “to finish”, Ocolo t̪um “to finish”, and maybe even Fur time “total”, are from Arabic tamm- تمّ (impf. -timm-), as Bender (ibid:177) considers probable. That leaves Proto-Central-Sudanic, Kunama, and Maba “all”, Kanuri “ideophone of dying animal” (!), and Proto-Kuliak “buttocks”. The “all” set looks rather promising – the whole etymology, not so much.

There are plenty of other Arabic loanwords in Ehret's “Proto-Nilo-Saharan” – a particularly egregious example is Kanuri zàmzàmíyɑ̀ “leather bottle-shaped water vessel for journeys” (#1223 *zɛ̀m “to become damp, moist”), and other especially clear-cut cases include #1173 < sawṭ, #1185 < šamm – but the fact that they include a significant proportion of the best cognate sets is what really strikes me. If a reconstruction attempt can't distinguish a widely distributed recent loan from a cognate set that split more than eleven thousand years ago, any information it gives about readily diffused items like technologies is completely unreliable. For another review from a similar perspective, try Blench 2000 (not sure why it appeared a year before the book's nominal publication date...)

The more I read about Nilo-Saharan, the less convinced I am that it exists (much less that Songhay belongs to it.) That means the classification of the languages of quite a lot of Africa is basically up for grabs. It would be great to have a reexamination of the area.

Saturday, June 13, 2009

Open to interpretation

Songhay's lexical economy - the way it keeps its lexicon rather smaller than its neighbours' by using a single word to fulfill the functions of what in most languages would be several different words - has attracted the attention of several of those who have written about the language from the 1850s onwards. While Kwarandzyey (Korandje) is so full of Berber and Arabic loanwords that the size issue probably no longer applies, it still has many striking examples of polysemy. Take "open", for example.

fya (from Songhay *feeri) is best translated as "open" (its commonest sense). Of course, to open one's mouth can be to start eating - hence the frozen compound fya-mmi "open-mouth" means "breakfast". But opening is also what you do to release something from an enclosed space; hence to "open water (for something)" (fya iri), or just "open", is to irrigate, and to "open for an animal or person" is to release them. Likewise, to "open a rope (for something)" is to untie it. To release something from your grasp is to let it fall - hence to "open for something" is also to drop it. And for a man to release his wife from her obligations towards him is to end the marriage - hence to "open for a woman" is to divorce her.

We can map the connections between these easily enough, making it clear that they form a coherent network of meaning:


breakfast untie
    \    /    \
     open - release
       \      / \
       irrigate divorce

But not only will any single English translation applied literally and consistently yield ludicrous results for at least some of these cases - translating it differently in different circumstances will force you to choose a single meaning in cases where the text is ambiguous. "He opened for the woman" probably means he divorced her, but in principle it could mean he released her (eg from prison), or untied her, or (literally) dropped her; in fact, since Songhay has no gender distinctions in pronouns, it should even be able to mean "It (eg an automatic door) opened for her". And of course, this kind of ambiguity can be deliberately exploited for effect, as in puns.

In Kwarandzyey, this is never likely to cause serious ambiguity - the language is almost never written down, and it's a small enough community that the context is usually known to everyone anyway. But imagine worrying about this kind of thing in a millennia-old text in a language that no one today speaks natively, and you can really see why even the most literal translation of such a text is unavoidably an act of interpretation.

Friday, June 5, 2009

Why dead snakes are like clothes

What would you say if, in some science-fiction novel, you read of a language where the situations that in English would be described as "The clothes blew down from the clothesline", "Push that dead snake away with a stick", and "I see where he's carrying the rabbits he killed hung from his belt" were all naturally expressed with the same root, plus nothing more than different affixes? What about "I slammed together the hunks of clay I held in either hand", "I slung away the rotten tomatoes, sluicing them off the pan they were in", and "I picked up in my mouth the already chewed gum from where it was stuck on the table"? My inclination would have been to dismiss it as a neat but implausible idea, placing some strain on the reader's suspension of disbelief. But - until no more than thirty years ago - such a language existed right in California. Go to Part III of Leonard Talmy's dissertation Semantic Structures in English and Atsugewi to get the data; here's a slightly less surprising example as a taster:

s-'-w-	cu-	lup-	hiy-ik:-	a
Subject=I, Object=3rd person	from a linear object moving axially [with one end] non-obliquely against the FIGURE	for a small shiny spherical object to move	out of a snug enclosure/a socket	factual
I poked his eye out (with a stick.)

s-'-w-	pri-	lup-	nik-iy-	a
Subject=I, Object=3rd person	from the mouth/interior of a person, working ingressively, acting on the FIGURE	for a small shiny spherical object to move	all about, here and there, back and forth	factual
I rolled the round candy around in my mouth.

Of course, people are people; after explanation, the similarities are easy enough to make out, and presumably given enough time anyone can learn to look at a situation and decompose it into elements like these, rather than the elements that "leap out" at an English speaker. In fact, I suspect that having to learn to see things the way the people you talk to do is one of the subtler drivers behind contact-induced language change. But cases like this provoke thought: just how much can the attributes of a situation most relevant to formulating a sentence vary from language to language?

Wednesday, March 11, 2009

Arabic (and Berber?) loanwords in southern Italy

Just came across a little monograph on Arabic and Berber loanwords in the dialects of the Basilicata (southern Italy): Sopravvivenze lessicali arabe e berbere in un'area dell'Italia meridionale, la Basilicata by Luigi Serra. Most of the loans listed are from Arabic, some quite obvious (eg taūt "coffin" < تابوت, źir "a copper or terracotta container for liquids" < زير, zammîl "big pannier with which various goods are transported on a beast of burden's back" < زنبيل), others rather less clear-cut.

Only three loans (and one placename) are claimed as from Berber. None of them looks impossible, but all of them seem questionable, and they all refer to objects that there would have been no obvious reason to borrow terms for. It's possible that Berber influence can be found in southern Italian dialects, but this doesn't present a terribly convincing argument. Still, here they are:

źembr / źimbr / zimr / źimmr "billy-goat" (caprone, becco) < pan-Berber izimmər "ram", p. 39. (Looks good, but why the shift in species? - Also, see comments for an alternative Greek etymology.)
aččáta "big meal" (scorpacciata, mangiata, spanciata) < pan-Berber əčč "eat", p. 11. (The semantic and phonetic match are great, but the word is so short that coincidence seems hard to rule out.)
šéḍḍa "wing" (ala) < Zenati Berber "bird", eg Siwi ašṭiṭ, p. 26. The author mentions an alternative possibility - deriving it from Italian ascella "armpit" - that seems much more plausible.
Zaza (placename) < Berber azəzzu "thorny broom (plant sp.)" - not discussed in any detail (author cites Renisio), p. 41.

Wednesday, March 4, 2009

No, Berber isn't descended from Arabic

A few days ago I got lent a copy of a recent book in Arabic by Othmane Saadi: Dictionary of the Arabic Roots of Amazigh (Berber) Words معجم الجذور العربية للكلمات الأمازيغية (البربرية) (Tripoli: Academy of Arabic Language 2007.) My reaction, in brief, is that it's unscientific jingoistic claptrap. But I happen to have friends (not linguists, of course) who take it seriously; and I am told that the author, a proud member of the Chaoui Berber Nememcha (Nmamša) tribe, genuinely believes his own theory. I will therefore try to explain as simply as possible where the book goes wrong.

His starting point is noting the existence of strong similarities between Arabic and Berber in the vocabulary and grammar (p. C: “90% of Amazigh Berber words are pure or Arabised Arabic, and the grammar of Berber agrees with the grammar of Arabic.”) This is substantially correct, and has been known for a long time (see, for example, Igor Diakonoff's Afrasian Languages, Moscow: Nauka 1988, or at a more basic level one of my first posts), except that 90% is a substantial exaggeration – many of the comparisons he puts forward are at best questionable, as will be seen below. But he claims that the explanation for these similarities is that Berber descends from Arabic. Not just Berber either, as he says on p. B: “The term Arabitic عروبية means the ancient Arabic languages which are wrongly called the Semitic languages and which branched out from the source language Arabic thousands of years ago, such as Babylonian, and Assyrian, and Akkadian, and Phoenician Canaanite, and Aramaic, and Himyaritic, and Sabaean, and Thamudic, and Lihyanite, and Ma'inic, and ancient Egyptian, and Berber, and others.” Linguists subscribe to a rather different explanation for the observed similarities: that Berber and Arabic (and all the other languages he listed, and many he doesn't list such as Hausa and Somali) are all descended from a single language, called for convenience Proto-Afroasiatic (Greenberg 1950), which was different (and probably about equally different) from any of them.

How would you choose between these two hypotheses? Well, if the original language was different from Arabic, then you would expect some original forms to have been lost in Arabic but kept in other languages. Oddly enough, Saadi himself gives evidence for exactly that: he links the Berber ur “not” to Akkadian ul (p. 12), and the Berber -as “to him/her” to Akkadian -šu (p. 12), and the Berber nəkk “I” to Ancient Egyptian ink and Akkadian 'anāku, none of which are attested in Arabic. Unless you believe that Akkadian and Berber each independently invented the same new forms, or that they are more closely related to each other than to Arabic – which Saadi (correctly) does not claim – you have to conclude that the common ancestor of Arabic and Berber included words like ur/ul for “not”, and 'anāku for “I”, and so on, and hence was different from what we know as Arabic, just as it was different from Berber.

So maybe this common ancestor was Arabic in a different sense: Saadi argues that it was originally spoken in Arabia, so Arabic would be the one language that stayed at home, and presumably got less affected by foreign influence. Unfortunately, he doesn't have much of a case. His first argument (p. 1) is frankly risible: “Europe and North Africa were covered with ice before [18000 BC], whereas the Arabian peninsula enjoyed a climate similar to that of southern Europe now. The ice melted in the former and drought hit the latter, so mankind left the Arabian peninsula and settled North Africa and southern Europe.” The quote he cites on this actually says nothing about North Africa, and for good reason: even at the last glacial maximum North Africa was never covered by ice (see map), and was if anything more habitable before 18000 BC than it is now. He also notes (p. 2) that Berber princes have long claimed Yemenite origins. Such claims are questionable for many reasons (the desire for prestige, the originally matrilineal traditions of many Berber tribes, and no pre-Islamic attestations) – but even if true, it would prove nothing about the language: people change their language all the time without changing their ancestry, as any emigrant can tell you. The rest of his argument is a hotchpotch of miscellaneous quotes which at best claim that various early North African peoples or languages or cultures originated in the Middle East; in a particularly ludicrous case, he blithely quotes Bousquet (1957) to the effect that the Berber language “came from Asia Minor” [Turkey!] None of these quotes so much as mention the Arabian peninsula.

In fact, the linguistic evidence suggests that Proto-Semitic may well have been spoken in Arabia and certainly was spoken in the Middle East, but the common ancestor of Berber, Egyptian, and Semitic was most likely located in Africa. You see, as noted above, these three language families are also quite closely related to Chadic (spoken mainly in Nigeria and Chad) and Cushitic (spoken around the Horn of Africa) – which means that 4 out of 5 branches of this family are native to Africa. It is more likely that one branch left Africa than that 4 branches each separately followed the same narrow path across Sinai or crossed the Red Sea. (For theoretical background, see Campbell 2004.)
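
The "fewest moves" reasoning can be spelled out as a simple count. The sketch below is only an illustration of that argument, not a method taken from Campbell or anyone else: it treats each of the five branches named above as a single unit, assigns it a coarse region (an assumption), and counts how many branches would have had to leave a proposed homeland. The names `branch_region` and `moves_required` are invented for the example.

```python
# Coarse region in which each branch named in the post is (or was) spoken.
# Semitic is placed in the Middle East, following the post.
branch_region = {
    "Berber": "Africa",
    "Egyptian": "Africa",
    "Chadic": "Africa",
    "Cushitic": "Africa",
    "Semitic": "Middle East",
}


def moves_required(homeland):
    """Count the branches that must have migrated out of the proposed homeland."""
    return sum(1 for region in branch_region.values() if region != homeland)


for homeland in ("Africa", "Middle East"):
    print(f"Homeland in {homeland}: {moves_required(homeland)} migration(s) needed")
```

An African homeland needs one branch to move; a Middle Eastern one needs four. Every move is weighted equally here, which is a simplification, but the count only gets more lopsided if crossing Sinai or the Red Sea is treated as costly.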

In other words: whether the similarities this book gathers between Arabic and Berber are valid or not, they don't do anything to support the author's claim that Berber descends from Arabic. Do they at least have the merit of being valid comparisons? Sometimes, but not with any consistency. Many of his comparisons look rather far-fetched, eg on p. D:

taməṭṭuṯ “woman” < Ar. ṭāmiṯ طامث “menstruator”
argaz “man” < Ar. rakīza(tu l-'usrā) ركيزة الأسرى “pillar (of the family)”
ixəf “head” < Ar. xf' خفأ “appear”, because the head stands out
tadaγt “armpit” < Ar. daγdaγah دغدغة “tickling”
alγəm “camel” < Ar. luγām لغام “the foam that comes out of camels' mouths”

Many others are clearly genuine loanwords, often featuring sounds that cannot be reconstructed for Proto-Berber, though I don't think many of these are original suggestions, eg:

(p. D) axərraz “cobbler” < Ar. xaraza خرز “to sew leather”
(p. H) abrid “road” < Ar. barīd بريد (confirmed by the Tuareg pronunciation of this word, abărid)
(p. 38) ləbṣəl “onion” < Ar. baṣal بصل (Siwi happens to preserve an older word for "onion": afəllu)
(p. 78) taħzamt “belt” < Ar. ħizām حزام

A couple are known Phoenician loanwords:

(p. 57) agadir, ažadir "wall" - Ar. jidār جدار

A few are well-known Afroasiatic cognates, and scattered among them may be other valid cognates:

(p. 250) iləs “tongue” - Ar. lisān لسان
(p. 110) iđammən “blood” - Ar. dam دم
(p. 292) tiqqad “burning” - Ar. wqd وقد

But the book makes no attempt to distinguish between words taken from Arabic comparatively recently and words inherited from the common ancestor of Berber and Arabic, and seems to assume that any word found in both dialectal Arabic (Darja) and Berber must automatically be originally Arabic, rather than possibly being a borrowing from Berber into Arabic. There is a well-known technique for sorting out inherited cognates from loanwords from coincidental similarities: sound correspondences. Sounds don't usually change at random: they change systematically, just as all j's in Egyptian Arabic become g. You establish which Berber sounds normally correspond to which Arabic ones under what circumstances, based on looking at what happens in the clearest cases; that gives you a standard by which to judge the doubtful ones. Saadi has made no effort to do this, and the unfortunate result is that in his comparisons the chaff far outweighs the wheat.

Berber and Arabic both descend from the same language, but that language was neither Berber nor Arabic, and probably didn't come from Arabia - and if you want to know about that common source, then you'll learn more from the works of Diakonoff or Greenberg, or even from more problematic sources like Orel and Stolbova 1999 or Militarev's online database, than from Saadi 2007.

Friday, February 13, 2009

Why do historical linguistics?

Unraveling the details of a given language family's history is painstaking, detail-oriented work - comparing hundreds or thousands of words to each other, looking through different languages' grammars, coming up with hypotheses to explain what you see and hoping the next language you look at doesn't disprove them... Why do it?

Well, for one thing, you end up showing interesting things about the history of the relevant part of the world, often things it would be hard or impossible to show any other way - that Madagascar was settled by people from Borneo, for example, or that Ijo slaves from Nigeria ended up on the Berbice River in Guyana, or that Persians and Swedes (along with a lot of other people!) ultimately both got their language from a common source. But that depends on your being interested in a particular region; why would a person working on the historical linguistics of (say) the Sahara care about the historical linguistics of New Guinea, or Alaska, or even Europe?

It's because people are pretty similar everywhere - we all have roughly the same mouths and the same brains, and as a result we all tend to make roughly the same kinds of changes. Looking at changes in the languages of Europe, and at which direction they went, turns out to give you a pretty good idea of what kind of changes to expect in New Guinea - and vice versa; wherever you go, k is much more likely to change to g than to n, and a word meaning "want" is much more likely to become a future tense marker than a word meaning "jump".

That means that all these individual small-scale studies are so many pieces fitting together to form a map of how language works. Describing a language (no mean challenge in itself) shows you one set of possibilities; typology tells you the possible states of a language; but historical linguistics relates them to one another, showing you which states are closely linked and which are not. You can't predict what will happen to a language, but you can see in advance what kind of changes are likely and what kind are unlikely.

For sounds, this map of changes - this network linking different states of a language to one another - will seem familiar; it corresponds closely to articulatory and/or auditory similarity. You can mostly account for it by knowing how different sounds are made (with the lips, the tongue, etc...) and which sounds are hardest to distinguish. The key test for a theory of syntax (as far as I'm concerned) is whether it can account similarly for the attested map of syntactic change.

Sunday, December 28, 2008

Siwa and its significance for Arabic dialectology

Hope all my readers are having/have had a great holiday.

A paper of mine, "Siwa and its significance for Arabic dialectology", should (inshallah) be appearing in ZAL soon-ish. Basically, there's a whole lot of Arabic influence on Siwi, including things you wouldn't expect to be borrowed, like Arabic's rather unusual method of forming comparatives from adjectives. However, this influence shows clear signs of deriving, not from any dialect currently used in or even particularly near Siwa, but rather from a more archaic one, with some resemblance to the dialects of other Egyptian oases quite distant from it and some features not attested in any other Arabic dialect of Egypt or Libya. In the 1100s, according to al-Idrisi, Siwa was inhabited both by Berbers and by sedentary Arabs; I suspect that the Arabs got assimilated into the larger Berber community and that much of the Arabic element of Siwi derives from their now-extinct dialect. If this sort of thing interests you, have a look (you can download it from the link at the beginning of this paragraph) and please feel free to comment on it here or by email.

Saturday, October 11, 2008

Berberologie colloquium at Leiden

I've spent the past couple of days at the Berberologie colloquium in Leiden, and it's been great fun. There were plenty of very interesting speakers, but for me two languages stole the show: Tetserrét and Ghomara.

Tetserrét (discussed by Cécile Lux) is spoken by a Tuareg tribe, the Ayt-Tawari, in Niger. But it's not linguistically Tuareg at all - its closest relative is Zenaga, the Berber of Mauritania (not northern Berber, contrary to Wikipedia), and Tuaregs can't even understand it. It seems to be an isolated survival of the Berber language spoken in the region before the Tuareg got there. It's not in Ethnologue either. (Taine-Cheikh's new Zenaga dictionary is out, by the way, and was selling as fast as a book reasonably can in a conference of twenty people.)

But Ghomara, in northern Morocco, is something else. Across Berber, borrowed Arabic nouns typically behave as they do in Arabic (keeping their Arabic plurals, and not changing for case.) In Ghomara (discussed by Jamal El Hannouche), Arabic adjectives take Arabic rather than Berber agreement marking - and even some Arabic verbs get conjugated fully in Arabic, not in chance code-switching but regularly by all speakers, and up to and including pronominal object suffixes. It's not quite unprecedented worldwide, but that level of contact influence is pretty darn rare.

I didn't put Tadaksahak in the first paragraph because it's much less unfamiliar to me, but Regula Christiansen's paper on that had some interesting implications. Basically, Tadaksahak has all but lost the Songhay method of forming attributive adjectives; instead, it's substituted a simplified version of the Tuareg one (suffixing -an), which has become productive for Songhay adjectives too. The funny part is this: Songhay has a lot of CVC adjectives (stative verbs). Tuareg doesn't really do CVC adjectives; it prefers longer words. So when you add the -an to these, you typically reduplicate the adjective. For example, kan "be sweet" > kankanan "sweet". This comes worryingly close to invalidating a conjecture I had made on the borrowability of templatic morphology (but not quite!)

My own paper established that much of the Berber element of Kwarandzyey derives from an extinct close relative of Zenaga. In effect, the "Western Berber" genetic subgroup of Berber has four members: Zenaga itself (finally with a decent dictionary), Tetserrét (awaiting further publications), the large Berber element of Hassaniya, and part of the proportionally larger Berber element of Kwarandzyey.

Monday, December 31, 2007

Climate change, etymology, and speaker population

A quick Google search turns up a number of theories on the etymology of the name Tabelbala, none of which correspond to the one that old men here tell me, which appears to me to be much the most plausible. The oasis' name is Tsawerbets in Kwarandjie, Tabelbalt in local Tamazight, and Belbala in local Arabic; they derive it from a tree called awerbel in Kwarandjie and belbal in local Arabic, that used to be common but (presumably due to the lower water table) no longer grows here. [e=schwa] It turns out that belbal is fairly widespread in North African Arabic, and refers to a type of pine; it's also attested in Taznatit, as abelbal. The normal Berber diminutive gives tabelbalt, and the usual Kwarandjie shift of l>r and t>ts would give tsaberbelts; intervocalic b>w is irregular, but I have heard it in other contexts, and final clusters tend to be simplified, which would give tsawerbets. Berber diminutive morphology is not productive in Kwarandjie, so it's hard to imagine this being a folk etymology. If this is correct, the very name of the oasis, like its many acres of ruins and its hundreds of dried-up foggaras, is a mute testimony to a time not too long ago when it was much greener and wetter.

At the moment, Kwarandjie turns out to have on the order of 3000 speakers, adding up the populations of the three villages as given to me by a local official (himself a speaker) and assuming the minority that doesn't speak it at all is made up for by all the emigrant speakers in Tindouf and Bechar. This represents about half the population of the oasis; the other half is in el-Kartsi (le Quartier), the newer town centre. Despite the endangerment discussed in the previous post, this is larger than it's been at any point since 1908, when Cancel counted barely 500 or so speakers. But even in Cancel's time most of the foggaras were dry, and a few centuries earlier refugees had fled the area for places like Mlouka and Ktaoua; in earlier periods the number of speakers may have been significantly larger, judging by the ruins of their houses, which seem to cover an area rather larger than the present settlements do. That former climate might help explain why the oasis not only kept a language that has remained practically nowhere else in the thousand kilometers between it and Timbuktu, but also kept much more Songhay vocabulary than the other northern Songhay languages - even words like hawi "cow", referring to items currently totally absent from the oasis, or tsyu "read" and genga "pray", referring to concepts strongly associated with Arabic. The historic decline in the oasis's population and prosperity has surely itself had its effect on the language, letting words associated with particular specialties (perhaps silverwork, for example) vanish for lack of customers to sustain them, or ones for species vanish with their referents (as the word asiyed, "ostrich", has nearly finished doing - I've only found one speaker who knew it, although Champault confirms it). But is there any way to prove the existence of such an effect, or measure it?

Friday, August 10, 2007

A coming reanalysis in Arabic and Berber

In historical linguistics, when a word or string of words is reinterpreted as consisting of a different set of words (for example, when "an ewte", which is what people used to say in Middle English, becomes "a newt"), they call it reanalysis. Here are two somewhat parallel examples.

In classical Arabic, one word for "he came" is jā'a. "With" is bi-. "He came with X" is jā'a bi-X, and can usually be translated as "he brought X". In some parts of the paradigm, the two words remain more or less adjacent* - eg ya-jī'u bi- "he comes with"; in others, they are separated by an agreement morpheme - eg jā'-at bi- "she came with", ji'-nā bi- "we came with". In all modern dialects, the glottal stop is lost, and so are the final short vowels, which would regularly yield jā b(i)-, yijī b(i)-, jā-t b(i)-, etc. But in fact, this common construction was reanalysed as a single word, so you get forms along the lines of jāb, yijīb, jāb-it, jib-nā...

In Proto-Berber, as across most Berber languages, the word for "come" was something like as (perfect form y-usa, habitual yə-ttas, etc.) However, Proto-Berber also had a very productive system of "extensions", particles near the verb marking the direction in which the verb's action took place: towards (d) or away from (n) the speaker. Naturally, "come" normally featured the d extension. In many common forms, it was adjacent to the stem (eg y-usa d "he came", nə-ttas d "we come", etc.); in others, it was not (eg ad-d as-əγ "I will come", usa-n d "they came", etc.) In at least one variety - the dialect of the Beni Snous near Tlemcen, in western Algeria - this d was reinterpreted as part of the word "come"; so there (with voicing assimilation of s to z when next to d) you get forms like yusəd, nəttasəd, ad azd-əγ, uzd-ən.

* Strictly speaking, even in this one they're separated by a short vowel marking mood.

Wednesday, June 20, 2007

Is Omotic Afroasiatic?

Omotic, a small group of non-Cushitic, non-Semitic languages spoken in the highlands of Ethiopia, has always been the odd one out in Afroasiatic; by anyone's tree it is the first to have split off, and the noted Chadicist Paul Newman expressed scepticism about its membership in the family. I know little about Omotic, or Cushitic for that matter, but after reading a few sketch grammars in Omotic Language Studies, I found it very difficult to imagine these languages as Afroasiatic; with Berber or Hausa or Beja or Semitic the cognates are instantly visible, but none of the most familiar grammatical morphemes or lexical items seemed to be present. However, a paper I just came across by Rolf Theil is the first I've seen to present an argument against the hypothesis, and a pretty good one at that. There are parts I would question - for example, the suggestion that pronouns are unreliable (they are conspicuously unreliable in regions where extensive politeness systems have developed, like East and Southeast Asia, but I didn't think highland Ethiopia fell in that category) - but the overall argumentation seems good. In particular, the attempt to show that a roughly equal number of similarities can be observed between Omotic and families other than Afroasiatic is on the right track - if Omotic were to have more similarities with Afroasiatic than with any other family, then merely pointing out problems with some of those similarities would be inadequate. I'll be interested to see the reactions of people better acquainted with the family.

On another note, I passed my upgrade presentation yesterday - yay!

Sunday, May 27, 2007

Why people say silly things about historical linguistics

I recently realised that a lot of popular misconceptions about language evolution derive from uncritical use of the "family" metaphor. In families, a person has kids and then stays around, alongside the kids, for many years... they may live to see their great-grandchildren. The parent and the child may show a family resemblance, but will certainly be separate individuals. If you're told that languages come in "families", and "descend" from past languages, then it seems perfectly reasonable to imagine those ancestor languages lingering on alongside their descendants, and to imagine that the minor changes occurring daily within the language you speak are completely different from the sharp discontinuities that would have to occur for a new language to emerge.

But languages don't work that way at all: a language's "descendants" are (with rare exceptions) simply the various results of its own changes in the mouths of various communities. It's usually meaningless to talk about one living language being the "ancestor" of another one; in such cases, both are descendants of the same ancestor, even if (as infrequently happens) one has changed significantly less than the other. (Revived languages, like Sanskrit, are arguably an exception.) The same mistake is frequently made in popular understandings of biology, for the same reason; people imagine that chimpanzees (say) are humans' ancestors, when in reality the very fact that chimpanzees exist alongside humans proves that, while both species share a common ancestor, that ancestor was neither of them (or, looking at it another way, has equal right to be described as either of them.)
