Sunday, October 5, 2008

Translating from linguists' English to normal English

Machine translation between languages is hard, obviously. There are all sorts of reasons why just looking words up, constructing syntactic trees, and changing word order appropriately isn't enough to produce good output - mainly that resolving ambiguities often requires real-world knowledge, and that different vocabularies are not always organised in the same way. Just how much that matters becomes clear when you think about a slightly different problem: translation from a technical vocabulary to a non-technical one within the same language.

Take the following sentences, pulled at random from a grammar on my shelf (Stroomer's Grammar of Boraana Oromo):
"Nouns ending in -ni (mostly -aani) have ultimate or penultimate stress in free variation."

"Verbs with the verb extension -ad'd'-, -at- have an AFF.IMPER.sg: -ád'd'i, -ád'd'u and a NEG.IMPER.sg: -atín(n)i, see 10.10." (p. 72)

If you are, say, a foreign worker about to be posted to northern Kenya, or a second-generation emigrant Oromo planning to go back and visit, you may well want to try and learn some Oromo from this book. But the odds are you will not know what either of these English sentences means, and that applies to quite a lot of the book.

How could you translate these sentences into terms a wider audience would understand? If you can assume a certain amount of basic knowledge (traditional parts of speech, consonants and vowels) then that makes things easier:
"Nouns ending in -ni (mostly -aani) get stressed on the last or second-to-last vowel, it doesn't matter which."

"Verbs with -ad'd'-, -at- added at the end have an imperative singular: -ád'd'i, -ád'd'u and a negative imperative singular: -atín(n)i, see 10.10."
Realistically, you can't assume that level of knowledge, certainly not in Britain at any rate (I still can't believe that what little grammar gets taught in schools here only ever seems to get taught in foreign language classes, not in English ones; that no doubt explains part of the country's comparatively low foreign language skills). So what does that leave you with? Something like:
"When you say a word that refers to a person, place, or thing* and ends in -ni (mostly -aani), you put the emphasis at the end or just before the end, it doesn't matter which."

"If you have a word that means doing something* that has -ad'd'-, -at- added at the end, then to order one person to do that you add -ád'd'i, -ád'd'u, and to order them not to do that you add -atín(n)i, see 10.10."
(*Yes, I know that syntactic tests like whether they can be the object of a preposition yield more accurate definitions, but in practice these are a good first approximation, and the former does work even on gerunds: "Killing is a bad thing", so "killing" is a noun, but *"Kill is a bad thing", so "kill" isn't.)

Could this be done algorithmically? A simple substitution table would certainly not be enough. Just try it with any set of definitions you can think of:
"Words referring to a person, place, or thing ending in -ni (mostly -aani) have final or pre-final emphasis such that it doesn't matter which."

"Words that mean doing something with the words that mean doing something extension -ad'd'-, -at- have an agreeing order-giving one-entity: -ád'd'i, -ád'd'u and a denying order-giving one-entity: -atín(n)i, see 10.10." (p. 72)
Not terribly helpful, I think you'll agree... To come up with something a little more helpful (and I'm sure my renditions could be improved on) we had to change the whole structure of the sentence. Even then, at some point it's probably going to be more effective to just teach the person the grammatical notions and let them go forward from there than to keep giving brief explanations of the same notion over and over again.
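
For concreteness, here is a minimal sketch in Python of the kind of simple substitution table I mean. The glossary entries and the function name naive_gloss are made up for illustration (they are not taken from any real tool or dictionary); the point is only that a word-for-word lookup reproduces the clumsy first rendition above and cannot restructure the sentence.

```python
import re

# Hypothetical glossary: technical term -> plain-English gloss.
# These entries are illustrative assumptions, not a real resource.
GLOSSARY = {
    "nouns": "words referring to a person, place, or thing",
    "verbs": "words that mean doing something",
    "verb": "words that mean doing something",
    "ultimate": "final",
    "penultimate": "pre-final",
    "stress": "emphasis",
    "in free variation": "such that it doesn't matter which",
}


def naive_gloss(sentence, glossary=GLOSSARY):
    """Replace each glossary term with its gloss, word for word."""
    # Try longer terms first so multi-word terms win over their parts.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )

    def swap(match):
        original = match.group(0)
        gloss = glossary[original.lower()]
        # Keep sentence-initial capitalisation.
        if original[0].isupper():
            gloss = gloss[0].upper() + gloss[1:]
        return gloss

    # A single pass, so text that has been substituted is never re-scanned.
    return pattern.sub(swap, sentence)


print(naive_gloss(
    "Nouns ending in -ni (mostly -aani) have ultimate or penultimate "
    "stress in free variation."
))
```

Every individual substitution here is defensible, and the result is still barely readable. Getting from this to something like "you put the emphasis at the end or just before the end" means rewriting the clause, which is exactly what a lookup table cannot do.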

The problem is certainly not unique to linguistics. Medicine, law, ecology - most fields have technical vocabularies that pose an obstacle to non-specialists, who will often have good reason to be interested in trying to make sense of them. Is there any role for algorithms in this (apart from obvious things like hyperlinking technical terms to dictionary entries)? It's well outside my usual field, but it would be interesting to hear of any attempts.
