Assistant Village Idiot: Signal to Noise

Tuesday, August 25, 2020

Signal to Noise

In a hurry. Editing later. If you notice something let me know.

I have made reference to historical linguistics and the deep connections between languages many times before. Many linguists do not consider such connections to be established, and indeed consider them impossible to ever establish. When I started following this, the consensus was that any time depth beyond 5,000 years was not credible. Languages change too much over that time to have recognizable similarities. One thousand years is considered the (highly variable) estimate of how long it takes for a language to no longer be understandable without help. For example, modern English-speakers can recognize the similarity immediately and understand Old English with a bit of work. We can see similarities in West Germanic from a thousand years before that if they are pointed out to us, but actual comprehension takes training. To understand the ancestral languages before that, back into Proto-Germanic and Proto-Indo-European, is the work of specialists. Any deeper connections, many linguists said – and still say – are hypothetical. Over the last few decades that limit has stretched to 6,000-8,000 years, and some will reluctantly even say 10,000, but those are the upper limits for that approach to historical linguistics.

Some Russian linguists believed they could detect similarities at greater depth - significantly, they had access to a great many language families we did not - and associated those language families into larger collections. In America, Joseph Greenberg at Stanford taught much the same in the 1960s and 70s by using a technique based on vocabulary that most linguists still reject. The controversy has never gone away, but the possibility of seeing relationships between languages and language families at greater time depth has slowly gained favor. In particular, Greenberg categorized the languages of the New World into three groups: a most-recent Eskimo-Aleut migration; an earlier Athabaskan (or Na-Dene) arrival; and everything else in North and South America, from the earliest migration, as Amerind. That would mean connecting those languages as related at a time-depth of 15,000 years, which is considered out of reach for traditional linguistic techniques. Even the early genetic work by Cavalli-Sforza supported Greenberg’s theory that these languages all came from a single Beringian migration, and later genetic findings have nailed that external supporting evidence to the ground. Yet the existence of Amerind as a single family is almost universally rejected. By linguists, anyway. Everyone else is becoming more confident Greenberg had it right.

Why then do historical linguists persist in saying the connection is not established? I am not entirely sympathetic to their arguments, but it isn’t as if they don’t have good points. First, the evidence from other fields is from other fields. They don’t know if it is all going to change tomorrow, or at least start developing important correctives. They may be interested in such findings and use them to inform their own work, but the idea that we do what we do and tend our own garden is not crazy. This may all turn out to be the case, but it’s not linguistics. We’re telling you what the linguistics shows. And similarities at time-depths of even 15,000 years are not detectable, never mind the 40,000 years some people are talking about.

Next, the relationship between language and genetics is uncertain, and one need look no farther than an American schoolroom to see this. They are all speaking English, but one glance will tell you they are not closely related genetically. In addition to such wholesale takeovers of one group by another repeatedly in the past, there are also borrowings of words and complicated interbreeding arrangements that can send a language off in a new direction. Lots of us want more than just possible answers to questions, and if we can’t have proof, we sometimes insist on high probability. Faced with the knowledge of genetic connection between Algonquians and Seminoles, a linguist might say “Interesting, but so what, really. It could be anything.”

Vocabulary changes much more rapidly, and for much more arbitrary reasons, than the deeper structures of language such as sound changes, or word order and other syntax. Linguists are much more suspicious of word similarity as evidence of language connection for that reason. Even in languages we know well, we think we see patterns and derivations that aren’t there. We used to believe that the English word girl, which once meant either a male or female child, must be related to Swedish gurre, “small child,” especially as there are other similar terms in Norwegian and Low German dialects. Yet this is increasingly rejected. In addition to what is mentioned at the link, there would also have been a sound change from the nearest possible ancestral language moving into English, and we would be pronouncing it “yirl.” How much more uncertain is this in languages that are barely attested, only written down over the last hundred years, and that by non-native speakers? It’s a solid point.

Greenberg’s Multilateral Comparison involves making comparisons not only at the level of single languages, but of entire families. That is, if a word from a language in family A is similar to a word (including meaning) in any member of family B, one can posit a relationship if one gets enough of them. One can see why this would attract objection. Bringing in the vocabularies of languages already known to be distinct, even if related, is introducing even more noise.
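The objection can be put in rough numbers. What follows is only a back-of-envelope sketch, not anything Greenberg or his critics published: the function names are mine, the 5% chance-resemblance rate is an invented figure for illustration, and it assumes every comparison is independent, which real languages certainly are not. It shows how the odds of finding a look-alike by accident climb as more languages from family B are allowed into the comparison.

```python
def chance_match_probability(p_single: float, k_languages: int) -> float:
    """Probability that one word resembles, purely by accident, at least one
    of the candidate words drawn from k_languages languages.

    Assumes (hypothetically) that each comparison is independent and has the
    same accidental-resemblance rate p_single.
    """
    return 1.0 - (1.0 - p_single) ** k_languages


def expected_chance_matches(n_meanings: int, p_single: float, k_languages: int) -> float:
    """Expected number of accidental 'matches' over a list of n_meanings words."""
    return n_meanings * chance_match_probability(p_single, k_languages)


if __name__ == "__main__":
    # 100 word meanings, an assumed 5% accidental-resemblance rate,
    # compared against 1, 5, or 20 languages in the other family.
    for k in (1, 5, 20):
        print(k, round(expected_chance_matches(100, 0.05, k), 1))
```

Under those made-up assumptions, a 100-word list goes from about 5 accidental matches against a single language to well over half the list against twenty. That is the noise the traditionalists are worried about, and any signal has to be measured against that raised baseline, not against zero.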

It is a signal-to-noise problem. There’s more signal to noise from syntax than from vocabulary. And if you broaden the definition of vocabulary even further it gets worse. Old-style linguists are correct on this point. Greenberg’s attempt to bypass this noise problem by sheer volume of information, insisting that if you get enough volume you can start to detect signal, doesn’t impress them. (The same quote from Lyle Campbell always seems to be used. Isn’t there anyone else quotable on the subject? Doesn't that in itself tell us something?)

It impresses me. We are increasingly able to filter out noise and detect signal in everything else these days, and I don’t see why language should be different. Greenberg’s hypothesis turns out to be correct, and “it’s nae use sayin’ pigs conner fly when ye see ‘em sproutin’ wings.” I will tell you what is most likely to happen. Cross-disciplinary studies are now becoming the norm, and the geneticist and archaeologist speaking together and coming to a common conclusion about a relatedness between peoples based on tools and DNA are not going to be interested in a strict linguist saying “I regard this as merely possible. I can’t sign on.” If he could rule something out as impossible, or propose an alternative explanation of relationship, perhaps bringing in yet a third group to explain, they would be fine with that. But the linguist has backed himself into a corner there, unable to disprove, and also not very certain about any of the possible alternatives, because he is demanding a high level of certainty.

The purpose of a field of study in its origin is to find answers to questions. Who lived here? Why don’t these chemicals mix when I shake them? Why do people in groups act like this? All organisations move toward perpetuating themselves, and eventually the real talent goes elsewhere.

5 comments:

Donna B. said...: "For example modern English-speakers can recognize the similarity immediately and understand Old English with a bit of work."

Yet... modern English speakers cannot recognize and understand the language of modern Glasgow taxi drivers.; 4:23 PM
james said...: How fast does a language change in isolation? Is it relatively constant?; 4:25 PM
Assistant Village Idiot said...: It is hard to know, but believed to be faster. Once a language is written down it changes more slowly, so that 8th-graders in Iceland can read the 13th C sagas, though with difficulty. We don't have good records for languages that aren't written down, mostly reports from missionaries who were not professional linguists. Of things written down by professionals, it looks like the changes can be quick. I vaguely recall reading about a recording made in the 1940s of a remote Nepalese language that current speakers have minor trouble with already, as words have gone out of use.

Few languages are entirely isolated. Groups retain some contact with other tribes, and as those are usually from highly related language branches, common words get reinforced and kept. Reportedly, the Caucasus and Papua New Guinea are the best laboratories for studying these things, because both have many remote tribes with no outside contact and very little even close contact. This is notably because they either flee from or kill intruders. Noble savage and all that.; 4:45 PM
james said...: Crud. I wasn't clear. I meant to ask if the rate of change was constant between cultures/language families. But from what you wrote it sounds like that's not known.
I was wondering if Hawaii would be a useful test case, but I gather there's quite a bit of dispute about when it was settled and how often.
Rubber rulers...; 5:19 PM
Texan99 said...: OT, but I'm just starting Irving Finkel's "The Ark Before Noah," which in the very first chapter begins to paint such a charming portrait of an ancient-languages geek that I feel sure you would enjoy it. He's going to go on to examine how a catastrophic sudden filling of the Black Sea might be the origin of Indo-European world-destroying flood traditions.

What a pleasure to run across well-written popular science. You'd want this guy for a dinner guest.; 3:57 PM