Field of Science

The missing linking hypothesis

Science just published a paper on language evolution to much fanfare. The paper, by Quentin Atkinson, presents analysis suggesting that language was "invented" just one time in Africa. That language first appeared in Africa would be of little surprise, since that's where we evolved. That there was only one point at which it evolved is somewhat more controversial, and also trivially false if one includes sign languages, at least some of which have appeared de novo in modern times (and one could make a case for including spoken creoles in the list of de novo languages).

What still boggles my mind is the analysis that supports these conclusions. In many ways, it seems brilliant -- but I can't escape the feeling that there is something amiss with the argument. The problem, as we'll see, is a series of missing linking hypotheses.

The Data

The primary finding is that the further you go from Africa (very roughly following plausible migration paths), the fewer phonemes the local language has. Hawai'ian -- the language spoken farthest from our African point of origin -- has only 13 phonemes. Some languages in Africa have more than 100.

To support the claim that this demonstrates that language evolved in Africa, one must add some additional data and hypotheses. One datum is that languages spoken by more people have more phonemes. Atkinson argues that whenever a new population migrated away from the parent population, it would necessarily be a smaller group ... and thus their language would have fewer phonemes than the parent group. Keep this up and over time, you end up with just a few phonemes left.

Population genetics

This argument seems to derive a lot of its plausibility from well-known phenomena in population genetics. Whenever a new population branches off (migrates away), it will almost by definition have less genetic diversity than the mother population. And in fact Africa has greater genetic diversity than other continents.

Atkinson tries to apply the same reasoning to phonemes:
"Where individuals copy phoneme distinctions made by the most proficient speakers (with some loss), small population size will reduce phoneme diversity. De Boer models the evolution of vowel inventories using a different approach, in which individuals copy any members of their group with some error, and finds the same population size effect."
I see the logic, but then phonemes aren't genes. When ten people leave home to start a new village, they can only take ten sets of genes with them, and even some of that diversity may be lost because of vagaries of reproduction. Those alleles, once gone, are not easily reconstructed.

As far as I can tell, to apply the same logic to phonemes we have to assume a fair percentage of children fail to learn all the phonemic contrasts in their native language. For some reason, this does not prevent them from communicating successfully. In a large population, the fact that many people lack this or that phonemic contrast doesn't matter, as on average, most people know any given phonemic contrast, and thus it is transmitted across the generations. When a small group leaves home, however, it's quite possible that by accident there will be a phonemic contrast that few (or none) of them use. The next generation is then unlikely to use that contrast.
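
To make that chain of assumptions concrete, here is a toy simulation of it. It is only a sketch of the reasoning in the paragraph above, not Atkinson's or De Boer's model. Every number in it is an assumption I made up for illustration: 40 contrasts in the parent language, each speaker independently using any given contrast with probability 0.7, and a contrast reaching the next generation only if more than half of the founding group uses it.

```python
import random


def surviving_contrasts(group_size, n_contrasts=40, p_known=0.7,
                        threshold=0.5, rng=None):
    """Count contrasts used by more than `threshold` of one random group.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if rng is None:
        rng = random.Random()
    survived = 0
    for _ in range(n_contrasts):
        # Each speaker independently uses this contrast with probability p_known.
        users = sum(rng.random() < p_known for _ in range(group_size))
        if users / group_size > threshold:
            survived += 1
    return survived


def mean_surviving(group_size, trials=200, seed=0, **kwargs):
    """Average number of surviving contrasts over many random groups."""
    rng = random.Random(seed)
    total = sum(surviving_contrasts(group_size, rng=rng, **kwargs)
                for _ in range(trials))
    return total / trials


if __name__ == "__main__":
    for size in (5, 20, 100, 1000):
        print(size, mean_surviving(size, trials=50))
```

Under these made-up numbers, a group of 1,000 keeps essentially every contrast, while a group of five loses several by chance alone. The result depends entirely on `p_known` being well below 1, which is exactly the assumption I would like to see data for.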

This may be true, but I don't find its plausibility so overwhelming that I'm willing to accept it at face value. I'd actually like to see data showing that many or most speakers of a given language do not use all the phonemic contrasts (beyond the fact that of course some dialects are missing certain phonemes, as in the fact that Californians do not distinguish between cot and caught; dialectal variation probably cannot support Atkinson's argument, but I leave the proof to the reader ... or to the comment section).

Phonemes and Population Size

Atkinson reports being inspired by the relatively recent finding that languages spoken by more people have more phonemes. Interestingly, the authors of that paper note that "we do not have well-developed theoretical arguments to offer about why this should be." It seems to me that Atkinson's analyses depend crucially on the answer to this puzzle, though as I mentioned at the outset, I haven't been able to quite work out all the details yet.

Atkinson's analysis crucially depends on (among other things) the following supposition: the current population size of any language community is roughly predicted by the number of branching points (migrations) since the original language (which arose somewhere between 50,000 and 100,000 years ago). I'm still on the fence as to whether this is a preposterous claim or a very reasonable one.

It is certainly very easy to construct scenarios in which this supposition would be false. Civilizations expand and contract rapidly (consider that English was confined to only one part of Great Britain half a millennium ago, or that Celtic languages were spoken across Europe only 2,000 years ago). Relative population size today seems to be driven more by poverty, access to birth control and education, etc., than anything else. Atkinson only needs there to be a mid-sized correlation, but 50,000 years is a very, very long time.

Atkinson also needs it to be the case that the further from Africa a language is spoken, the more branching points there have been. The problem we have is that there is a lot of migration within already-settled areas (Indo-European expansion, Mandarin expansion, Bantu expansion, etc.). So we need it to be the case that most of the branching of language groups happened going into new, unsettled areas, and relatively little of it is a result of invading already-populated areas. That may be true, but consider that all of Africa, Europe, Asia and the Americas were settled by 10,000 years ago, which leaves a lot of time for language communities to move around.


Atkinson put together a very interesting dataset that needs to be explained. His explanation may well be the right one. However, his explanation requires making a number of conjectures for which he offers little support. They may all be true, but this is a dangerous way to make theories. It's a little like playing Six Degrees of Kevin Bacon where you are allowed to conjecture the existence of movies and co-stars. It should be obvious that with those rules, you can connect Kevin Bacon to anyone, including yourself.


Torbjörn Larsson said...

So I haven't read the paper yet, but I find the reports interesting. It maps well to parts of genetic finds, especially on variation and its tendency away from Africa. In fact I think the article leaves that out in the discussion on whether population size is predicted by migrations, because if it happened with genomic populations it happened with many language populations simultaneously as far as I understand.

Actually one could take the successful prediction as evidence that other means of language spread have had little effect most of the time, which makes sense as civilization is a recent invention. [I don't understand the problem with migration within already settled areas since it _wouldn't_ map to similar genetic results. But again I haven't read the paper so I don't know if it is an assumption there or here, and how it is motivated.]

The nice thing with this is that Atkinson can mount additional support for his theory. He doesn't need to, since assumptions are tested with the theory. Whether the theory remains with additional testing is another question. To use the Bacon analogy, who cares if Bacon is directly or indirectly connected with anyone if it is the phenomenon of connection that is the question?

The obvious problem with the result is mentioned elsewhere: ~ 80 % of genetic variation is explained by migration/population size, only ~ 20 % of phoneme variation. So the mapping genetic-language history breaks down, and why that is seems unclear.

uzza said...

Thank you thank you thank you, for being the only person I've found who acknowledges that point about signed languages.

Good discussion of the paper here. I predict closer investigation will dissipate Atkinson's correlation in a cloud of details about how phonemes are counted in the WALS data and in general.