Games with Words: On linguistic diversity

What makes interdisciplinary work difficult

Posted by GamesWithWords on Wednesday, August 21, 2013

I just read "When physicists do linguistics." Yes, I'm late to the party. In my defense, it only just appeared in my twitter feed. This article by Ben Zimmer describes work published earlier this year, in which a group of physicists applied the mathematics of gas expansion to vocabulary change. This paper was not well received. Among the experts discussed, Josef Fruehwald, a University of Pennsylvania graduate student, compares the physicists to Intro to Linguistics students (not favorably).

Part of the problem is that the physicists seem to have not understood the dataset they were working with and were in any case confused about what a word is, which is a problem if you are studying words! Influential linguist Mark Liberman wrote "The paper's quantitative results clearly will not hold for anything that a linguist, lexicographer, or psychologist would want to call 'words.'"

Zimmer concludes that

Tensions over [the paper] may really boil down to something simple: The need for better communication between disciplines that previously had little to do with each other. As new data models allow mathematicians and physicists to make their own contributions about language, scientific journals need to make sure that their work is on a firm footing by involving linguists in the review process. That way, culturomics can benefit from an older kind of scholarship -- namely, what linguists already know about how humans shape words and words shape humans.

Beyond pointing out that linguists and other non-physicists do already apply sophisticated mathematical models to language -- there are several entire fields that already do this work, such as computational linguistics and natural language processing -- I respectfully suggest that involving linguists in the review process is way too late. If the goal is to improve the quality of the science, bringing in linguists to point out that a project is wrong-headed after the project is already completed doesn't really do anyone much good. I guess it's good not to publish something that is wrong, but it would be even better to publish something that is right. For that, you need to make sure you are doing the right project to begin with.

This brings me to the difficulty with interdisciplinary research. The typical newly-minted professor -- that is, someone just starting to do research on his/her own without regular guidance from a mentor/advisor -- has studied that field for several years as an undergraduate, 5+ years as a graduate student, and several more years as a post-doc. In fact, in some fields even newly-minted professors aren't considered ready to release into the wild and are still working with a mentor. What this tells me is that it takes as much as 10 years of training and guidance before you are ready to be fully on your own. (This will vary somewhat across disciplines.)

Now maybe someone who has already mastered one scientific field can master a second one more quickly. I'm frankly not sure that's true, but it is an empirical question. But it seems very unlikely that anyone, no matter how smart or how well trained in their first field, is ready to tackle big questions in a new field without at least a few years of training and guidance from an experienced researcher in that field.

This is not a happy conclusion. I'm getting a taste of this now, as I cross-train in computational modeling (my background is pure experimental). It is not fun to go from being regarded as an expert in your field to suddenly being the least knowledgeable person in your laboratory. (After a year of training, it's possible I'm finally a more competent computational modeler than at least the incoming graduate students, though it's a tough call -- they, at least, typically have several years of relevant undergraduate coursework.) And I'm not even moving disciplines, just sub-disciplines within cognitive science!

So it's not surprising that some choose the "shortcut" of reading a few papers, diving in, and hoping for the best, especially since the demands of the career mean that nobody really has time to take a few years off to learn a new discipline. But it's not clear that this is a particularly effective strategy. All the best interdisciplinary work I have seen -- or been involved in -- involved an interdisciplinary team of researchers. This makes sense. It's hard enough to be an expert in one field. Why try to be an expert in two fields when you could just collaborate with someone who has already done the hard work of becoming an expert in that discipline? Just sayin'.

The missing linking hypothesis

Posted by GamesWithWords on Friday, April 15, 2011

Science just published a paper on language evolution to much fanfare. The paper, by Quentin Atkinson, presents analysis suggesting that language was "invented" just one time in Africa. That language first appeared in Africa would be of little surprise, since that's where we evolved. That there was only one point at which it evolved is somewhat more controversial, and also trivially false if one includes sign languages, at least some of which have appeared de novo in modern times (and one could make a case for including spoken creoles in the list of de novo languages).

What still boggles my mind is the analysis that supports these conclusions. In many ways, it seems brilliant -- but I can't escape the feeling that there is something amiss with the argument. The problem, as we'll see, is a series of missing linking hypotheses.

The Data

The primary finding is that the further you go from Africa (very roughly following plausible migration paths), the fewer phonemes the local language has. Hawai'ian -- the language spoken farthest from our African point of origin -- has only 13 phonemes. Some languages in Africa have more than 100.

To support the claim that this demonstrates that language evolved in Africa, one must add some additional data and hypotheses. One datum is that languages spoken by more people have more phonemes. Atkinson argues that whenever a new population migrated away from the parent population, it would necessarily be a smaller group ... and thus their language would have fewer phonemes than the parent group. Keep this up and over time, you end up with just a few phonemes left.

Population genetics

This argument seems to derive a lot of its plausibility from well-known phenomena in population genetics. Whenever a new population branches off (migrates away), it will almost by definition have less genetic diversity than the mother population. And in fact Africa has greater genetic diversity than other continents.

Atkinson tries to apply the same reasoning to phonemes:

Where individuals copy phoneme distinctions made by the most proficient speakers (with some loss), small population size will reduce phoneme diversity. De Boer models the evolution of vowel inventories using a different approach, in which individuals copy any members of their group with some error, and finds the same population size effect.

I see the logic, but then phonemes aren't genes. When ten people leave home to start a new village, they can only take ten sets of genes with them, and even some of that diversity may be lost because of vagaries of reproduction. Those alleles, once gone, are not easily reconstructed.

As far as I can tell, to apply the same logic to phonemes we have to assume a fair percentage of children fail to learn all the phonemic contrasts in their native language. For some reason, this does not prevent them from communicating successfully. In a large population, the fact that many people lack this or that phonemic contrast doesn't matter, as on average, most people know any given phonemic contrast, and thus it is transmitted across the generations. When a small group leaves home, however, it's quite possible that by accident there will be a phonemic contrast that few (or none) of them use. The next generation is then unlikely to use that contrast.

This may be true, but I don't find its plausibility so overwhelming that I'm willing to accept it at face value. I'd actually like to see data showing that many or most speakers of a given language do not use all the phonemic contrasts (beyond the fact that of course some dialects are missing certain phonemes, as in the fact that Californians do not distinguish between cot and caught; dialectal variation probably cannot support Atkinson's argument, but I leave the proof to the reader ... or to the comment section).
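To make the small-group sampling argument concrete, here is a minimal sketch in Python. It is not Atkinson's model or de Boer's. It assumes, purely for illustration, that each speaker independently uses a given contrast with some fixed probability, and that a contrast used by half the group or fewer is not passed on. The function name, the 0.7 figure and the "half or fewer" threshold are all my own assumptions.

```python
from math import comb


def prob_contrast_lost(group_size: int, p_speaker_has_contrast: float) -> float:
    """Probability that half the group or fewer uses a given contrast.

    Illustrative assumptions (not taken from the paper): speakers are
    independent, each uses the contrast with the same probability, and a
    contrast used by half the group or fewer is not transmitted.
    """
    p = p_speaker_has_contrast
    max_users = group_size // 2  # largest count that still means "half or fewer"
    # Sum the binomial probabilities of 0, 1, ..., max_users speakers using it.
    return sum(
        comb(group_size, k) * p**k * (1 - p) ** (group_size - k)
        for k in range(max_users + 1)
    )


if __name__ == "__main__":
    # Hypothetical case: 70% of speakers in the parent population use the contrast.
    for n in (10, 100, 1000):
        print(n, prob_contrast_lost(n, 0.7))
```

Under these assumptions the chance of losing a contrast shrinks rapidly as the group grows, which is the shape of the argument. Whether real speakers actually vary in this way is exactly the data I would like to see.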

Phonemes and Population Size

Atkinson reports being inspired by the relatively recent finding that languages spoken by more people have more phonemes. Interestingly, the authors of that paper note that "we do not have well-developed theoretical arguments to offer about why this should be." It seems to me that Atkinson's analyses depend crucially on the answer to this puzzle, though as I mentioned at the outset, I haven't been able to quite work out all the details yet.

Atkinson's analysis crucially depends on (among other things) the following supposition: the current population size of any language community is roughly predicted by the number of branching points (migrations) since the original language (which arose somewhere between 50,000 and 100,000 years ago). I'm still on the fence as to whether this is a preposterous claim or a very reasonable one.

It is certainly very easy to construct scenarios on which this supposition would be false. Civilizations expand and contract rapidly (consider that English was confined to only one part of Great Britain half a millennium ago, or that Celtic languages were spoken across Europe only 2,000 years ago). Relative population size today seems to be driven more by poverty, access to birth control and education, etc., than anything else. Atkinson only needs there to be a mid-sized correlation, but 50,000 years is a very, very long time.

Atkinson also needs it to be the case that the further from Africa a language is spoken, the more branching points there have been. The problem we have is that there is a lot of migration within already-settled areas (Indo-European expansion, Mandarin expansion, Bantu expansion, etc.). So we need it to be the case that most of the branching of language groups happened going into new, unsettled areas, and relatively little of it is a result of invading already-populated areas. That may be true, but consider that all of Africa, Europe, Asia and the Americas were settled by 10,000 years ago, which leaves a lot of time for language communities to move around.

Conclusion

Atkinson put together a very interesting dataset that needs to be explained. His explanation may well be the right one. However, his explanation requires making a number of conjectures for which he offers little support. They may all be true, but this is a dangerous way to make theories. It's a little like playing Six Degrees of Kevin Bacon where you are allowed to conjecture the existence of movies and co-stars. It should be obvious that with those rules, you can connect Kevin Bacon to anyone, including yourself.

Missing Words

Posted by josh on Thursday, February 10, 2011

My dictionary lists several Chinese words for disdain, but none for discourage. The government in Orwell's 1984 would have loved this, as they -- along with many contemporary writers (I'm talking about you, Bill Bryson) -- believed that if you don't have a word for something, you can't think about it. I guess China has no need for the motivational speaker industry.

You can't be discouraged if you don't have a word for it.

Unfortunately for the government of Oceania, there's very little evidence this is true. The availability of certain words in a language may have effects on memory or speeded recognition, but probably does nothing so drastic as making certain thoughts inaccessible. I think examples like the one above make it clear just how unlikely the hypothesis was to be true to begin with.

-----
photo credit here.

Qing Wen!

Posted by GamesWithWords on Thursday, December 30, 2010

In the process of encouraging more Americans to study Spanish rather than Mandarin, Nicholas Kristof notes that in Mandarin

there are thousands of characters to memorize as well as the landmines of any tonal language.

How true! How true! Kristof shortly proves the latter point in more ways than one:

The standard way to ask somebody a question in Chinese is “qing wen,” with the “wen” in a falling tone. That means roughly: May I ask something? But ask the same “qing wen” with the “wen” first falling and then rising, and it means roughly: May I have a kiss?

Just one possible reaction if you use the wrong tone.

Kristof is right, so long as you don't mind sounding like a speech synthesizer. The classic description of third tone is a falling tone followed by a rising tone, but in practice it is relatively rare to pronounce the second half (the rising tone), particularly in fluent speech (in Taiwan, anyway; China has a lot of regional variation in Mandarin, so I don't know whether this holds everywhere). Figuring out when to pronounce the full tone and when not to is just one of many issues L2 Mandarin speakers run into.

Actually, third tone is worse than I just suggested. Qing wen is actually a good example, because the qing is also in third tone. When there are two third tones in a row -- as there are in the qing3 wen3 that means "may I have a kiss?" (I'm writing in the tones with numbers here) -- the first one is pronounced as if it were second tone (start low and rise high). So even though qing technically doesn't change, its pronunciation depends on which wen you are using.

If you have three or more third tones in a row (e.g., ni3 you3 hao3 gou3 gou3 ma0?), deciding which syllables will be pronounced as if they were second tone is a complicated issue. I'd explain it to you, but I don't actually know myself. I've been told you actually have some flexibility in what you do, but I'm not sure that wasn't just another way of saying, "Sorry, I can't really explain it to you."
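The simple two-syllable case, at least, can be sketched in code. This is only an illustration of the rule as described above: the function name and the (pinyin, tone) representation are my own, and it deliberately refuses anything longer than two syllables rather than guess at the harder cases.

```python
def apply_third_tone_sandhi(syllables):
    """Return the tones as spoken for a two-syllable phrase.

    Each syllable is a (pinyin, tone) pair, with the tone written as a
    number. Only the two-syllable rule is covered; longer runs of third
    tones are intentionally not handled in this sketch.
    """
    if len(syllables) != 2:
        raise ValueError("this sketch only handles two-syllable phrases")
    (first, first_tone), (second, second_tone) = syllables
    if first_tone == 3 and second_tone == 3:
        # The first of two third tones is pronounced as if it were second tone.
        first_tone = 2
    return [(first, first_tone), (second, second_tone)]


if __name__ == "__main__":
    print(apply_third_tone_sandhi([("qing", 3), ("wen", 3)]))
    print(apply_third_tone_sandhi([("qing", 3), ("wen", 4)]))
```

The same written qing3 comes out differently depending on the tone of the wen that follows it.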

Dothraki -- a response

Posted by GamesWithWords on Thursday, June 03, 2010

The Language Creation Society has officially responded to my open letter requesting that they embed some useful experiments in Dothraki, a language they are creating for a new HBO show. You can read the response at Scientific American.

This formal response follows a series of informal emails between myself and both David Peterson (the author of the response) and Sai Emrys (the LCS president). It was a fun conversation, and while they're not taking me up on my suggestion -- at least not for this language -- I did learn a great deal from them, some of which makes it into their letter, which I recommend reading.

Why Is Nobody Studying Klingon?

Posted by josh on Tuesday, May 11, 2010

Doing research for the recent Scientific American Mind article, I found out that Klingon uses the incredibly rare object-verb-subject (OVS) word order. Even though some languages (like Russian) allow relatively free word order, all languages seem to have a preferred order. There are six possible orders. The most common are SVO (English), SOV (Japanese), and VSO (Classical Arabic). The three orders that put the object before the subject are relatively rare, with OVS nearly non-existent. It does appear occasionally in poetry or other marked uses (The drink drank I), and is claimed to be the dominant word order in at least two extremely rare languages: Guarijio and Hixkaryana. Given the degree of debate over how to correctly characterize syntax in well-studied languages like English, I always maintain some skepticism about rare, poorly-studied languages (and the sad truth is that all languages are poorly-studied when compared to English).

In any case, if one wanted to study the acquisition of Guarijio or Hixkaryana, one would need a decent travel budget and some infrastructure. Klingon is spoken closer to home. Yet I couldn't find any papers in Google Scholar looking at the acquisition of Klingon, even from a sociological perspective. This seems under-studied.
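For anyone who wants to check the arithmetic on word orders, here is a short sketch that enumerates the orderings of subject, verb and object and picks out the ones that put the object before the subject. The variable names are just illustrative.

```python
from itertools import permutations

# Every ordering of Subject, Verb and Object.
orders = ["".join(p) for p in permutations("SVO")]

# Orders in which the object comes before the subject.
object_first = [order for order in orders if order.index("O") < order.index("S")]

if __name__ == "__main__":
    print(len(orders), orders)
    print(len(object_first), object_first)
```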

Speaking Chinese

Posted by josh on Thursday, August 13, 2009

People often talk about speaking 'Chinese,' as if there were a single language called 'Chinese.' There are a number of related Chinese languages, much as there are a number of related Romance languages.

It turns out the situation is worse than we thought, though. Linguists have been discovering new Chinese languages.

How the Presidential Campaign Changed the English Language

Posted by josh on Thursday, November 06, 2008

Languages change over time, which is why you shouldn't take seriously any claims about this language being older than the other, or vice versa. A language is only old in the same sense that a farmer can say, "I've had this axe for years. I've only changed the handle twice and the head three times."

Language change is probably slowed these days by stasis-inducing factors like books. However, rapid communication means that new phrases or ways of speaking can be disseminated with lightning speed. Here is an interesting article about the effect McCain & Palin's drill, baby, drill has had on the English language.

A Bush-administration flunky's unfortunate statement that reporters -- but not members of the Bush administration -- are members of "what we call the reality-based community" led to an interesting shift in the way Progressives speak. The compound adjective "reality-based" has become part inside joke, and part simply a new word. I suspect "real America" will similarly entrench itself in the English language.

Dead languages

Posted by josh on Tuesday, October 07, 2008

Latin is dead, as dead as dead can be.
First it killed the Romans, and now it's killing me.

But not kids in Westchester County, it would seem.

The linked article also notes that the number of students taking the National Latin Exam has risen steadily in the last few years. As somebody who studied Latin in high school and who loves languages generally, that seems like a good thing. But I do have to wonder: why Latin?

What is the first language?

Posted by josh on Wednesday, April 09, 2008

Linguists debate whether all languages are descended from a common ancestor. This can't be completely true, since many sign languages have been invented out of whole cloth in modern times (Nicaraguan Sign Language is a famous example), as was, to a meaningful extent, Hawaiian Pidgin.

However, students of history know from the ancient Greek historian Herodotus that Phrygian is the first language. According to his writings, an Egyptian king by the name of Psammetichus ordered that two children "of the ordinary sort" be raised in an isolated cabin without exposure to language. At the age of two or so, the children began to speak Phrygian, which was taken as proof that Phrygian, not Egyptian, is the world's earliest language.

This study is a great example of why experiments need to be replicated before they are taken too seriously.

Word meaning as a window into thought

Posted by josh on Thursday, February 07, 2008

Benjamin Whorf has perhaps the best name recognition in psycholinguistics, being known for the Whorfian Hypothesis: the idea that the particular language you learn constrains the way you think about the world.

This hypothesis has made its way into popular culture (or, perhaps it predated Whorf). Many essays -- and sometimes large sections of books -- make a big deal of etymology. That is, the origin of a word is supposed to tell us something about culture. A popular example is that the Mandarin word for "China" means, literally, "Center Country." This is supposed to tell us something about how the Chinese view their place in the world.

Maybe it does, maybe it doesn't. But certainly in some cases etymology tells us nothing. Here's a quote from "Formal Semantics" by Gennaro Chierchia:

To make this point more vividly, take the word money. An important word indeed; where does it come from? What does its history reveal about the true meaning of money? It comes from Latin moneta, the past participle feminine of the verb moneo 'to warn/to advise.' Moneta was one of the canonical attributes of the Roman goddess Juno; Juno moneta is 'the one who advises.' What has Juno to do with money? Is it perhaps that her capacity to advise extends to finances? No. It so happens that in ancient Rome, the mint was right next to the temple of Juno. So people metonymically transferred Juno's attribute to what was coming out of the mint. A fascinating historical fact that tells us something as to how word meanings may evolve; but it reveals no deep link between money and the capacity to advise.

Back to Chinese. Another good example is the word for turkey: huoji. Literally, it means "fire chicken." Anyone who wants to make a story about how that explains the Chinese psyche is welcome to give it a shot.
