
Bad Evolutionary Arguments

The introductory psychology course I teach for is very heavy on evolutionary psychology. The danger with evolutionary explanations is that it's pretty easy to come up with bad ones. Here's the best illustration I've seen, from Saturday Morning Breakfast Cereal:


How do you tell a good evolutionary argument from a bad one? It's hard to test them with experiments, but that doesn't mean you can't get data. Nice supporting evidence would be finding another species that does the same thing. This hypothesis makes the clear -- and almost certainly false -- prediction that people are likely to adopt babies that fly in out of the blue. You would want to show that whatever reproductive advantage comes from having your genes spread widely (adopted children themselves have more children?) is not overwhelmed by the disadvantages of not being raised by your biological parents (there is data showing that, all else equal, step-parents invest less in step-children than biological parents invest in their biological children. I expect this generalizes to adoptive parents, but I'm not sure; it might be confounded in the modern day by the rigorous screening of adoptive parents).

Etc. We try to teach our students to critically evaluate evolutionary hypotheses. Hopefully it has taken.


Citizen Science Project: Likely Events


VerbCorner was our first step towards opening up the rest of the process. I have just opened up a new segment of the website called "Experiment Creator", which is our second endeavor.

Experiment Creator


One of the most important parts of language experiments is choosing the stimuli. For many types of research, such as in many low-level or mid-level vision projects, the experimenter has free rein to design essentially whatever stimuli they like. Language researchers are constrained by the fact that some words exist and others don't, and each word that has the properties you want may also have other properties that you don't want along for the ride. For instance, you might want to compare nouns and verbs, which don't just differ in terms of part of speech but also frequency (there are many very low-frequency nouns) and length (in some languages, nouns will be systematically longer than verbs; in other languages, it will be the reverse).

Typically, we have to run one or more “norming” experiments to choose stimuli that are controlled for various nuisance factors. These are not really experiments. There is no hypothesis. The purpose of the experiment is indirect (it's an experiment to create another experiment). So I usually do not post them at gameswithwords.org, which recruits people who want to participate in experiments.

The new Experiment Creator project changes this. The tasks posted there will be meta-experiments, used to choose stimuli for other experiments. I just posted the first one, Likely Events.

Likely Events


One of the big discoveries about language in the last few decades is that when we are listening to someone talk or reading a passage, we are actively predicting what will come next. If you hear “John needed money, so he went to the…” you probably expect the next word to be “ATM," not “hibernate.” There are two reasons: 1) "the" is usually followed by a noun, not a verb, and 2) "hibernate" is a relatively rare word.

Much of this research has focused on word frequency and what words follow what other words. We are developing several projects to look more carefully at predictions based not on what word follows what word but on what event is likely to follow what event. In general, "the street" is a more common sequence of words than "the ATM" and "street" is more common than "ATM", but you probably didn't think that the example sentence above was likely to end with "street" for a simple reason: That's not (usually) where you go when you need money.
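As a toy illustration of the difference (all counts and probabilities below are invented for illustration; they are not from any actual corpus or from our models), here is how a model that only tracks word sequences and a model that also weighs event plausibility can disagree about the example sentence:

    # Toy illustration with invented numbers -- not an actual corpus and not our models.
    # A pure word-sequence (bigram) model ranks completions of "...he went to the ___"
    # by how often each word follows "the"; adding an event-plausibility term can
    # reverse that ranking.

    bigram_counts = {"street": 5000, "ATM": 300, "hibernate": 1}      # made-up counts of "the ___"
    plausibility = {"street": 0.05, "ATM": 0.90, "hibernate": 0.001}  # made-up P(event | "John needed money")

    total = sum(bigram_counts.values())

    def bigram_score(word):
        return bigram_counts[word] / total

    def combined_score(word):
        # Weight the frequency-based guess by how plausible the resulting event is.
        return bigram_score(word) * plausibility[word]

    for word in bigram_counts:
        print(word, round(bigram_score(word), 3), round(combined_score(word), 3))
    # The bigram model prefers "street"; the combined model prefers "ATM".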

To do this research, we need to have sequences of events and vary how likely it is that one event would follow the other, as well as how likely each event is to happen on its own. And we need many, many such sequences. If you would like to help us out, you can do so here.

On the theory that the people interested in these projects will be more committed, Likely Events takes a bit longer than our typical project (in order to make up for the smaller number of volunteers). I expect participation will take on the order of half an hour. We will see how this goes and how many people are interested. Feedback is welcome.


VerbCorner: A Citizen Science project to find out what verbs mean

Earlier this week, I blogged about our new VerbCorner project. At the end, I promised that there would be more info forthcoming about why we are doing this project, about its aims and expected outcomes, why it's necessary, etc. Here's the first installment in that series.

Computers and language

I just dictated the following note to Siri:
Many of our best computer systems treat words as essentially meaningless symbols that need to be moved around.
Here's what she wrote:
Many of our best computer system street words is essentially meaningless symbols that need to be moved around.
I rest my case.

The problem of meaning

I don't know for sure how Siri works, but her mistake is emblematic of how a lot of language software works. "Computer systems treat" and "computer system street" sound approximately the same, but that's not a confusion most humans would even notice, because the first interpretation makes sense and the second one doesn't.

Decades of research shows that human language comprehension is heavily guided by plausibility: when there are two possible interpretations of what you just heard, go for the one that makes sense. This happens in speech recognition, as in the example above, and it plays a key role in understanding ambiguous words. If you want to throw Google Translate for a loop, give it the following:
John was already in his swimsuit as we reached the watering hole. "I hope the tire swing is still there," John said as he headed to the bank.
Although the most plausible interpretation of "bank" here is the side of a river, Google Translate will translate it into the word for "financial institution" in whatever language you are translating into, because that's the most common meaning of the English word "bank".
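Here is a minimal sketch of the kind of fix one might imagine, with numbers invented purely for illustration (this is not how Google Translate actually works): combine a frequency prior over the senses of "bank" with how well each sense fits the surrounding context words.

    # Toy word-sense disambiguation sketch. All numbers are invented for illustration;
    # this is not Google Translate's actual method.

    priors = {"financial institution": 0.8, "river bank": 0.2}  # the financial sense is more frequent overall

    # Made-up likelihoods of seeing each context word given each sense of "bank".
    context_fit = {
        "financial institution": {"swimsuit": 0.01, "tire swing": 0.01, "watering hole": 0.02},
        "river bank":            {"swimsuit": 0.30, "tire swing": 0.20, "watering hole": 0.40},
    }

    context = ["swimsuit", "tire swing", "watering hole"]

    def score(sense):
        s = priors[sense]
        for word in context:
            s *= context_fit[sense][word]
        return s

    print(max(priors, key=score))  # "river bank": the context outweighs the frequency prior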

So what's the problem?

I assume that this limitation is not lost on the people at Google or at Apple. And, in fact, there are computer systems that try to incorporate meaning. The problem there is not so much the computer science as the linguistic science.** Dictionaries notwithstanding, scientists really do not know very much about what words mean, and it is hard to program a computer to know what a word means when you do not actually know yourself.

(Dictionaries are useful, but as an exercise, pick a* definition from a dictionary and come up with a counterexample. It is not hard.)

One of the limitations of existing research is scope. Language is huge. There are a lot of words. So scientists typically work on the meanings of only a small number of words at a time. This is helpful, but a computer that only knows a few words is pretty limited. We want to know the meanings of all words.

Solving the problem

We've launched a new section of the website, VerbCorner. There, you can answer questions about what verbs mean. Rather than try to work out the meaning of a word all at once, we have broken up the problem into a series of different questions, each of which tries to pinpoint a specific component of meaning. Of course, there are many nuances to meaning, but research has shown that certain aspects are more important than others, and we will be focusing on those.
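To give a sense of the kind of resource we are trying to build (the component names below are invented for illustration; they are not the actual VerbCorner questions), the end product might look roughly like a big table of verbs by meaning components:

    # Hypothetical sketch of the kind of data the project aims to build up: each verb
    # annotated for a handful of meaning components. The component names here are
    # invented for illustration; they are not the actual VerbCorner tasks.

    verb_components = {
        "break":  {"causes_change_of_state": True,  "involves_physical_contact": True},
        "admire": {"causes_change_of_state": False, "involves_physical_contact": False},
        "hit":    {"causes_change_of_state": False, "involves_physical_contact": True},
    }

    # With enough verbs and enough components, one can ask which verbs share which
    # pieces of meaning.
    contact_verbs = [v for v, f in verb_components.items() if f["involves_physical_contact"]]
    print(contact_verbs)  # ['break', 'hit']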

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

----
*Dragon Dictate originally transcribed this as "pickled", which I did not catch on proofreading. More evidence that we need computer programs that understand what words mean.
**Dragon Dictate made spaghetti out of this sentence, too.

Citizen Science at GamesWithWords.org: The VerbCorner Project

What do verbs mean? We'd like to know. For that reason, we just launched VerbCorner, a massive, crowd-sourced investigation into the meanings of verbs. 

Why do we need this project? Why not just look up what verbs mean in a dictionary? While dictionaries are enormously useful (I think I own something like 15), they are far from perfect. For one thing, it's usually very easy to find counterexamples even for what seem like straightforward definitions. Take the following:
Bachelor: An unmarried man.
So is the Pope a bachelor? Is Neil Patrick Harris? How about a married man from a country in which men are allowed multiple wives?

At VerbCorner, rather than trying to work out the whole definition at once, we have broken meaning into many different components. At the site, you will find several different tasks. In each task, you will try to determine whether a particular verb has a particular component of meaning. 

If you are interested in what words mean and would like to help with this project, sign up for an account at http://gameswithwords.org/VerbCorner/. Participation can be anonymous, but we are happy to recognize significant contributions from anyone who wishes it.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

A Critical Period for Learning Language?

If you bring adults and children into the lab and try teaching them a new language, adults will learn much more of the language much more rapidly than the children. This is odd, because probably one of the most famous facts about learning languages -- something known by just about everyone whether you are a scientist who studies language or not -- is that adults have a lot less success at learning language than children. So whatever it is that children do better, it's something that operates on a timescale too slow to see in the lab. 

This makes studying the differences between adult and child language learners tricky, and a lot less is known than we'd like. Even the shape of the change in language learning ability is not well-known: is the drop-off in language learning ability gradual, or is there a sudden plummet at a particular age? Many researchers favor the latter possibility, but it has been hard to demonstrate simply because of the problem of collecting data. Perhaps the most comprehensive study comes from Kenji Hakuta, Ellen Bialystok and Edward Wiley, who used U.S. Census data from 2,016,317 Spanish-speaking immigrants and 324,444 Chinese-speaking* immigrants to study English proficiency as a function of when the person began learning the language.

Their graph shows a very gradual decline in English proficiency as a function of when the person moved to the U.S.



Unfortunately, the measure of English proficiency wasn't very sophisticated. The Census simply asks people to say how well they speak English: "not at all", "not well", "well", "very well", and "speak only English". This is better than nothing, and the authors show that it correlates with a more sophisticated test of English proficiency, but it's possible that the reason the lines in the graphs look so smooth is that this five-point scale is simply too coarse to show anything more. The measure also collapses over vocabulary, grammar, accent, etc., and we know that these behave differently (your ability to learn a native-like accent goes first).

A New Test

This was something we had in mind when devising The Vocab Quiz. If we get enough non-native speakers of English, we could track English proficiency as a function of age ... at least as measured by vocabulary (we also have a grammar test in the works, but that's more difficult to put together and so may take us a while yet). I don't think we'll get two million participants, but even just a few thousand would be enough. If English is your second (or third or fourth, etc.) language, please participate. In addition to helping us with our research and helping advance the science of language in general, you will also be able to see how your vocabulary compares with the typical native English speaker who participates in the experiment.

--------
Hakuta, K., Bialystok, E., & Wiley, E. (2003). Critical evidence: A test of the critical-period hypothesis for second-language acquisition. Psychological Science, 14(1), 31-38. DOI: 10.1111/1467-9280.01415



*Yes, I know: Chinese is a family of languages, not a single language. But the paper does not report a by-language breakdown for this group.

Living in an Imperfect World: Psycholinguistics Edition

You, sir, have tasted two whole worms. You have hissed all my mystery lectures and been caught fighting a liar in the quad. You will leave Oxford by the next town drain. -- Reverend Spooner.

There is an old tension in psycholinguistic (or linguistic) theory, which boils down to two ways of looking at language comprehension. When somebody says something to you, what do you do with that linguistic input? Is your goal to decode the sentence and figure out what the sentence means, or do you try to figure out what message the speaker intended to convey? The tension comes in because presumably we do a bit of both.

Suppose a young child says, "Look! A doggy!" while pointing to a cat. Most people will agree that technically, the child's sentence is about a dog. But most of us can still work out that probably the child meant to talk about the cat; she used the word "doggy" either due to lack of vocabulary, confusion about the distinction between dogs and cats, or a simple speech error. Similarly, if your friend says at 7pm, "Let's go have lunch," technically your friend is suggesting having the midday meal, but probably you charitably assume he is just very hungry and so made a mistake in saying "lunch" instead of "dinner".

For a variety of reasons, linguistics and psycholinguistics have focused mostly on decoding sentences rather than intended meanings. This is important work about an important problem, but -- as we saw above -- it's only half the story. PNAS just published a paper by Gibson, Bergen, and Piantadosi that addresses the second half. Gibson and Bergen are at M.I.T., and Piantadosi recently graduated from M.I.T. Like much of the work coming out of Eastern Cambridge lately, they take a Bayesian perspective on the problem, pointing out that the probability that the speaker intended to convey a particular message m, given that they said sentence s, is proportional to the prior probability that the speaker might want to convey m times the probability that they would say sentence s when intending to convey m.
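In symbols (this is just the statement above written out; the notation is mine, not necessarily the paper's):

    % Probability that the speaker intended message m, given the uttered sentence s
    P(m \mid s) \;\propto\; P(m) \times P(s \mid m)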

This ends up accounting for the phenomenon brought up in Paragraph #2: If the literal meaning of the speaker's sentence isn't very likely to be what they intended to say ("Let's go have lunch", spoken at 7pm), but there is some other sentence that contains roughly the same words but has a more plausible meaning ("Let's go have dinner"), then you should infer that the intended message is the latter one and that the speaker made an error.

So far, this is not much more than a restatement of our intuitive theory in Paragraph #2. But as Gibson, Bergen, and Piantadosi point out, a few non-trivial predictions come out of this. One is that you should assume that deletions (dropping a word) are more likely than insertions (adding a word). The reason is that there are only so many words that can be dropped from a particular sentence, so even if the probability of accidentally dropping a word is low, the probability of accidentally dropping a particular word isn't all that much lower. So if the intended sentence was "The ball was kicked by the girl" and the speaker accidentally dropped two words, the probability that the speaker happened to drop "was" and "by", resulting in the grammatical but unlikely sentence "The ball kicked the girl", is not so bad. However, suppose the intended sentence was "The girl kicked the ball": what are the chances the speaker accidentally adds "was" and "by", resulting in the grammatical but unlikely sentence "The girl was kicked by the ball"? Pretty much zilch, since English contains hundreds of thousands of words: there is pretty much no chance that those particular words would be inserted in those particular locations.
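A back-of-the-envelope calculation makes the asymmetry concrete. The error rates and vocabulary size below are assumptions chosen for illustration, not values from the paper:

    # Back-of-the-envelope sketch of why a particular deletion is far more probable
    # than a particular insertion. The rates and vocabulary size are assumptions for
    # illustration, not values from the paper.

    p_some_deletion = 0.01      # assumed chance the speaker drops a word somewhere
    p_some_insertion = 0.01     # assumed chance the speaker inserts a word somewhere
    sentence_length = 7         # "The ball was kicked by the girl"
    vocabulary_size = 100_000   # rough size of an English lexicon (assumption)

    # If a word is dropped, it is one of only sentence_length candidates.
    p_drop_particular_word = p_some_deletion / sentence_length

    # If a word is inserted, it could be any of ~vocabulary_size words.
    p_insert_particular_word = p_some_insertion / vocabulary_size

    print(p_drop_particular_word)     # ~0.0014
    print(p_insert_particular_word)   # 0.0000001 -- about four orders of magnitude smaller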

The authors present some data to back up these and some other predictions. For instance, if listeners are given reason to suspect that the speaker makes lots of speech errors, they are then even more likely to "correct" an unlikely sentence to a similar sentence with a more likely meaning.

There's plenty more work to be done. There are plenty of speech errors out there besides insertions and deletions, such as substitutions and the various phonological errors that made Rev. Spooner famous (see quote above). Work on phonological errors shows that speakers are more likely to make errors that result in real words (train -> drain) than non-words (train -> frain). Likely, the same is true of other types of errors. Building a full theory that incorporates all the complexity of speech processes is a ways off yet. But the work just published is an important proof of concept.

---------
Gibson, E., Bergen, L., & Piantadosi, S. (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1216438110

Do You Speak Korean?


Learning new languages is hard for many reasons. One of those reasons is that the meaning of an individual word can have a lot of nuances, and the degree to which those nuances match up with the nuances of similar words in your first language can make learning the new language easier; the degree to which the nuances diverge can make learning the new language harder.

In a new experiment, we are looking at English-speakers learning Korean and Korean-speakers learning English. In particular, we are studying a specific set of words that previous research has suggested give foreign language learners a great deal of difficulty.

We are hoping that we will be able to track how knowledge of these words develops as you move from being a novice to a fluent speaker. For this, we will need to find a lot of people who are learning Korean, as well as Korean-speakers who are learning English. If you are one, please participate.

The experiment is called "Trials of the Heart". You can find it here.

We do also need monolingual English speakers (people whose first and essentially only language is English) for comparison, so if that's you, you are welcome to participate, too!


Evolutionary Psychology, Proximate Causation, & Ultimate Causation


Evolutionary psychology has always been somewhat controversial in the media for reasons that generally confuse me (Wikipedia has a nice rundown of the usual complaints). For instance, the good folks at Slate are particularly hostile (here, here and here), which is odd because they are also generally hostile towards Creationism (here, here and here). 

Given the overwhelming evidence that nearly every aspect of the human mind and behavior is at least partly heritable (and so at least partially determined by our genes), the only way to deny the claim that our minds are at least partially a product of evolution is to deny that evolution affects our genes – that is, deny the basic tenets of evolutionary theory. (I suppose you could try to deny the evidence of genetic influence on mind and behavior, but that would require turning a blind eye to such a wealth of data as to make Global Warming Denialism seem like a warm-up activity).

What's the matter with Evolutionary Psychology?

What is there to object to, anyway? Some of the problem seems definitional. Super-Science-Blogger Greg Laden acknowledges that applying evolutionary theory to the study of the human mind is a good idea, but argues that "evolutionary psychology" refers only to a very specific theory from Cosmides and Tooby, one with which he takes issue. And in general, a lot of the "critiques" I see in the media seem to involve equating the entire field with some specific hypothesis or set of hypotheses, particularly the more exotic ones.

For instance, some years back Slate ran an article about "Evolutionary Psychology's Anti-Semite", a discussion of Kevin MacDonald, who has an idiosyncratic notion of Judaism as a "group evolution strategy" to maximize intelligence through eugenics (the article goes into some detail). It's a pretty nutty idea, gets basic historical facts wrong, and more importantly gets the science wrong. The article tries pretty hard to paint him as a mainstream Evolutionary Psychologist nonetheless. Interviewees aren't that helpful (they mostly dismiss the work as contradicting basic fundamentals of evolutionary theory), but the article author pulls up other evidence, like the fact that MacDonald acknowledged some mainstream researchers in one of his books. (For the record, I acknowledge Benicio del Toro as an inspiration, so you know he fully agrees with everything in this blog post. Oh, and Jenna-Louise Coleman, too.)

This spring, New York Times columnist John Tierney asserted that men must be innately more competitive than women since they monopolize the trophies in -- hold onto your vowels -- world Scrabble competitions. To bolster his case, Tierney turned to evolutionary psychology. In the distant past, he argued, a no-holds-barred desire to win would have been an adaptive advantage for many men, allowing them to get more girls, have more kids, and pass on their competitive genes to today's word-memorizing, vowel-hoarding Scrabble champs.
I will agree that this argument involves a bit of a stretch and is awfully hard to falsify (as the article goes on to point out). And sure, some claims made even by serious evolutionary psychologists are hard to falsify with current technology ... but then so is String Theory. And we do have many methods for testing evolutionary theory in general, and roughly the same ones work whether you are studying the mind and behavior or purely physical attributes of organisms. So, again, if you want to deny that claims about evolutionary psychology are testable, then you end up having to make roughly the same claim about evolutionary theory in general. 

Just common sense

It turns out that when you look at the biology, a good waist-hips ratio for a healthy woman is (roughly) .7, whereas the ideal for men is closer to .9. Now imagine we have a species of early hominids (Group A) that is genetically predisposed such that heterosexual men prefer women with a waist-hips ratio of .7 and heterosexual women prefer men with a waist-hips ratio of .9. Now let's say we have another species of early hominids (Group B) where the preferences are reversed, preferring men with ratios of .7 and women with ratios of .9. Since individuals of Group A prefer to mate with healthier partners than Group B does, which one do you think is going to have more surviving children?

Now compare to Group C, where there is no innate component to interest in waist-hips ratios; beauty has to be learned. Group C is still at a disadvantage to Group A, since some of the people in it will learn to prefer the wrong proportions and preferentially mate with less healthy individuals. In short, all else equal, you would expect evolution to lead to hominids that prefer to mate with hominids that have close-to-ideal proportions.

(If you don't like waist-hips ratios, consider that humans prefer individuals without deformities and gaping sores and boils, and then play the same game.)
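As a toy calculation of this argument about Groups A, B, and C (every number here is invented for illustration, not an empirical estimate), we can compare the expected number of surviving offspring per generation in each group and watch the small advantage compound:

    # Toy calculation of the argument above. Every number is invented for illustration;
    # none is an empirical estimate.

    survival_if_healthy_mate = 1.05   # assumed relative offspring survival with a healthier partner
    survival_if_other_mate   = 0.95   # assumed relative offspring survival otherwise

    def expected_survival(p_choose_healthy):
        return (p_choose_healthy * survival_if_healthy_mate
                + (1 - p_choose_healthy) * survival_if_other_mate)

    groups = {
        "A (innate preference, correct)":  1.0,  # reliably prefer the healthier proportions
        "B (innate preference, reversed)": 0.0,  # reliably prefer the less healthy proportions
        "C (learned preference)":          0.8,  # some learners pick up the 'wrong' ideal
    }

    for name, p in groups.items():
        per_generation = expected_survival(p)
        # The small per-generation difference compounds over 20 generations: A > C > B.
        print(name, round(per_generation, 2), round(per_generation ** 20, 2))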

Here is another example. Suppose that in Group A, individuals find babies cute, which leads them to want to protect and nourish the infants. In Group B, individuals find babies repulsive, and many actually have an irrational fear of babies (that is, treating babies something like how we treat spiders, snakes & slugs). Which one do you think has more children that survive to adulthood? Once again, it's better to have a love of cuteness hardwired in rather than something you have to learn from society, since all it takes is for a society to get a few crazy ideas about what cute looks like ("they look better decapitated!") and then the whole civilization is wiped out. 

(If you think that babies just *are* objectively cute and that there's no psychology involved, consider this: Which do you find cuter, a human baby or a skunk baby? Which do you think a mother skunk finds cuter?)

These are the kinds of issues that mainstream evolutionary psychology trucks in. And the theory does produce new predictions. For instance, you'd expect that in species where a .7 waist-hips ratio is not ideal for females (that is, pretty much any species other than our own), it wouldn't be favored (and it isn't). And the field is generally fairly sensible, which is not to say that all the predictions are right or that evolutionary theory doesn't grow and improve over time (I understand from a recent conversation that there is now some argument about whether an instinct for third-party punishment is required for sustainable altruism, which is something I had thought was a settled matter). 

Findings: The Role of World Knowledge in Pronoun Interpretation

A few months ago, I posted the results of That Kind of Person. This was the final experiment in a paper on pronoun interpretation, a paper which is now in press. You can find a PDF of the accepted version here.

How it Began

Isaac Asimov famously observed that "the most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" That quote describes this project fairly well. The project grew out of a norming study. Norming studies aren't really even real experiments -- they are mini experiments used to choose stimuli.

I was designing an ERP ("brain wave") study of pronoun processing. A group in Europe had published a paper using ERPs to look at a well-known phenomenon in pronoun interpretation, one which has been discussed a lot on this blog, in which pronoun interpretation clearly depends on context:

(1) Sally frightens Mary because she...
(2) Sally likes Mary because she...

Most people think that "she" refers to Sally in (1) but Mary in (2). This seems to be a function of the verbs in (1-2), since that's all that's different between the sentences, and in fact other verbs also affect pronoun interpretation. We wanted to follow up some of the previous ERP work, and we were just choosing sentences. You get nice big ERP effects (that is, big changes in the brain waves) when something is surprising, so people often compare sentences with unexpected words to those with expected words, which is what this previous group had done:

(3) Sally frightens Bill because she...
(4) Bill frightens Sally because she...

You should get the sense that the pronoun "she" is a bit more surprising in (4) than in (3). Comparing these sentences to (1-2) should make it clear why this is.

The Twist

A number of authors argued that what is going on is that these sentences (1-4) introduce an explanation ("because..."). As you are reading or listening to the sentence, you think through typical causes of the event in question (frightening, liking, etc.) and so come up with a guess as to who is going to be mentioned in the explanation. More good explanations of an instance of frightening involve the frightener than the frightenee, and more good explanations of an instance of liking involve the like-ee than the liker.

The authors supported the argument by pointing to studies showing that what you know about the participants in the event matters. In general, you might think that in any given event involving a king and a butler, kings are more likely to be responsible for the event simply because kings have more power. So in the following sentence, you might interpret the pronoun as referring to the king even though it goes against the "typical" pattern for frighten (preferring explanations involving the frightener).

(5) The butler frightened the king because...

What got people particularly excited about this is that it all has to happen very fast. Studies have shown that you can interpret the pronoun in such sentences in a fraction of a second. If you can do this based on a complex inference about who is likely to do what, that's very impressive and puts strong constraints on our theory of language.

The Problem

I was in the process of designing an ERP experiment to follow up a previous one in Dutch that I wanted to replicate in English. I had created a number of sentences, and we were running a simple experiment in which people rate how "natural" the sentences sound. We were doing this just to make sure none of our sentences were weird, since that -- as already mentioned -- can have big effects on the brain waves, which could swamp any effects of the pronoun. Again, we expected people to rate (4) as less natural than (3); what we wanted to make sure was that people didn't rate both (3) and (4) as pretty odd. We tested a couple hundred such sentences, from which we would pick the best for the study.

I was worried, though, because a number of previous studies had suggested that gender itself might matter. This follows from the claim that who the event participants are matters (e.g., kings vs. butlers). Specifically, a few studies had reported that in a story about a man and a woman, people expect the man to be talked about more than the woman, analogous to expecting references to the king rather than the butler in (5). Was this a confound?

I ran the study anyway, because we would be able to see in the data just how bad the problem was. To my surprise, there was no effect of gender at all. I started looking at the literature more carefully and noticed that several people had similarly failed to find such effects. One paper had found an effect, but it seemed to be present in only a small handful of sentences out of the large number they had tested. I looked into studies that had investigated sentences like (5) and discovered ... that they didn't exist! Rather, the studies researchers had been citing weren't about pronoun interpretation at all but something else. To be fair, some researchers had suggested that there might be a relationship between this other phenomenon and pronoun interpretation, but it had never been shown. I followed up with some experiments seeing whether the king/butler manipulation would affect pronoun interpretation, and it didn't. (For good measure, I also showed that there is little if any relationship between that other phenomenon and pronouns.)

A Different Problem

So it looked like the data upon which much recent work on pronouns is built was either un-replicable or apocryphal. However, the associated theory had become so entrenched that this was a difficult dataset to publish. I ultimately had to run around a dozen separate experiments in order to convince reviewers that these effects really don't exist (or mostly don't exist -- there do seem to be a tiny percentage of sentences, around 5%, where you can get reliable if very small effects of gender). (A typical paper has 1-4 experiments, so a dozen is a lot. Just in order to keep the paper from growing to an unmanageable length, I combined various experiments together and reported each one as a separate condition of a larger experiment.)

Most of these experiments were run on Amazon Mechanical Turk, but the final one was run at GamesWithWords.org and was announced on this blog (read the results of that specific experiment here). The paper is now in press at Language & Cognitive Processes. You can read the final submitted version here.

Conclusion

So what does all this mean? In many ways, it's a correction to the literature. A lot of theoretical work was built around findings that turned out to be wrong or nonexistent, in particular the idea that pronoun interpretation involves a lot of very rapid inferences based on your general knowledge about the world. That's not quite the same thing as having a new theory, but we've been exploring some possibilities that no doubt will be talked about more here in the future.
----

Hartshorne, J. K. (2014). What is implicit causality? Language and Cognitive Processes.