Field of Science

Findings: The Role of World Knowledge in Pronoun Interpretation

A few months ago, I posted the results of That Kind of Person. This was the final experiment in a paper on pronoun interpretation, a paper which is now in press. You can find a PDF of the accepted version here.

How it Began

Isaac Asimov famously observed that "the most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" That quote describes this project fairly well. The project grew out of a norming study. Norming studies aren't really even real experiments -- they are mini experiments used to choose stimuli.

I was designing an ERP ("brain wave") study of pronoun processing. A group in Europe had published a paper using ERPs to look at a well-known phenomenon in pronoun interpretation, one which has been discussed a lot on this blog, in which pronoun interpretation clearly depends on context:

(1) Sally frightens Mary because she...
(2) Sally likes Mary because she...

Most people think that "she" refers to Sally in (1) but Mary in (2). This seems to be a function of the verbs in (1-2), since that's all that's different between the sentences, and in fact other verbs also affect pronoun interpretation. We wanted to follow up some of the previous ERP work, and we were just choosing sentences. You get nice big ERP effects (that is, big changes in the brain waves) when something is surprising, so people often compare sentences with unexpected words to those with expected words, which is what this previous group had done:

(3) Sally frightens Bill because she...
(4) Bill frightens Sally because she...

You should get the sense that the pronoun "she" is a bit more surprising in (4) than in (3). Comparing these sentences to (1-2) should make it clear why this is.

The Twist

A number of authors argued that what is going on is that these sentences (1-4) introduce an explanation ("because..."). As you are reading or listening to the sentence, you think through typical causes of the event in question (frightening, liking, etc.) and so come up with a guess as to who is going to be mentioned in the explanation. More good explanations of an instance of frightening involve the frightener than the frightenee, and more good explanations of an instance of liking involve the like-ee than the liker.

The authors supported the argument by pointing to studies showing that what you know about the participants in the event matters. In general, you might think that in any given event involving a king and a butler, kings are more likely to be responsible for the event simply because kings have more power. So in the following sentence, you might interpret the pronoun as referring to the king even though it goes against the "typical" pattern for frighten (preferring explanations that involve the frightener).

(5) The butler frightened the king because he...

What got people particularly excited about this is that it all has to happen very fast. Studies have shown that you can interpret the pronoun in such sentences in a fraction of a second. If you can do this based on a complex inference about who is likely to do what, that's very impressive and puts strong constraints on our theory of language.

The Problem

I was in the process of designing an ERP experiment to follow up a previous one in Dutch that I wanted to replicate in English. I had created a number of sentences, and we were running a simple experiment in which people rate how "natural" the sentences sound. We were doing this just to make sure none of our sentences were weird, since that -- as already mentioned -- can have big effects on the brain waves, which could swamp any effects of the pronoun. Again, we expected people to rate (4) as less natural than (3); what we wanted to make sure of was that people didn't rate both (3) and (4) as pretty odd. We tested a couple hundred such sentences, from which we would pick the best for the study.
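
For readers curious what that kind of selection step might look like in practice, here is a minimal sketch in Python. Everything specific in it is an assumption made for illustration: the ratings are invented, the 1-7 scale and the 3.5 cutoff are arbitrary, and the names (`ratings`, `select_items`, `MIN_MEAN`) are hypothetical. It is not the procedure or the data from the actual norming study.

```python
from statistics import mean

# Hypothetical naturalness ratings on a 1-7 scale (invented numbers, NOT study data).
# Each item has ratings for the expected-pronoun and unexpected-pronoun versions.
ratings = {
    "frighten": {"expected": [6, 7, 6], "unexpected": [4, 5, 4]},
    "like": {"expected": [6, 6, 7], "unexpected": [5, 4, 5]},
    "odd_item": {"expected": [3, 2, 3], "unexpected": [2, 2, 1]},
}

MIN_MEAN = 3.5  # assumed cutoff: below this, a sentence counts as "pretty odd"


def select_items(ratings, min_mean=MIN_MEAN):
    """Keep items whose two versions both clear the cutoff.

    Returns (item, expected_mean, unexpected_mean) tuples, sorted so that
    items with the most natural expected version come first.
    """
    kept = []
    for item, versions in ratings.items():
        means = {name: mean(values) for name, values in versions.items()}
        if all(m >= min_mean for m in means.values()):
            kept.append((item, means["expected"], means["unexpected"]))
    return sorted(kept, key=lambda row: row[1], reverse=True)


selected = select_items(ratings)
for item, expected_mean, unexpected_mean in selected:
    print(f"{item}: expected={expected_mean:.2f}, unexpected={unexpected_mean:.2f}")
```

The point of requiring both versions to clear the cutoff is the one made above: it is fine for the unexpected version to sound less natural than the expected one, but an item where both versions sound odd would add noise unrelated to the pronoun.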

I was worried, though, because a number of previous studies had suggested that gender itself might matter. This follows from the claim that who the event participants are matters (e.g., kings vs. butlers). Specifically, a few studies had reported that in a story about a man and a woman, people expect the man to be talked about more than the woman, analogous to expecting references to the king rather than the butler in (5). Was this a confound?

I ran the study anyway, because we would be able to see in the data just how bad the problem was. To my surprise, there was no effect of gender at all. I started looking at the literature more carefully and noticed that several people had similarly failed to find such effects. One paper had found an effect, but it seemed to be present in only a small handful of sentences out of the large number they had tested. I looked into studies that had investigated sentences like (5) and discovered ... that they didn't exist! Rather, the studies researchers had been citing weren't about pronoun interpretation at all but something else. To be fair, some researchers had suggested that there might be a relationship between this other phenomenon and pronoun interpretation, but it had never been shown. I followed up with some experiments seeing whether the king/butler manipulation would affect pronoun interpretation, and it didn't. (For good measure, I also showed that there is little if any relationship between that other phenomenon and pronouns.)

A Different Problem

So it looked like the data upon which much recent work on pronouns is built was either un-replicable or apocryphal. However, the associated theory had become so entrenched that this was a difficult dataset to publish. I ultimately had to run around a dozen separate experiments in order to convince reviewers that these effects really don't exist (or mostly don't exist -- there does seem to be a tiny percentage of sentences, around 5%, where you can get reliable if very small effects of gender). (A typical paper has 1-4 experiments, so a dozen is a lot. Just to keep the paper from growing to an unmanageable length, I combined various experiments together and reported each one as a separate condition of a larger experiment.)

Most of these experiments were run on Amazon Mechanical Turk, but the final one was run at and was announced on this blog (read the results of that specific experiment here). The paper is now in press at Language & Cognitive Processes. You can read the final submitted version here.


So what does all this mean? In many ways, it's a correction to the literature. A lot of theoretical work was built around findings that turned out to be wrong or nonexistent. In particular, this undermines the idea that pronoun interpretation involves a lot of very rapid inferences based on your general knowledge about the world. That's not quite the same thing as having a new theory, but we've been exploring some possibilities that no doubt will be talked about more here in the future.

Joshua K. Hartshorne (2014). What is implicit causality? Language and Cognitive Processes


Marte said...

So, have you done this work with ERPs as well? From the blog post I derive that that was the original goal, but it is not reported in the paper, right? I would still be interested to see the data though.

GamesWithWords said...

We did do an ERP study. The results were confusing, though. Basically, there were no results. There was no difference in the ERP to the pronoun in

Bill frightened Mary because he...

and

Mary frightened Bill because he...

This contrasts with what has been reported in several studies in Dutch, and also contrasts with multiple self-paced reading studies, in which people have more trouble reading the pronoun when it mismatches the expected continuation (the second example above).

So we suspect there's something wrong with the data: too little power, a problem with the equipment, a bug in the analysis, etc. But we haven't been able to find it yet.

Marte said...

But the lack of differences in the ERP could also be due to other factors. Maybe there is some critical difference in the stimuli? And did your subjects perform an additional task when reading the sentences, like in the behavioral studies that are reported above? I'm sure that the Van Berkum study did not employ any other task than reading for comprehension. Task/no task could make a difference as well.

GamesWithWords said...

We used reading for comprehension, with occasional comprehension questions. All the items we used were strongly biased (we tested in a separate experiment). The primary difference between what we did and what the van Berkum group did is that we did not include a couple sentences before the critical sentence. If that's the relevant difference, there are two interpretations:
1) Their result was actually *driven* by the leading sentences.
2) ERP effects are small if you aren't well into a long discourse.

I've seen a lot of (non-ERP) studies where the effects were accidentally driven by the sentences prior to the critical sentence, which is why I don't like using the design. On the other hand, I know that some ERP researchers believe that many ERP effects require a longer discourse (not sure how systematic the data are). If we were to run a follow-up, that's probably the direction we would go.