Field of Science


Updated results on the relationship between English dialects

I've updated the interactive visualization of the relationships between the Englishes of the world to include a couple dozen additional native languages. Check it out.

Findings: Which English -- updated dialect chart

I have updated the dialect chart based on the results for the first few days. Since the new version shows up automatically in the frame in the previous post, I haven't added it in here. And you can get a better look at it on the website.

The biggest difference is that I also added several "dialects" for non-native speakers of English. That is, I added five new dialects, one each for people whose first language was Spanish, German, Portuguese, Dutch, or Finnish. I'll be adding more of these dialects in the future, but those just happen to be the groups for which we have a decent number of respondents.

As you can see, the algorithm finds that American & Canadian speakers are more like one another than they are like anyone else. Similarly, English, Irish, Scottish, and Australian speakers are more like one another than like anyone else. And the non-native English speakers also form a group. I'll leave you to explore the more fine-grained groupings on your own.

If you are wondering why New Zealanders are off by themselves, that's mostly because we don't have very many of them, and the algorithm has difficulty classifying dialects for which there isn't much data. Same for Welsh English, South African English, and Black Vernacular English. So if you know people who speak any of those dialects...
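Groupings like these can be produced with standard hierarchical clustering. Here is a minimal sketch using scipy and made-up pairwise dissimilarity numbers (the chart's actual algorithm and data may well differ):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical pairwise dissimilarities between four dialects
# (0 = answered every survey item identically).
dialects = ["American", "Canadian", "English", "Irish"]
dissim = np.array([
    [0.0, 0.1, 0.5, 0.6],
    [0.1, 0.0, 0.6, 0.5],
    [0.5, 0.6, 0.0, 0.2],
    [0.6, 0.5, 0.2, 0.0],
])

# Average-linkage clustering on the condensed distance matrix.
Z = linkage(squareform(dissim), method="average")

# Cut the tree into two groups.
labels = fcluster(Z, t=2, criterion="maxclust")
groups = {d: int(g) for d, g in zip(dialects, labels)}
print(groups)
```

With data this clean, American and Canadian land in one cluster and English and Irish in the other; with only a handful of respondents for a dialect, the dissimilarity estimates get noisy and the cluster assignment becomes unstable, which is the "not much data" problem described above.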

The English Grammars of the World

It's widely observed that not everybody speaks English the same way. Depending on where you grew up, you might say y'all, you guys, or just you. You might pronounce grocery as if it were "groshery" or "grossery." There have been some excellent, fine-grained studies of how these aspects of English vary across the United States and elsewhere, such as this one.

But vocabulary and pronunciation aren't the only things that vary across different dialects of English. We are in the midst of a soft launch of a new project which will, among other things, help map out the differences in English grammar around the world.

I put together a visualization of early results below (you may want to load it in its own page -- depending on your browser, the embedded version below may not work). You can use this graphic to explore the similarities among eight English dialects (American, Canadian, English English, Irish, New Zealandish, Northern Irish, Scottish, and South African).

As more results come in (about other dialects like Ebonics and Welsh, about specific parts of America or Canada, etc.), I'll be updating this graphic. So please take the survey and then check back in soon.



Load the graphic directly here.

Results (Round 1): Crowdsourcing the Structure of Meaning & Thought

Language is a device for moving a thought from one person's head into another's. This means that to have any real understanding of language, we also need to understand thought. This is what makes work on language exciting. It is also what makes it hard.

With the help of over 1,500 Citizen Scientists working through our VerbCorner project, we have been making rapid progress.

Grammar, Meaning, & Thought

You can say Albert hit the vase and Albert hit at the vase. You can say Albert broke the vase but you can't say Albert broke at the vase. You can say Albert sent a book to the boarder [a person staying at a guest house] or Albert sent a book to the border [the line between two countries], but while you can say Albert sent the boarder a book, you can't say Albert sent the border a book. And while you say Albert frightened Beatrice -- where Beatrice, the person experiencing the emotion, is the object of the verb -- you must say Beatrice feared Albert -- where Beatrice, the person experiencing the emotion, is now the subject.

How do you know which verb gets used which way? One possibility is that it is random, and this is just one of those things you must learn about your language, just like you have to learn that the animal in the picture on the left is called a "dog" and not a "perro", "xiaogou," or "sobaka." This might explain why it's hard to learn language -- so hard that non-human animals and machines can't do it. In fact, it results in a learning problem so difficult that many researchers believe it would be impossible, even for humans (see especially work on Baker's Paradox).

Many researchers have suspected that there are patterns in terms of which verbs can get used in which ways, explaining the structure of language and how language learning is possible, as well as shedding light on the structure of thought itself. For instance, the difference (it is argued) between Albert hit the vase and Albert hit at the vase is that the latter sentence means that Albert hit the vase ineffectively. You can't say Albert broke at the vase because you can't ineffectively break something: It is either broken or not. The reason you can't say Albert sent the border a book is that this construction means that the border owns the book, which a border can't do -- borders aren't people and can't own anything -- but a boarder can. The difference between Albert frightened Beatrice and Beatrice feared Albert is that the former describes an event that happened in a particular time and place (compare Albert frightened Beatrice yesterday in the kitchen with Beatrice feared Albert yesterday in the kitchen).


When researchers look at the aspects of meaning that matter for grammar across different languages, many of the same aspects pop up over and over again. Does the verb describe something changing (break vs. hit)? Does it describe something only people can do (own, know, believe vs. exist, break, roll)? Does it describe an event or a state (frighten vs. fear)? This is too suspicious of a pattern to be accidental. Researchers like Steven Pinker have argued that language cares about these aspects of meaning because these are basic distinctions our brain makes when we think and reason about the world (see Stuff of Thought). Thus, the structure of language gives us insight into the structure of thought.

The Question

The theory is very compelling and is exciting if true, but there are good reasons to be skeptical. The biggest one is that there simply isn't that much evidence one way or another. Although a few grammatical constructions have been studied in detail (in recent years, this work has been spearheaded by Ben Ambridge of the University of Liverpool), the vast majority have not been systematically studied, even in English. Although evidence so far suggests that which verbs go in which grammatical constructions is driven primarily or entirely by meaning, skeptics have argued that this is because researchers so far have focused on exactly those parts of language that are systematic, and that if we looked at the whole picture, we would see that things are not so neat and tidy.

The problem is that no single researcher -- nor even an entire laboratory -- can possibly investigate the whole picture. Checking every verb in every grammatical construction (e.g., noun verb noun vs. noun verb at noun, etc.) for every aspect of meaning would take one person the rest of her life.

CrowdSourcing the Answer

Last May, VerbCorner was launched to solve this problem. For the first round of the project, we posted questions about 641 verbs and six different aspects of meaning. By October 18th, 1,513 volunteers had provided 117,584 judgments, which works out to 3-4 people per sentence per aspect of meaning. That was enough data to start analyzing.
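The "3-4 people per sentence per aspect of meaning" figure follows from the totals above plus the roughly 5,000 sentences mentioned in the analysis section. A quick back-of-the-envelope check (the 5,000 is approximate, so the result is too):

```python
# Back-of-the-envelope check: judgments per sentence per task.
# Figures from the post; the sentence count is approximate.
judgments = 117_584
sentences = 5_000
tasks = 6  # aspects of meaning in round 1

per_item = judgments / (sentences * tasks)
print(round(per_item, 1))  # roughly 3.9, i.e. the "3-4 people" figure
```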

As predicted, there is a great deal of systematicity in the relationship between meaning and grammar (for details on the analysis, see the next section). These results suggest that the relationship between grammar and meaning may indeed be very systematic, helping to explain how language is learnable at all. It also gives us some confidence in the broad project of using language as a window into how the brain thinks and reasons about the world. This is important, because the mind is not easy to study, and if we can leverage what we know about language, we will have learned a great deal. As we test more verbs and more aspects of meaning -- I recently added an additional aspect of meaning and several hundred new verbs -- that window will become clearer and clearer.

Unless, of course, it turns out that not all of language is so systematic. While our data so far represent a significant proportion of all research to date, it's only a tiny fraction of English. That is what makes research on language so hard: there is so much of it, and it is incredibly complex. But with the support of our volunteer Citizen Scientists, I am confident that we will be able to finish the project and launch a new phase of the study of language.

That brings up one additional aspect of the results: It shows that this project is possible. Citizen Science is rare in the study of the mind, and many of my colleagues doubted that amateurs could provide reliable results. In fact, by the standard measures of reliability, the information our volunteers contributed is very reliable.

Of course, checking for a systematic relationship between grammar and meaning is only the first step. We'd also like to understand which verbs and grammatical constructions have which aspects of meaning and why, and leverage this knowledge into understanding more about the nature of thought. Right now, we still don't have enough data to have exciting new conclusions (for exciting old conclusions, see Pinker's Stuff of Thought). I expect I'll have more to say about that after we complete the next phase of data collection.

Details of the Analysis

Here is how we did the analyses. If meaning determines which grammatical constructions a given verb can appear in, then you would expect that all the verbs that appear in the same set of frames should be the same in terms of the core aspects of meaning discussed above. So if one of those verbs describes, for instance, physical contact, then all of them should.

Helpfully, the VerbNet project -- which was built on earlier work by Beth Levin -- has already classified over 6,000 English verbs according to which grammatical constructions they can appear in. The 641 verbs posted in the first round of the VerbCorner project consisted of all the verbs from 11 of these classes.

So is it the case that in a given class, all the verbs describe physical contact or all of them do not? One additional complication is that, as I described above, the grammatical construction itself can change the meaning. So what I did was count what percentage of verbs from the same class have the same value for a given aspect of meaning for each grammatical construction, and then I averaged over those constructions.

The "Explode on Contact" task in VerbCorner asked people to determine whether a given sentence (e.g., Albert hugged Beatrice) described contact between different people or things. How consistent were the results for a given verb class within a given grammatical construction? Several volunteers checked each sentence. If there was disagreement among the volunteers, I used whatever answer the majority had chosen.
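Put as code, the analysis amounts to a majority vote per sentence, followed by a per-class consistency score averaged over constructions. A minimal sketch with toy data (the function names and data are mine for illustration, not VerbCorner's actual pipeline):

```python
from collections import Counter

def majority(judgments):
    """Resolve disagreement among volunteers by majority vote."""
    return Counter(judgments).most_common(1)[0][0]

def class_consistency(class_data):
    """Average, over constructions, of the share of verbs whose
    majority answer matches the modal answer for the class."""
    scores = []
    for construction, verb_judgments in class_data.items():
        votes = [majority(j) for j in verb_judgments.values()]
        modal_count = Counter(votes).most_common(1)[0][1]
        scores.append(modal_count / len(votes))
    return sum(scores) / len(scores)

# Toy data: one verb class, two constructions, three verbs each;
# each inner list is several volunteers' yes/no contact judgments.
hit_class = {
    "NP V NP": {
        "hit":  ["yes", "yes", "no"],
        "slap": ["yes", "yes", "yes"],
        "kick": ["yes", "no", "yes"],
    },
    "NP V at NP": {
        "hit":  ["yes", "yes", "yes"],
        "slap": ["yes", "yes", "yes"],
        "kick": ["no", "yes", "yes"],
    },
}
print(class_consistency(hit_class))  # 1.0: every verb's majority answer agrees
```

Note that the majority vote absorbs occasional mistakes (the stray "no" answers above), which is exactly why the real scores can approach but rarely hit 100%.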

This graph shows the degree of consistency by verb class (the classes are numbered according to their VerbNet number), with 100% being maximum consistency. You can see that all eleven classes are very close to 100%. Obviously, exactly 100% would be more impressive, but that's extremely rare to see when working with human judgments, simply because people make mistakes. We addressed this in part by having several people check each sentence, but there are so many sentences (around 5,000), that simply by bad luck sometimes several people will all make a mistake on the same sentence. So this graph looks as close to 100% as one could reasonably expect. As we get more data, it should get clearer.

Results were similar for other tasks. Another one looked at whether the sentence described someone applying force (pushing, shoving, etc.) to something or someone else:
Maybe everything just looks very consistent? We actually had a check for that. One of the tasks measures whether the sentence describes something that is good, bad, or neither. There is no evidence that this aspect of meaning matters for grammar (again, the hypothesis is not that every aspect of meaning matters -- only certain ones that are particularly important for structuring thought are expected to matter). And, indeed, we see much less consistency:
Notice that there is still some consistency, however. This seems to be mostly because most sentences describe something that is neither good nor bad, so there is a fair amount of essentially accidental consistency within each verb class. Nonetheless, this is far less consistency than what we saw for the other five aspects of meaning studied.

Findings: GamesWithWords.org at DETEC2013

I recently returned from the inaugural Discourse Expectations: Theoretical, Experimental, and Computational Perspectives workshop, where I presented a talk ("Three myths about implicit causality") which ties together a lot of the pronoun research that I have been doing over the last few years, including results from several GamesWithWords.org experiments (PronounSleuth, That Kind of Person, and Find the Dax).

Findings: The Role of World Knowledge in Pronoun Interpretation

A few months ago, I posted the results of That Kind of Person. This was the final experiment in a paper on pronoun interpretation, a paper which is now in press. You can find a PDF of the accepted version here.

How it Began

Isaac Asimov famously observed that "the most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" That quote describes this project fairly well. The project grew out of a norming study. Norming studies aren't even real experiments -- they are mini experiments used to choose stimuli.

I was designing an ERP ("brain wave") study of pronoun processing. A group in Europe had published a paper using ERPs to look at a well-known phenomenon in pronoun interpretation, one which has been discussed a lot on this blog, in which pronoun interpretation clearly depends on context:

(1) Sally frightens Mary because she...
(2) Sally likes Mary because she...

Most people think that "she" refers to Sally in (1) but Mary in (2). This seems to be a function of the verbs in (1-2), since that's all that's different between the sentences, and in fact other verbs also affect pronoun interpretation. We wanted to follow up some of the previous ERP work, and we were just choosing sentences. You get nice big ERP effects (that is, big changes in the brain waves) when something is surprising, so people often compare sentences with unexpected words to those with expected words, which is what this previous group had done:

(3) Sally frightens Bill because she...
(4) Bill frightens Sally because she...

You should get the sense that the pronoun "she" is a bit more surprising in (4) than in (3). Comparing these sentences to (1-2) should make it clear why this is.

The Twist

A number of authors argued that what is going on is that these sentences (1-4) introduce an explanation ("because..."). As you are reading or listening to the sentence, you think through typical causes of the event in question (frightening, liking, etc.) and so come up with a guess as to who is going to be mentioned in the explanation. More good explanations of an instance of frightening involve the frightener than the frightenee, and more good explanations of an instance of liking involve the like-ee than the liker.

The authors supported the argument by pointing to studies showing that what you know about the participants in the event matters. In general, you might think that in any given event involving a king and a butler, kings are more likely to be responsible for the event simply because kings have more power. So in the following sentence, you might interpret the pronoun as referring to the king even though it goes against the "typical" pattern for frighten (which prefers explanations involving the frightener).

(5) The butler frightened the king because...

What got people particularly excited about this is that it all has to happen very fast. Studies have shown that you can interpret the pronoun in such sentences in a fraction of a second. If you can do this based on a complex inference about who is likely to do what, that's very impressive and puts strong constraints on our theory of language.

The Problem

I was in the process of designing an ERP experiment to follow up a previous one in Dutch that I wanted to replicate in English. I had created a number of sentences, and we were running a simple experiment in which people rate how "natural" the sentences sound. We were doing this just to make sure none of our sentences were weird, since that -- as already mentioned -- can have big effects on the brain waves, which could swamp any effects of the pronoun. Again, we expected people to rate (4) as less natural than (3); what we wanted to make sure was that people didn't rate both (3) and (4) as pretty odd. We tested a couple hundred such sentences, from which we would pick the best for the study.

I was worried, though, because a number of previous studies had suggested that gender itself might matter. This follows from the claim that who the event participants are matters (e.g., kings vs. butlers). Specifically, a few studies had reported that in a story about a man and a woman, people expect the man to be talked about more than the woman, analogous to expecting references to the king rather than the butler in (5). Was this a confound?

I ran the study anyway, because we would be able to see in the data just how bad the problem was. To my surprise, there was no effect of gender at all. I started looking at the literature more carefully and noticed that several people had similarly failed to find such effects. One paper had found an effect, but it seemed to be present in only a small handful of sentences out of the large number they had tested. I looked into studies that had investigated sentences like (5) and discovered ... that they didn't exist! Rather, the studies researchers had been citing weren't about pronoun interpretation at all but something else. To be fair, some researchers had suggested that there might be a relationship between this other phenomenon and pronoun interpretation, but it had never been shown. I followed up with some experiments seeing whether the king/butler manipulation would affect pronoun interpretation, and it didn't. (For good measure, I also showed that there is little if any relationship between that other phenomenon and pronouns.)

A Different Problem

So it looked like the data upon which much recent work on pronouns is built was either un-replicable or apocryphal. However, the associated theory had become so entrenched that this was a difficult dataset to publish. I ultimately had to run around a dozen separate experiments in order to convince reviewers that these effects really don't exist (or mostly don't exist -- there do seem to be a tiny percentage of sentences, around 5%, where you can get reliable if very small effects of gender). (A typical paper has 1-4 experiments, so a dozen is a lot. Just to keep the paper from growing to an unmanageable length, I combined various experiments together and reported each one as a separate condition of a larger experiment.)

Most of these experiments were run on Amazon Mechanical Turk, but the final one was run at GamesWithWords.org and was announced on this blog (read the results of that specific experiment here). The paper is now in press at Language & Cognitive Processes. You can read the final submitted version here.

Conclusion

So what does all this mean? In many ways, it's a correction to the literature. A lot of theoretical work was built around findings that turned out to be wrong or nonexistent -- in particular, the idea that pronoun interpretation involves a lot of very rapid inferences based on your general knowledge about the world. That's not quite the same thing as having a new theory, but we've been exploring some possibilities that no doubt will be talked about more here in the future.
----

Joshua K. Hartshorne (2014). What is implicit causality? Language and Cognitive Processes

Everlasting Love

I just got back data from a survey in which we asked people to estimate how long different emotions are likely to last. We'll use this information to design a future experiment looking at how people expect emotions to be encoded in language. In the meantime, what struck me is that of all the emotions we asked about, the one that people expected to last the longest was "being head-over-heels in love". Which is awesome.






(Image courtesy of Faizal Sharif)

Findings: That Kind of Person

That Kind of Person is now complete. Many thanks to all who answered the call to participate.


For some time now, I have been studying the effect of context on pronoun interpretation. If words and sentences always meant what they meant regardless of context, linguistics and psycholinguistics would be much easier, and we would have much better computer translation, speech recognition, etc. Unfortunately, the same word (bank) can often mean different things in different contexts (he paddled over to the bank versus he cashed a check at the bank).

Pronouns are a great guinea pig for studying the role of context, because they derive almost all their meaning from context (try to define “she” or “he” and compare it to your definition of “Martha Washington” or “George Washington”).

Great Expectations

Recently, a picture has started to emerge, at least in the case of pronouns. The basic idea, due mostly to the work of Andrew Kehler at UCSD*, is that our initial interpretation of a pronoun is driven by what we think is likely to be talked about next. If this seems obvious, note that the dominant theory at the time Kehler started working (Centering Theory and variants) argued that our initial interpretation of the pronoun is that it refers to whatever person or thing is currently most "salient" (what counts as "salient" depends on the version of the theory) -- a hypothesis that also usually strikes folks as obvious.

Kehler's big contribution was articulating a theory of discourse structure -- that is, how sentences relate to one another -- that can be used to fairly accurately predict what people expect to be mentioned next. (If you are interested in these issues and have a little background in linguistics, Kehler's book, Coherence, Reference, and the Theory of Grammar, is fantastic.) For instance, sometimes one sentence introduces the consequence of another sentence:

(1) John frightened Bill, so he ran away.

Here, the second sentence (or, if you prefer, second clause) describes a consequence of the first sentence. Most likely "he" refers to Bill, because Bill running away would be a reasonable consequence of John frightening him. In contrast, other sentences explain the previous sentence:

(2) John frightened Bill because he is scary.

Here, "he" probably refers to John, since John being scary would be a good explanation of his frightening of Bill.

There are many other types of relationships between sentences, and they have predictable effects on pronoun interpretation. Although Kehler's theory explains a lot, it does not explain, for example, why we think Bill running away is a more likely effect of John frightening Bill than John running away.

The role of verbs

In two recent papers, which I discussed on this blog, my colleagues and I argued that verbs play a major role. Verbs -- specifically, the relationship between a verb and its subject and object -- provide a lot of information about events. We drew in particular on one line of theoretical work (usually called "predicate decomposition theory"), which tries to explain how verb meaning can be built out of a few constituent parts. The details aren't important here. What is important is that this theory argues that some  verbs specify who the cause of the event was. What we showed was that usually, in sentences like (2), people think the pronoun refers to the person that the verb specifies as the cause. In this case, "frighten" means something like "John caused Bill to be afraid". Remember that "he is scary" is an explanation of "John frightened Bill." Explanations usually refer to causes.

In short, by drawing on independent theories of discourse structure and verb meaning, we were able to predict very well how people will interpret pronouns in various contexts. At least, we could do so in the ones we tried -- there's a lot of work left to be done to fully flesh out this work.

The problem

I have been presenting this work for a while, and I often get the following objection: We already know that verbs can't be doing all (or even much) of the work. The real story, it was argued, is much more complex. Thinking just about explanation sentences like (2), Pickering and Majid (2007) noted that multiple factors "affect the construction of the event representation, and it is this event representation that is used to infer the cause..." They cite experimental findings argued to show that pronoun interpretation in sentences like (2) depends in complex ways not just on the verb but on what you know about the subject and the object:
In addition, properties of the participants affect implicit causality. Changing the gender (Lafrance, Brownell, & Hahn, 1997), animacy (Corrigan, 1988, 1992), or typicality (Corrigan, 1992; Garvey et al., 1976) of the participants changes the [pronoun interpretation].
After hearing this enough times, I started what I thought would be a series of studies to look at how information about the subject and object interact with the verb in real time during sentence comprehension. This project never got off the ground because I couldn't find any such effects. That is, I have now run a number of studies where I manipulate the gender or typicality, etc., of the subject and object, and they have no effect on pronoun interpretation.

It turns out that there was some confusion in the literature. The studies that Pickering and Majid cite in the quote above mostly don't look at pronoun interpretation at all. Most look at a different task:

(3) John frightened Bill.
a. How likely is this because John is the kind of person who frightens people? 1 2 3 4 5 6 7 8 9
b. How likely is this because Bill is the kind of person people frighten? 1 2 3 4 5 6 7 8 9

Researchers look at whether the answer to (a) is greater or less than the answer to (b) to decide who people think caused the event: John or Bill? Much of the literature has assumed that the answer to this question should predict what happens in pronoun sentences like (2), even though this has never been rigorously shown. (Why it hasn't been carefully tested is a bit of a mystery. It is so widely believed to be true that I suspect many folks don't realize that it hasn't been tested. It actually took me several years to pick up on this fact myself.)

I now have a long line of studies showing that there is little relationship between the two tasks. Also, although manipulating who the subject and object are affects the task in (3), I find very little evidence that it affects pronoun interpretation in (2). For instance, compare the following:

(4) a. The king frightened the page because he....
     b. The page frightened the king because he....

Everybody agrees that, in general, it is more likely that kings frighten pages than that pages frighten kings, and so if you use these sentences in (3), you get a nice effect of who the subject is. But it doesn't affect pronoun interpretation at all.

This is a serious blow to Pickering and Majid's argument. They argued that pronoun interpretation cannot be all (or mostly) about discourse structure and verb meaning because these interact in complex ways with knowledge about the subject and object (I should add: non-linguistic knowledge. It presumably is not part of the definition of king and page that kings frighten pages but not vice versa, but rather something you learn about the world). If it turns out that this is not the case, then discourse structure + verb meaning may well explain much or all of the phenomenon at hand.

That Kind of Person

That was my argument, anyway, in a paper that I have been shopping around for a couple years now. The difficulty with publishing this paper is that it makes a null argument: you can't find effects of knowledge about the subject and object on pronoun interpretation. In fact, all I can show is that the manipulations I have tried haven't worked, not that no manipulation works (you can't try everything!). So much of the review process has been reviewers suggesting additional experiments and me running them. The latest -- and I hope last -- one was That Kind of Person.

A reviewer very smartly noted that a big difference between (2) and (3) is that (3) asks about the kind of person the subject is and the kind of person the object is, whereas (2) does not. What we are manipulating in our king/page manipulation is, of course, the kind of person the subject is and the kind of person that the object is. So the reviewer suggested the following pronoun task:

(5) a. The king frightened the page because he is the kind of person that...
     b. The page frightened the king because he is the kind of person that...

The specific manipulation was one of status. It was argued in the literature that people are more likely to think that high-status folk (kings) caused the event than low-status folk (pages). This does turn out to be true if you use the task in (3), but yet again I found no effect on pronouns, either using sentences like (4) or like (5). (Sorry -- I was going to include a graph, but the results aren't formatted for graphing yet, and it's time for lunch! Maybe when the paper is published...)

Conclusions

I think the result of this work is that it suggests that we really are narrowing in on "the" theory of pronoun interpretation (though there is a lot of work left), a theory in which most of the work is done by discourse structure and verb meaning. This is pretty exciting, because it would be one of the rare cases where we have a reasonably complete theory of how context affects word meaning. It does leave open the question of what the task in (3) is measuring, and why it doesn't match what the pronoun tasks measure. That's still the sticking point in the review. I have a few new ideas, and we'll see what the reviewers say this time around.

----
*Editors at newspapers and magazines usually request that, whenever you introduce a scientist in an article, you state name, institution, and scientific field. The first two are easy, but the last one is hard, particularly when you frequently write about interdisciplinary research (which I do). I wrote about Kehler in an article for Scientific American Mind a while back, and introducing him caused a long debate. His degree is in computer science, he works in a linguistics department, but his work is probably best described as psychology. So what is he?

Just another reason I prefer blogging.

Findings: Linguistic Universals in Pronoun Resolution - Episode II

A new paper, based on data collected through GamesWithWords.org, is now in press (click here for the accepted draft). Below is an overview of the paper.

Many of the experiments at GamesWithWords.org have to do with pronouns. I find pronouns interesting because, unlike many other words, the meaning of a pronoun is almost entirely dependent on context. So while "Jane Austen" refers to Jane Austen no matter who says it or when, "I" refers to a different person, depending mostly on who says it (but not entirely: an actor playing a part uses "I" to refer not to himself but to the character he's playing). Things get even hairier when we start looking at other pronouns like "he" and "she". This means that pronouns are a good laboratory animal for investigating how people use context to help interpret language.

Mice make lousy laboratory animals for studying the role of context in language.
Pronouns are better.

I have spent a lot of time looking at one particular contextual effect, originally discovered by Garvey and Caramazza in the mid-70s:

(1) Sally frightens Mary because she...
(2) Sally loves Mary because she...

Although the pronoun is ambiguous, most people guess that she refers to Sally in (1) but Mary in (2). That is, the verb used (frightens, loves) seems to affect pronoun resolution. Replace "frightens" and "loves" with other verbs, and what happens to the pronoun depends on the verb: some verbs lead to subject resolutions like frightens, some to object resolutions like loves, and some leave people unsure (that is, they think that either interpretation of the pronoun is equally reasonable).
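The measurement behind this effect is simple: for each verb, count what proportion of participants resolve the pronoun to the subject. A minimal sketch, using made-up responses (the verb names and the tiny sample here are purely illustrative, not the actual data):

```python
# Estimate each verb's pronoun bias as the proportion of participants
# who resolved the pronoun to the sentence's subject.
from collections import defaultdict

# Each response: (verb, choice), where choice is "subject" or "object".
# Hypothetical responses, for illustration only:
responses = [
    ("frightens", "subject"), ("frightens", "subject"), ("frightens", "object"),
    ("loves", "object"), ("loves", "object"), ("loves", "subject"),
]

counts = defaultdict(lambda: [0, 0])  # verb -> [subject picks, total picks]
for verb, choice in responses:
    counts[verb][1] += 1
    if choice == "subject":
        counts[verb][0] += 1

# Bias near 1.0 = subject-resolving verb; near 0.0 = object-resolving.
bias = {verb: subj / total for verb, (subj, total) in counts.items()}
print(bias)  # "frightens" skews toward the subject, "loves" toward the object
```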

The question is why. One possibility is that this is some idiosyncratic fact about the verb. Just as you learn that the past tense of walk is walked but the past tense of run is ran, you learn that some verbs lead you to resolve pronouns to the verb's subject and some to the verb's object (and some verbs have no preference). This is what was tentatively suggested in the original Garvey and Caramazza paper.

Does the meaning of the verb matter?

One of the predictions of this account is that there's nothing necessary about the fact that frightens leads to subject resolutions whereas loves leads to object resolutions, just as there is no deep reason that run's past tense is ran. English could have been different.

Many researchers have suspected that the pronoun effects we see are not accidental but arise from some fundamental aspect of the meanings of frightens and loves. Even Garvey & Caramazza suspected this, but they were able to rule out every hypothesis they considered. Recently, using data from GamesWithWords.org, we presented some evidence that this suspicion is right. Interestingly, while researchers studying pronouns were busy trying to come up with a theory of verb meaning that would explain the pronoun effects, many semanticists were independently busy trying to explain verb meaning for entirely different reasons. Usually, they are interested in explaining things like verb alternations. So, for instance, they might notice that verbs for which the subject experiences an emotion about the object:

(3) Mary likes/loves/hates/fears John.

can take "that" complements:

(4) Mary likes/loves/hates/fears that John climbs mountains.

However, verbs for which the object experiences an emotion caused by the subject do not:

(5) Mary pleases/delights/angers/frightens John.
(6) *Mary pleases/delights/angers/frightens that John climbs mountains.

[The asterisk means that the sentence is ill-formed in English.]

Linguists working on these problems have put together lists of verbs, all of which have similar meanings and can be used in the same ways. (VerbNet is the most comprehensive of these.) Notice that in this classification, "please" and "frighten" end up in the same group, while "like" and "fear" are in a different one: even though "frighten" and "fear" describe similar emotions, they have very different structures in terms of who -- the subject or the object -- feels the emotion.

We took one such list of verb classes and showed that it explained the pronoun effect quite well: Verbs that were in the same meaning class had the same pronoun effect. This suggests that meaning is what is driving the pronoun effect.
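The logic can be sketched as a lookup: if class membership alone determines the pronoun effect, then knowing a verb's class should predict its bias without any verb-specific learning. A minimal sketch, with illustrative verb lists (the class names and mappings here are a simplification of the paper's analysis, not the actual VerbNet classes):

```python
# If verb class determines the pronoun effect, class membership alone
# should predict subject vs. object resolution.
VERB_CLASSES = {
    "experiencer-subject": {"like", "love", "hate", "fear"},
    "experiencer-object": {"please", "delight", "anger", "frighten"},
}

# Class-level bias predictions (following the paper's logic):
CLASS_BIAS = {
    "experiencer-subject": "object",   # "Sally loves Mary because she..." -> Mary
    "experiencer-object": "subject",   # "Sally frightens Mary because she..." -> Sally
}

def predict_bias(verb):
    """Predict which argument the pronoun resolves to, given only the verb's class."""
    for cls, verbs in VERB_CLASSES.items():
        if verb in verbs:
            return CLASS_BIAS[cls]
    return None  # verb not classified

print(predict_bias("frighten"))  # subject
print(predict_bias("fear"))      # object
```

The same lookup generalizes across languages: a Spanish or Japanese verb only needs to be assigned to a class, not matched to an exact English translation.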

Or does it?

If the pronoun effect is driven by the meaning of a verb, then it shouldn't matter what language that verb is in. If you have two verbs in two languages with the same meaning, they should both show the same pronoun effect.

We aren't the first people to have thought of this. As early as 1983, Brown and Fish compared English and Mandarin. The most comprehensive study so far is probably Goikoetxea, Pascual and Acha's mammoth study of Spanish verbs. The problem was identifying cross-linguistic synonyms: does the Spanish word asustar mean frighten, scare, or terrify?
Is this orangutan scared, frightened or terrified? Does it matter?

Once we showed that frighten, scare and terrify all have the same pronoun effect in English, the problem disappeared. It no longer mattered what the exact translation of asustar or any other word was: Given that entire classes of verbs in English have the same pronoun effect, all we needed to do was find verbs in other languages that fit into the same class.

We focused on transitive verbs of emotion. These are the two classes already introduced: those where the subject experiences the emotion (like/love/hate/fear) and those where the object does (please/delight/anger/frighten) (note that there are quite a few of both types of verbs). We collected new data in Japanese, Mandarin and Russian (the Japanese and Russian studies were run at GamesWithWords.org and/or its predecessor, CogLangLab.org) and re-analyzed published data from English, Dutch, Italian, Spanish, and Finnish.

Results for English verbs (above). "Experiencer-Subject" verbs are the ones like "fear" and "Experiencer-Object" are the ones like "frighten". You can see that people were consistently more likely to think that the pronoun in sentences like (1-2) referred to the subject of Experiencer-Object verbs than Experiencer-Subject verbs.

The results are the same for Mandarin (above). There aren't as many dots because we didn't test as many of the verbs in Mandarin, but the pattern is striking.

The Dutch results (above). The pattern is again the same. Dutch has more of these verbs, but the study we re-analyzed had only tested a few of them.

You can read the paper and see the rest of the graphs here. In the future, we would like to test more kinds of verbs in more languages, but the results so far are striking and suggest that the pronoun effect is caused by what verbs mean, not some idiosyncratic grammatical feature of the language. There is still a lot to be worked out, though. For instance, we're now pretty sure that some component of meaning is relevant to the pronoun effect, but which component and why?

------------
Hartshorne, J., and Snedeker, J. (2012). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 1-35. DOI: 10.1080/01690965.2012.689305

Goikoetxea, E., Pascual, G., and Acha, J. (2008). Normative study of the implicit causality of 100 interpersonal verbs in Spanish. Behavior Research Methods, 40(3), 760-772. DOI: 10.3758/BRM.40.3.760

Garvey, C., and Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5(3), 459-464.

Brown, R., and Fish, D. (1983). Are there universal schemas of psychological causality? Archives de Psychologie, 51, 145-153.

Findings: What do verbs have to do with pronouns?

A new paper, based on data collected through GamesWithWords.org, is now in press (click here for a pre-print). Below is an overview of this paper.

Unlike a proper name (Jane Austen), a pronoun (she) can refer to a different person just about every time it is uttered. While we occasionally get bogged down in conversation trying to interpret a pronoun (Wait! Who are you talking about?), for the most part we sail through sentences with pronouns, not even noticing the ambiguity.

We have been running a number of studies on pronoun understanding (for some previous posts, see here and here). One line of work looks at a peculiar contextual effect, originally discovered by Garvey and Caramazza in the mid-70s:
(1) Sally frightens Mary because she...
(2) Sally loves Mary because she... 
Although the pronoun is ambiguous, most people guess that she refers to Sally in (1) but Mary in (2). That is, the verb used (frightens, loves) seems to affect pronoun resolution.

Causal Verbs

From the beginning, most if not all researchers agreed that this must have something to do with how verbs encode causality: "Sally frightens Mary" suggests that Sally is the cause, which is why you then think that "because she…" refers to Sally, and vice versa for "Sally loves Mary".

The problem was finding a predictive theory: which verbs encode causality which way? A number of theories have been proposed. The first, from Harvard psychologists Roger Brown and Deborah Fish (1983), was that for emotion verbs (frightens, loves), the cause is the person who *isn't* experiencing the emotion -- Sally in (1) and Mary in (2) -- and the subject for all other verbs. This turned out not to be correct. For instance:
(3) Sally blames Mary because she...
Here, most people think "she" is Mary, even though this is not an emotion verb and so the "cause" was supposed to be -- on Brown and Fish's theory -- the subject (Sally).

A number of other proposals have been made, but the data in the literature doesn't clearly support any one (though Rudolph and Forsterling's 1997 theory has been the most popular). In part, the problem was that we had data on a small number of verbs, and as mathematicians like to tell us, you can draw an infinite number of lines through a single point (and create many different theories to describe a small amount of data).

Most previous studies had looked at only a few dozen verbs. With the help of visitors to GamesWithWords.org, we collected data on over 1000 verbs. (We weren't the only ones to notice the problem -- after we began our study, Goikoetxea and colleagues published data from 100 verbs in Spanish, and Ferstl and colleagues published data from 305 in English.) We found that in fact none of the existing theories worked very well.

However, when we took an independently developed theory of verb meaning from linguistics, it actually predicted the results very well. All of the earlier theories tried to divide verbs into a few classes. Within each class, all the verbs were supposed to encode causality the same way -- for instance, with the cause as the subject (leading people to interpret the pronoun as referring to the subject in sentences like 1-3). Unfortunately, this was rarely the case, as shown in Table 2 of the paper:


A new theory


This was, of course, disappointing. We wanted to understand pronoun interpretation better, but now we understood worse! Luckily, the work did not end there. We turned to a well-developed theory from linguistics about what verbs mean (the work I have described above was developed by psychologists largely independently from linguistics).

The basic idea behind this theory is that the core meanings of verbs are built out of a few basic parts, such as movement, possession, the application of force, and -- importantly for us -- causality. In practice, nobody goes through the dictionary and determines, for every verb, which of these core components it has; that turns out to be prohibitively difficult (but stay tuned: a major new project at GamesWithWords.org will be focused on just this). But it turns out that when you classify verbs according to the kinds of sentences they can appear in, you seem to get the same thing: groups of verbs that share these core components of meaning (such as causality).

The prediction, then, is that if we look at verbs in the same class according to this theory, all the verbs in that class should encode causality in the same way and thus should affect pronouns in the same way. And that is exactly what we found. This not only furthers our understanding of the phenomenon we were studying; it also confirms both the idea that verb meaning plays a central role in the phenomenon and the linguistic theory itself.


Why so much work on pronouns?


Pronouns are interesting in their own right, but I am primarily interested in them as a case study in ambiguity. Language is incredibly ambiguous, and most of the time we don't even notice it. For instance, it could be that the "she" in (1) refers to Jennifer -- someone not even mentioned in the sentence! -- but you probably did not even consider that possibility. Because we as humans find the problem so easy, it is very hard for us as scientists to have good intuitions about what is going on. This has become particularly salient as we try to explain to computers what language means (that is, program them to process language).

The nice thing about pronouns is that they are a kind of ambiguity that is very easy to study, and many good methods have been worked out for assessing their processing. More than in many areas of research on ambiguity -- and, I think, more than in many areas of psychology that don't involve vision -- I feel that a well worked-out theory of pronoun processing is increasingly within our reach. And that is very exciting.


------

Hartshorne, J., and Snedeker, J. (2012). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 1-35. DOI: 10.1080/01690965.2012.689305

Brown, R., and Fish, D. (1983). The psychological causality implicit in language. Cognition, 14(3), 237-273. DOI: 10.1016/0010-0277(83)90006-9

Goikoetxea, E., Pascual, G., and Acha, J. (2008). Normative study of the implicit causality of 100 interpersonal verbs in Spanish. Behavior Research Methods, 40(3), 760-772. DOI: 10.3758/BRM.40.3.760

Ferstl, E., Garnham, A., and Manouilidou, C. (2010). Implicit causality bias in English: A corpus of 300 verbs. Behavior Research Methods, 43(1), 124-135. DOI: 10.3758/s13428-010-0023-2

Rudolph, U., and Forsterling, F. (1997). The psychological causality implicit in verbs: A review. Psychological Bulletin, 121(2), 192-218. DOI: 10.1037//0033-2909.121.2.192


Findings: Which of my posts do you like best?

It will surprise nobody that I like data. By extension, it should surprise nobody that what I like about blogging is getting instant feedback on whether people found a post interesting and relevant or not. This is in contrast to writing a journal article, where you will wait minimally a year or two before anyone starts citing you (if they ever do).

How I feel about data.

Sometimes the results are surprising. I expected my posts on the suspicious data underlying recent graduate school rankings to make a splash, but the two posts together got a grand total of 2 comments and 16 tweets (some of which are automatically generated by FieldofScience). I didn't expect posts on my recent findings regarding pronoun processing to generate that much interest, but they got 6 comments and 26 tweets, putting them among the most popular, at least as far as Twitter is concerned.

To get a sense of which topics you, dear readers, find the most interesting, I compiled the statistics from all my posts from the fall semester and tabulated those data according to the posts' tags. Tags are imperfect, as they reflect only how I decided to categorize the post, but they're a good starting point.

Here are the results, sorted by average number of retweets:


label                    #Posts   Tweets (avg)   Reddit (avg)   Comments (avg)
findings                    2         13              0              3
publication                 3         13              5              5
peer review                 4         12             13             10
universal grammar           5         10              2              8
pronouns                    3         10              0              2
GamesWithWords.org          2          9              0              1
scientific methods          7          8              7              7
neuroscience                1          8              0              5
overheard                   1          7              0              1
language development        2          7              0              7
Web-based research          6          7              0              1
science and society         3          6              1              6
language                    6          6              1              3
education                   2          6              0              1
journalism                  2          6             18              9
politics                    7          6              0              2
science blogging            2          6              1              2
language acquisition        1          5              0              0
recession                   2          5              1              3
the future                  1          5              0              0
vision                      1          5              0              1
graduate school             4          5              0              3
science in the media        3          5             12              7
method maven                2          5             18             10
media                       3          4              0              1
psychology career path      1          4              0              2
lab notebook                3          3              0              1
none                        4          3              0              0

Since we all know correlation = causation, if I want to make a really popular post, I should label it "findings, publication, peer review". If I want to ensure it is ignored, I shouldn't give it a label at all.

At this point, I'd like to turn it over to the crowd. Are these the posts you want to see? If not, what do you want to read more about? Or if you think about your favorite blogs, what topics do you enjoy seeing on those blogs?

Crowdsourcing My Data Analysis

I just finished collecting data for a study. Do you want to help analyze it?

Puns

What makes a pun funny? If you said "nothing," then you should probably skip this post. But even admirers of puns recognize that while some are sublime, others are ... well, not.

Over the last year, I've been asking people to rate the funniness of just over 2300 different puns. (Where did I get 2300 puns? From the user-submitted site PunoftheDay. PunoftheDay has its own funniness ratings, but I wanted a bit more control over how the puns were rated and who rated them.)

Why care what makes puns funny?

There are three reasons I ran this experiment. I do mostly basic research, and while I believe in its importance and think it's fun, the idea of doing a project I could actually explain to relatives was appealing. I was partly inspired by Zenzi Griffin's 2009 CUNY talk reporting a study she ran on why parents call their kids by the wrong names (typically, calling younger children by elder children's names), work which has now been published in a book chapter.

Plus, I was just interested. I mean: puns!

Finally, I was beginning a line of work on the interpretation of homophones. One of the best-established facts about homophones is that we very rapidly suppress context-irrelevant meanings of words -- in fact, so rapidly that we rarely even notice. If your friend said, "I'm out of money, so I'm going to stop by the bank," would you really even notice considering that bank might mean the side of a river?

A river bank. 
photo: Istvan, creative commons 

A successful pun, on the other hand, requires that at least two meanings be accessed and remain active. In some sense, a pun is homophone processing gone bad. By better understanding puns, I thought I might get some insight into language processing.

Puntastic


As already mentioned, my first step down this road was to collect funniness ratings for a whole bunch of puns. I popped them into a Flash survey, called it Puntastic, and put it on the Games With Words website. The idea was to mine the data and try to find patterns which could then be systematically manipulated in subsequent experiments.

It turns out that there are a lot of ways that 2300 puns can be measured and categorized. So while I have a few ideas I want to try out, no doubt many of the best ones have not occurred to me. Data collection was crowdsourced, and I see no reason why the analyses shouldn't be as well.

I have posted the data on my website. If you have some ideas about what might make one pun funnier than another -- or just want to play around with the data -- you are welcome to it. Please post your findings here.

If you are a researcher and might use the data in an official publication, please contact me directly before beginning analysis (gameswithwords$at*gmail.com) just so there aren't misunderstandings down the line. Failure to get permission to publish analyses of these data may be punished by extremely bad karma and/or nasty looks cast your way at conferences.

The results so far...

Unfortunately for the crowd, I've already done the easiest analyses. The following are based on nearly 800 participants over the age of 13 who listed English as both their native and primary languages (there weren't enough non-native English speakers to conduct meaningful analyses on their responses).

The average was 2.6 stars out of 7 (participants could choose anywhere from 1 to 7 stars, as well as "I don't get it," which was scored as -1 for these analyses), which says something either about the puns I used or the people who rated them.

First I looked at differences between participants to see if I could find types of people who like puns more than others. There was no significant difference in overall ratings by men or women.



I also asked participants if they thought they had good or poor social skills. There was no significant difference there, either.



I also asked them if they had difficulty reading or if they had ever been diagnosed with any psychiatric illnesses, but neither of those factors had a significant effect either (I got tired of making graphs, so just trust me on this one).

The effect of age was unclear.


The youngest participants produced lower ratings than the older participants (p=.0029), which was significant even after a conservative Bonferroni correction for 15 possible pairwise comparisons (alpha=.0033). However, the 10-19 year-olds' ratings were also significantly lower than the 20-29 year-olds' (p=.0014) and the 30-39 year-olds' (p=.0008), but this was not true of the 40-49 year-olds' or 50-59 year-olds' ratings. So it's not clear what to make of that. Given that the overall effect size was small and that this is an exploratory analysis, I wouldn't make much of the effect without corroboration from an independent data set.
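For readers unfamiliar with the Bonferroni correction: with 15 pairwise comparisons, each individual test's alpha is divided by 15. A minimal sketch, using the p-values reported above (the comparison labels are my own shorthand):

```python
# Bonferroni correction: divide the family-wise alpha by the number of tests.
alpha = 0.05
n_comparisons = 15
corrected_alpha = alpha / n_comparisons  # ~0.0033, as in the text

# p-values from the pairwise age-group comparisons mentioned above:
p_values = {
    "youngest vs older": 0.0029,
    "10-19 vs 20-29": 0.0014,
    "10-19 vs 30-39": 0.0008,
}
for comparison, p in p_values.items():
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{comparison}: p={p} -> {verdict}")
```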

The funniest puns

The only factor I've looked at so far that might explain pun funniness is the length of the joke. I considered only the 2238 puns for which I had at least 5 ratings (which was most of them). I asked whether there might be a relationship between the length of the pun and how funny it was. I could imagine this going either way, with concise jokes being favored (short and sweet) or long jokes having a better lead-up (the shaggy dog effect). In fact, the correlations between pun ratings and length in terms of number of characters (r=.05) and in terms of number of words (r=.05) were both so small that I didn't bother to run significance tests.
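The analysis above boils down to a Pearson correlation between length and mean rating. A minimal sketch with made-up (length, rating) pairs, purely to show the computation; the actual data are available on the website:

```python
# Pearson correlation between pun length and mean funniness rating.
import math

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical (length in characters, mean rating) pairs:
puns = [(42, 2.1), (65, 3.0), (58, 2.4), (101, 2.8), (77, 2.2)]
lengths, ratings = zip(*puns)
print(round(pearson_r(lengths, ratings), 2))  # the r for this toy sample
```

In the real data set, r came out near .05 whether length was measured in characters or words.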

I broke up the puns into five groups according to length to see if maybe there was a bimodal effect (shortest and longest jokes are funniest) or a Goldilocks effect (average-length jokes are best). There wasn't.

In short, I can't tell you anything about what makes some people like puns more than others, or why people like some puns more than others. What I can tell you is which puns people did or didn't like. Here are the top 5 and bottom 5 puns:

1. He didn't tell his mother that he ate some glue. His lips were sealed.
2. Cartoonist found dead in home. Details are sketchy.
3. Biologists have recently produced immortal frogs by removing their vocal cords. They can't croak.
4. The frustrated cannibal threw up his hands.
5. Can Napoleon return to his place of birth? Of Corsican.
...
2234. The Egyptian cinema usherette sold religious icons in the daytime. Sometimes she got confused and called out, 'Get your choc isis here!'
2235. Polly the senator's parrot swallowed a watch.
2236. Two pilgrims were left behind after their diagnostic test came back positive.
2237. In a baseball season, a pitcher is worth a thousands blurs.
2238. He said, "Hones', that is the truth', but I knew elide.

Ten points to anyone who can even figure out what those five puns are about. Mostly, participants rated these as "I don't get it."

----------------------
BTW, please don't take from this discussion that there haven't been any serious studies of puns. There have been a number, going back at least as far as Sapir of the Sapir-Whorf hypothesis, who wrote a paper on "Two Navaho Puns." There is a well-known linguistics paper by Zwicky & Zwicky and at least one computer model that generates its own puns. However, I know a lot less about this literature than I would like to, so if there are any experts in the audience, please feel free to send me links.

Does Global Warming Exist, and Other Questions We Want Answered

This week, I asked 101 people on Amazon Mechanical Turk both whether global temperatures have been increasing due to human activity AND what percentage of other people on Amazon Mechanical Turk would say yes to the first question. 78% said yes to the first question. Here are the answers to the second, broken down by whether the respondent did or did not believe in man-made global warming:

Question: How many other people on Amazon Mechanical Turk believe global temperatures have been increasing due to human activity?

                    Average     1st-3rd Quartile
Believers             72%          60%-84%
Denialists            58%          50%-74%
Actual answer         78%            --

Notice that those who believe global warming is caused by human activity are much better at estimating how many other people will agree than those who do not. Interestingly, the denialists' answer is much closer to the average among all Americans than to the actual rate among Turkers (who are mostly but not exclusively American, and are certainly a non-random sample).

So what?

Why should we care? More importantly, why did I do this experiment? A major problem in science/life/everything is that people disagree about the answers to questions, and we have to decide who to believe. A common-sense strategy is to go with whatever the majority of experts says. There are two problems, though: first, it's not always easy to identify an expert, and second, the majority of experts can be wrong.

For instance, you might ask a group of Americans what the capital of Illinois or New York is. Although in theory, Americans should be experts in such matters (it's usually part of the high school curriculum), in fact the majority answer in both cases is likely to be incorrect (Chicago and New York City, rather than Springfield and Albany). This was even true in a recent study of, for instance, MIT or Princeton undergraduates, who in theory are smart and well-educated.

Which of these guys should you believe?

So how should we decide which experts to listen to, if we can't just go with "majority rules"? A long chain of research suggests an option: ask each of the experts to predict what the other experts would say. It turns out that the people who are best at estimating what other people's answers will be are also most likely to be correct. (I'd love to cite papers here, but the introduction here is coming from a talk I attended earlier in the week, and I don't have the citations in my notes.) In essence, this is an old trick: ask people two questions, one of which you know the answer to and one of which you don't. Then trust the answers on the second question that come from the people who got the first question right.

This method has been tested on a number of questions and works well. It was actually tested on the state capital problem described above, and it does much better than a simple "majority rules" approach. The speaker at the talk I went to argued that this is because people who are better able to estimate the average answer simply know more and are thus more reliable. Another way of looking at it (which the speaker mentioned) is that someone who thinks Chicago is the capital of Illinois likely isn't considering any other possibilities, so when asked what other people will say, guesses "Chicago." The person who knows that Springfield is in fact the capital probably nonetheless knows that many people will be tricked by the fact that Chicago is the best-known city in Illinois, and thus will correctly guess that lots of people will say Chicago but that some will also say Springfield.
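The trick described above amounts to comparing actual vote shares against predicted vote shares and trusting the answer that beats its prediction. A minimal sketch, with made-up numbers for the Illinois example (this is my own illustration of the idea from the talk, not the researchers' actual algorithm):

```python
# "Surprisingly popular" answer: the one whose actual vote share most
# exceeds the crowd's average prediction of its vote share.
def surprisingly_popular(votes, predicted):
    """votes: answer -> fraction who gave it; predicted: answer -> mean predicted fraction."""
    return max(votes, key=lambda answer: votes[answer] - predicted[answer])

# Hypothetical Illinois-capital data: most people say Chicago, and even
# Springfield voters predict Chicago will dominate -- so Springfield is
# more popular than anyone predicted.
votes = {"Chicago": 0.65, "Springfield": 0.35}
predicted = {"Chicago": 0.75, "Springfield": 0.25}
print(surprisingly_popular(votes, predicted))  # Springfield
```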

Harder Questions

I wondered, then, how well it would work for a question where everybody knows there are two possible answers. So I surveyed Turkers about global warming. Believers were much better at estimating how many believers there are on Turk than denialists were.

Obviously, there are a few ways of interpreting this. Perhaps denialists underestimate both the proportion of climate scientists who believe in global warming (~100%) and the percentage of normal people who believe in global warming, and thus they think the evidence is weaker than it is. Alternatively, denialists don't believe in global warming and thus have trouble accepting that other people do and thus lower their estimates. The latter proposal, though, would suggest that believers should over-estimate the percentage of people who believe in global warming, though that is not in fact the case.

Will this method work in general? In some cases, it won't. If you asked expert physicists in 1530 about quantum mechanics, presumably none of them would believe it, and all would correctly predict that none of the others would believe it. In other cases, it's irrelevant (nearly 100% of climatologists believe in man-made global warming, and I expect they all know that they all believe in it). More importantly, the method may work well for some types of questions and not others. I heard in this talk that researchers have started using the method to predict product sales and outcomes of sports matches, and it actually does quite well. I haven't seen any of the data yet, though.


------
For more posts on science and politics, click here and here.