Field of Science

Results (Round 1): Crowdsourcing the Structure of Meaning & Thought

Language is a device for moving a thought from one person's head into another's. This means that to have any real understanding of language, we also need to understand thought. That is what makes work on language exciting. It is also what makes it hard.

With the help of over 1,500 Citizen Scientists working through our VerbCorner project, we have been making rapid progress.

Grammar, Meaning, & Thought

You can say Albert hit the vase and Albert hit at the vase. You can say Albert broke the vase but you can't say Albert broke at the vase. You can say Albert sent a book to the boarder [a person staying at a guest house] or Albert sent a book to the border [the line between two countries], but while you can say Albert sent the boarder a book, you can't say Albert sent the border a book. And while you say Albert frightened Beatrice -- where Beatrice, the person experiencing the emotion, is the object of the verb -- you must say Beatrice feared Albert -- where Beatrice, the person experiencing the emotion, is now the subject.

How do you know which verb gets used which way? One possibility is that it is random, and this is just one of those things you must learn about your language, just like you have to learn that the animal in the picture on the left is called a "dog" and not a "perro", "xiaogou," or "sobaka." This might explain why it's hard to learn language -- so hard that non-human animals and machines can't do it. In fact, it results in a learning problem so difficult that many researchers believe it would be impossible, even for humans (see especially work on Baker's Paradox).

Many researchers have suspected that there are patterns in terms of which verbs can get used in which ways, explaining the structure of language and how language learning is possible, as well as shedding light on the structure of thought itself. For instance, the difference (it is argued) between Albert hit the vase and Albert hit at the vase is that the latter sentence means that Albert hit the vase ineffectively. You can't say Albert broke at the vase because you can't ineffectively break something: It is either broken or not. The reason you can't say Albert sent the border a book is that this construction means that the border owns the book, which a border can't do -- borders aren't people and can't own anything -- but a boarder can. The difference between Albert frightened Beatrice and Beatrice feared Albert is that the former describes an event that happened in a particular time and place (compare Albert frightened Beatrice yesterday in the kitchen with Beatrice feared Albert yesterday in the kitchen).

When researchers look at the aspects of meaning that matter for grammar across different languages, many of the same aspects pop up over and over again. Does the verb describe something changing (break vs. hit)? Does it describe something only people can do (own, know, believe vs. exist, break, roll)? Does it describe an event or a state (frighten vs. fear)? This pattern is too suspicious to be accidental. Researchers like Steven Pinker have argued that language cares about these aspects of meaning because these are basic distinctions our brain makes when we think and reason about the world (see Stuff of Thought). Thus, the structure of language gives us insight into the structure of thought.

The Question

The theory is very compelling and is exciting if true, but there are good reasons to be skeptical. The biggest one is that there simply isn't that much evidence one way or another. Although a few grammatical constructions have been studied in detail (in recent years, this work has been spearheaded by Ben Ambridge of the University of Liverpool), the vast majority have not been systematically studied, even in English. Although evidence so far suggests that which verbs go in which grammatical constructions is driven primarily or entirely by meaning, skeptics have argued that this is because researchers so far have focused on exactly those parts of language that are systematic, and that if we looked at the whole picture, we would see that things are not so neat and tidy.

The problem is that no single researcher -- nor even an entire laboratory -- can possibly investigate the whole picture. Checking every verb in every grammatical construction (e.g., noun verb noun vs. noun verb at noun, etc.) for every aspect of meaning would take one person the rest of her life.

CrowdSourcing the Answer

Last May, VerbCorner was launched to solve this problem. For the first round of the project, we posted questions about 641 verbs and six different aspects of meaning. By October 18th, 1,513 volunteers had provided 117,584 judgments, which works out to 3-4 people per sentence per aspect of meaning. That was enough data to start analyzing.

As predicted, there is a great deal of systematicity in the relationship between meaning and grammar (for details on the analysis, see the next section). These results suggest that the relationship between grammar and meaning may indeed be very systematic, helping to explain how language is learnable at all. They also give us some confidence in the broad project of using language as a window into how the brain thinks and reasons about the world. This is important, because the mind is not easy to study, and if we can leverage what we know about language, we will have learned a great deal. As we test more verbs and more aspects of meaning -- I recently added an additional aspect of meaning and several hundred new verbs -- that window will become clearer and clearer.

Unless, of course, it turns out that not all of language is so systematic. While our data so far represent a significant proportion of all research to date, it's only a tiny fraction of English. That is what makes research on language so hard: there is so much of it, and it is incredibly complex. But with the support of our volunteer Citizen Scientists, I am confident that we will be able to finish the project and launch a new phase of the study of language.

That brings up one additional aspect of the results: They show that this project is possible. Citizen Science is rare in the study of the mind, and many of my colleagues doubted that amateurs could provide reliable results. In fact, by the standard measures of reliability, the information our volunteers contributed is very reliable.

Of course, checking for a systematic relationship between grammar and meaning is only the first step. We'd also like to understand which verbs and grammatical constructions have which aspects of meaning and why, and leverage this knowledge into understanding more about the nature of thought. Right now, we still don't have enough data to have exciting new conclusions (for exciting old conclusions, see Pinker's Stuff of Thought). I expect I'll have more to say about that after we complete the next phase of data collection.

Details of the Analysis

Here is how we did the analyses. If meaning determines which grammatical constructions a given verb can appear in, then you would expect that all the verbs that appear in the same set of frames should be the same in terms of the core aspects of meaning discussed above. So if one of those verbs describes, for instance, physical contact, then all of them should.

Helpfully, the VerbNet project -- which was built on earlier work by Beth Levin -- has already classified over 6,000 English verbs according to which grammatical constructions they can appear in. The 641 verbs posted in the first round of the VerbCorner project consisted of all the verbs from 11 of these classes.

So is it the case that in a given class, all the verbs describe physical contact or all of them do not? One additional complication is that, as I described above, the grammatical construction itself can change the meaning. So what I did was count what percentage of verbs from the same class have the same value for a given aspect of meaning for each grammatical construction, and then I averaged over those constructions.

The "Explode on Contact" task in VerbCorner asked people to determine whether a given sentence (e.g., Albert hugged Beatrice) described contact between different people or things. How consistent were the results for a given verb class and a given grammatical construction? Several volunteers checked each sentence. If there was disagreement among the volunteers, I used whatever answer the majority had chosen.
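For concreteness, here is a minimal sketch of that analysis in Python. The data layout and field names are my own illustration, not the actual VerbCorner data format: each sentence is identified by its verb class, grammatical construction, and verb, and carries the list of volunteer judgments.

```python
from collections import Counter, defaultdict

def majority(judgments):
    """Resolve disagreement among volunteers by majority vote."""
    return Counter(judgments).most_common(1)[0][0]

def class_consistency(data):
    """
    data: list of (verb_class, construction, verb, judgments) tuples,
    where judgments is a list of volunteer answers (e.g. True/False
    for "describes physical contact").

    Returns, for each verb class, the average over constructions of
    the fraction of verbs sharing the modal answer.
    """
    # Majority answer for each (class, construction, verb) sentence
    votes = defaultdict(list)  # (class, construction) -> one answer per verb
    for vclass, constr, verb, judgments in data:
        votes[(vclass, constr)].append(majority(judgments))

    # Per construction: fraction of verbs agreeing with the modal answer
    per_class = defaultdict(list)
    for (vclass, constr), answers in votes.items():
        modal_count = Counter(answers).most_common(1)[0][1]
        per_class[vclass].append(modal_count / len(answers))

    # Average over constructions within each class
    return {vclass: sum(fracs) / len(fracs)
            for vclass, fracs in per_class.items()}
```

A class in which every verb patterns alike in every construction scores 1.0 (100% consistency); disagreement within a construction pulls the score down.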

This graph shows the degree of consistency by verb class (the classes are numbered according to their VerbNet number), with 100% being maximum consistency. You can see that all eleven classes are very close to 100%. Obviously, exactly 100% would be more impressive, but that's extremely rare to see when working with human judgments, simply because people make mistakes. We addressed this in part by having several people check each sentence, but there are so many sentences (around 5,000), that simply by bad luck sometimes several people will all make a mistake on the same sentence. So this graph looks as close to 100% as one could reasonably expect. As we get more data, it should get clearer.

Results were similar for other tasks. Another one looked at whether the sentence described someone applying force (pushing, shoving, etc.) to something or someone else:
Maybe everything just looks very consistent? We actually had a check for that. One of the tasks measures whether the sentence describes something that is good, bad, or neither. There is no evidence that this aspect of meaning matters for grammar (again, the hypothesis is not that every aspect of meaning matters -- only certain ones that are particularly important for structuring thought are expected to matter). And, indeed, we see much less consistency:
Notice that there is still some consistency, however. This seems to be mostly because most sentences describe something that is neither good nor bad, so there is a fair amount of essentially accidental consistency within each verb class. Nonetheless, this is far less consistency than what we saw for the other five aspects of meaning studied.

Citizen Science in Harvard Magazine

A nice, extended article on recent Citizen Science projects, covering a wide range. Check it out.

Science Mag studies science. Forgets to include control group.

Today's issue of Science carries the most meta sting operation I have ever seen. John Bohannon reports a study of open access journals, showing lax peer review standards. He sent 304 fake articles with obvious flaws to 304 open access journals, more than half of which were accepted.

The article is written as a stinging rebuke of open access journals. Here's the interesting thing: There's no comparison to traditional journals. For all we know, open access journals actually have *stricter* peer review standards than traditional journals. We all suspect not, but suspicion isn't supposed to count as evidence in science. Or in Science.

So this is where it gets meta: Science -- which is not open access -- published an obviously flawed article about open access journals publishing obviously flawed articles.

It would be even better if Bohannon's article had run in the "science" section of Science, rather than in the news section, where it actually ran, but hopefully we can agree that Science can't absolve itself of checking its articles for factualness and logical coherence just by labeling them "news".


I have never been good at coming up with titles for articles. When writing for newspapers or magazines, I usually leave it up to the editor. There is some danger that comes with this, however.

Last week, I wrote a piece for Scientific American about similarities across languages. This piece was then picked up by Salon, which re-ran the article under a new title:
Chomsky's "Universal Language" is incomplete. Chomsky's theory does not adequately explain why different languages are so similar.
I agree that this is snappier than any title I would have come up with. It's also perhaps a bit snappier than the one Scientific American used. It's also dead wrong. For one, there is no such thing as Chomsky's "Universal Language." Or if there is, presumably it is love. Or maybe mathematics. Or maybe music. The term is "Universal Grammar."

If you squint, the subtitle isn't exactly wrong. In the article, I do claim that standard Universal Grammar theory's explanation of similarities across languages isn't quite right. But the title implies that UG suggests that languages are not that similar, whereas the real problem with UG is that -- at least on standard interpretations -- it suggests that languages should be more similar than they actually are.

I sent in a letter to "corrections" at Salon, and the title has now been switched to something more correct. The moral of the story? Apparently writing good titles really is just very hard.

GamesWithWords on Scientific American

Over the last week, Scientific American has published two articles by me. The most recent, "Citizen Scientists decode meaning, memory and laughter," discusses how citizen science projects -- science projects involving collaborations between professional scientists and amateur volunteers -- are now being used to answer questions about the human mind.

Citizen Science -- projects which involve collaboration between professional scientists and teams of enthusiastic amateurs -- is big these days. It's been great for layfolk interested in science, who can now not just read about science but participate in it. It has been great for scientists, with numerous mega-successes like Zooniverse and Foldit. Citizen Science has also been a boon for science writing, since readers can literally engage with the story.
However, the Citizen Science bonanza has not contributed to all scientific disciplines equally, with many projects in zoology and astronomy but fewer in physics and the science of the mind. It is perhaps no surprise that there have been few Citizen Science projects in particle physics (not many people have accelerators in their back yards!), but the fact that there has been very little Citizen Science of the mind is more remarkable.

The article goes on to discuss three new mind-related citizen science projects, including our own VerbCorner project.

The second, "How to understand the deep structures of language," describes some really exciting work on how to explain linguistic universals -- work that was conducted by colleagues of mine at MIT.
In an exciting recent paper, Ted Gibson and colleagues provide evidence for a design-constraint explanation of a well-known bias involving case endings and word order. Case-markers are special affixes stuck onto nouns that specify whether the noun is the subject or object (etc.) of the verb. In English, you can see this on pronouns (compare "she talked with her"), but otherwise, English, like most SVO languages (languages where the typical word order is Subject, Verb, Object) does not mark case. In contrast, Japanese, like most SOV languages (languages where the typical word order is Subject, Object, Verb) does mark case, with -wa added to subjects and -o added to direct objects. "Yasu saw the bird" is translated as "Yasu-wa tori-o mita" and "The bird saw Yasu" is translated as "Tori-wa Yasu-o mita." The question is why there is this relationship between case-marking and SOV word order.
The article ran in the Mind Matters column, which invites scientists to write about the paper that came out in the last year that they are most excited about. It was very easy for me to choose this one.

Language and Memory Redux

One week only: If you did not do our Language and Memory task when it was running earlier this year, now is your chance. We just re-launched it to collect some additional data.

I expect we'll have enough data within a week to finish this line of studies, rewrite the paper (this is a follow-up experiment that was requested by peer reviewers), and also post the full results here.

Вы понимаете по-русски?

У нас новый русский эксперимент. Большинство психолингвистов занимаются английским. Мы хотим узнать больше об остальных. Не волнуйтесь -- я не сам перевёл эксперимент. Перевела его настоящая русскоязычная!

If you didn't understand that, that's fine. We're recruiting participants for a new experiment in Russian. Apparently you aren't eligible. :)

Much of the research on language is done on a single language: English. In part, that's because many researchers happen to live in English-speaking countries. The great thing about the Internet is we are freed from the tyranny of geography.

One week left to vote

There is less than a week left to vote for our panel at SXSW -- or to leave comments (apparently comments are weighted more heavily than mere votes). So if you want to support our work in improving psychology and the study of the mind & language, please go vote.

Go to this link to create an SXSW account:
Then go to this link and click on the thumb’s up (on the left under “Cast Your Vote”) to vote for us:
You can read more about our proposal at the SXSW site, as well as here.

Who knows more words? Americans, Canadians, the British, or Australians?

I have been hard at work on preliminary analyses of data from the Vocab Quiz, which is a difficult 32-word vocabulary test. Over 2,000 people from around the world have participated so far, so I was curious to see which of the English-speaking nationalities was doing best.

Since the test was made by an American (me), you might expect Americans to do best (maybe I chose words or definitions of words that are less familiar to those in other countries). Instead, Americans (78.4% correct) are near the bottom of the heap, behind the British (79.8%), New Zealanders (82.2%), the Irish (80.1%), South Africans (83.9%), and Australians (78.6% -- OK that one is close). At least we're beating the Canadians (77.4%).

A fluke?

Maybe that was just bad luck. Plus, some of those samples are small -- there are fewer than 10 folks from New Zealand so far. So I pulled down data from the Mind Reading Quotient, which also includes a (different) vocabulary test. Since the Mind Reading Quotient has been running longer, there are more participants (around 3,000). The situation was no better: This time, we weren't even beating the Canadians. 

Maybe this poor showing was due to immigrants in America who don't know English well? Sorry -- the above results only include people whose native language is English. 
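As a sketch, the filtering and per-country tallies look something like this. The field names ("country", "native_language", "correct") are hypothetical, not the actual database schema; the only assumptions from the text are the 32-item test and the restriction to native English speakers.

```python
from collections import defaultdict

def accuracy_by_country(rows):
    """
    rows: dicts with 'country', 'native_language', and 'correct'
    (number of the 32 items answered correctly).
    Returns each country's mean proportion correct, restricted to
    native English speakers.
    """
    totals = defaultdict(lambda: [0, 0])  # country -> [correct, answered]
    for r in rows:
        if r["native_language"] != "English":
            continue  # exclude non-native speakers, as in the analysis above
        totals[r["country"]][0] += r["correct"]
        totals[r["country"]][1] += 32  # 32-item test
    return {c: correct / answered
            for c, (correct, answered) in totals.items()}
```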

I also considered the possibility that Americans are performing poorly because I designed the tests to be hard, inadvertently including words that are rare in America but common elsewhere. But the consistency of results across the other countries makes that seem unlikely: What do the British, New Zealanders, Irish, South Africans, and Australians all know that we don't? That hypothesis also predicts that the poor showing by Americans is due to one or two items in particular. Right now there isn't enough data to do item-by-item analyses, but there will be once we have more. Which brings me to...

Data collection continues

If you want to check how good your vocabulary is compared to everyone else who has taken the test -- and if you haven't done so already -- you can take the Vocab Quiz here. At the Mind Reading Quotient, you can test your ability to understand other people -- to read between the lines.


Phytophactor asks whether these results are significant. In the MRQ data, all the comparisons are significant, with the exception of US v. Canada (which went the other direction in the Vocab Quiz data anyway). The comparison with Australia is a trend (p=.06). See comments below for additional details. I did not run the stats for Vocab Quiz.

Children don't always learn what you want

Someone has not been watching his/her speech around this little girl.

It's clear she has some sense of what the phrase means, even though she's got the words wrong. Notably, she is treating the phrase as compositional (notice how she switches between "his" and "my").

One of my younger brothers went around for a couple months saying "ship" whenever anything bad happened. But unfortunately we don't have that on video.

Taking research out into the wild

Like others, we believe that science is a little bit WEIRD -- much of research is based on a certain type of person, from a very specific social, cultural, and economic background (WEIRD stands for Western, Educated, Industrialized, Rich, Democratic; Henrich, Heine, & Norenzayan, 2010). We want to use the web and the help of citizen scientists to start changing that. In the next few months, we will be launching an initiative called Making Science Less Weird (stay tuned).
As part of Making Science Less Weird, we have proposed a panel presentation at the SXSW conference next year. Here, "we" includes not just our own team but also the teams at our partner projects.
In order to be selected, however, *we need votes*. To support Making Science Less Weird and help us increase diversity in human research, please go to this link to create an SXSW account:
Then go to this link and click on the thumb’s up (on the left under “Cast Your Vote”) to vote for us:
Thanks for your support!

What makes interdisciplinary work difficult

I just read "When physicists do linguistics." Yes, I'm late to the party. In my defense, it only just appeared in my twitter feed. This article by Ben Zimmer describes work published earlier this year, in which a group of physicists applied the mathematics of gas expansion to vocabulary change. This paper was not well received. Among the experts discussed, Josef Fruehwald, a University of Pennsylvania graduate student, compares the physicists to Intro to Linguistics students (not favorably).

Part of the problem is that the physicists seem to have not understood the dataset they were working with and were in any case confused about what a word is, which is a problem if you are studying words! Influential linguist Mark Liberman wrote "The paper's quantitative results clearly will not hold for anything that a linguist, lexicographer, or psychologist would want to call 'words.'"

Zimmer concludes that
Tensions over [the paper] may really boil down to something simple: the need for better communication between disciplines that previously had little to do with each other. As new data models allow mathematicians and physicists to make their own contributions about language, scientific journals need to make sure that their work is on a firm footing by involving linguists in the review process. That way, culturomics can benefit from an older kind of scholarship -- namely, what linguists already know about how humans shape words and words shape humans.
Beyond pointing out that linguists and other non-physicists do already apply sophisticated mathematical models to language -- there are several entire fields that do exactly this work, such as computational linguistics and natural language processing -- I respectfully suggest that involving linguists at the review stage is way too late. If the goal is to improve the quality of the science, bringing in linguists to point out that a project is wrong-headed after the project is already completed doesn't do anyone much good. I guess it's good not to publish something that is wrong, but it would be even better to publish something that is right. For that, you need to make sure you are doing the right project to begin with.

This brings me to the difficulty with interdisciplinary research. The typical newly-minted professor -- that is, someone just starting to do research on his/her own without regular guidance from a mentor/advisor -- has studied that field for several years as an undergraduate, 5+ years as a graduate student, and several more years as a post-doc. In fact, in some fields even newly-minted professors aren't considered ready to release into the wild and are still working with a mentor. What this tells me is that it takes as much as 10 years of training and guidance before you are ready to be fully on your own. (This will vary somewhat across disciplines.)

Now maybe someone who has already mastered one scientific field can master a second one more quickly. I'm frankly not sure that's true, but it is an empirical question. What seems very unlikely is that anyone, no matter how smart or how well trained in their first field, is ready to tackle big questions in a new field without at least a few years of training and guidance from an experienced researcher in that field.

This is not a happy conclusion. I'm getting a taste of this now, as I cross-train in computational modeling (my background is pure experimental). It is not fun to go from being regarded as an expert in your field to suddenly being the least knowledgeable person in your laboratory. (After a year of training, it's possible I'm finally a more competent computational modeler than at least the incoming graduate students, though it's a tough call -- they, at least, typically have several years of relevant undergraduate coursework.) And I'm not even moving disciplines, just sub-disciplines within cognitive science!

So it's not surprising that some choose the "shortcut" of reading a few papers, diving in, and hoping for the best, especially since the demands of the career mean that nobody really has time to take a few years off to learn a new discipline. But it's not clear that this is a particularly effective strategy. All the best interdisciplinary work I have seen -- or been involved in -- involved an interdisciplinary team of researchers. This makes sense. It's hard enough to be an expert in one field. Why try to be an expert in two fields when you could just collaborate with someone who has already done the hard work of becoming an expert in that discipline? Just sayin'.

VerbCorner (and others) on SciStarter.Com

There is a brief profile of our crowd-sourcing project VerbCorner on SciStarter, with a number of quotes from yours truly.

SciStarter profiles a lot of Citizen Science / Crowd-sourced Science projects. Interestingly, most are physical sciences, with only one project listed under psychology (also, as it happens, a language project).

This is not a feature of SciStarter but more a feature of Citizen Science. The Scientific American database only lists two projects under "mind and brain" -- and I'm pretty sure they didn't even have that category last time I checked. This is interesting, because psychologists have been using the Internet to do research for a very long time -- probably longer than anyone else. But we've been very late to the Citizen Science party.

Not, of course, that you shouldn't want to participate in non-cognitive-science projects. There are a bunch of great ones. I've personally mostly only done the ones at Zooniverse, but SciStarter lists hundreds.

Peaky performance

Right now there is a giant spike of traffic to the site, following Steve Pinker's latest tweet about one of the experiments (The Verb Quiz). I looked back over the five years since I started using Google Analytics, and you can see that in general traffic to the site is incredibly peaky.
The three largest single-day peaks account for over 10% of all the visitors to the site over that time period.

Moral of the story: I need Pinker to tweet my site every day!

Findings: at DETEC2013

I recently returned from the inaugural Discourse Expectations: Theoretical, Experimental, and Computational Perspectives workshop, where I presented a talk ("Three myths about implicit causality") which ties together a lot of the pronoun research that I have been doing over the last few years, including results from several experiments (PronounSleuth, That Kind of Person, and Find the Dax).

VerbCorner: New and improved, with surprise bonuses

After a month-long tour, VerbCorner returned to the garage for some fine-tuning. There are now bonus points in each task, doled out whenever ... well, play to find out!

The other major change is that you no longer have to log in to participate. This way, people can check VerbCorner out before committing to filling out the registration form. (Though please do register).

We also made a number of other tweaks here and there to make the site easier to use.

Keeping up to date

Recently, we've added several methods of keeping up to date on projects (finding out when results of old studies are available, when new studies are posted, etc.). In addition to following this blog, that is.

1. Join the Google Group for occasional (5x/year) email updates.

2. Follow @gameswithwords on Twitter.

3. Like our Facebook page.

Citizen Science: Rinse & Repeat

One of the funny things about language is that everybody has their own. There is no "English" out there, existing independently of all its speakers. Instead, there are about one billion people out there, all of whom speak their own idiolect. Most likely, no two people share exactly the same vocabulary (I know some words you might not, possibly including idiolect, and you know some words I don't). Reasonable people can disagree about grammar rules, particularly if one is from Florida and the other from Northern Ireland.

This is one of the reasons we decided to ask people to create usernames in order to contribute to VerbCorner. Suppose two people answer the same question on VerbCorner but disagree. One possibility is that one of them made a mistake (which happens!). But another possibility is that they actually speak different dialects of English, and both are correct (for their dialect). It's hard to tell these possibilities apart by looking at just one question, but by looking at their answers to a set of questions, we can start to get a handle on whether this was a mistake or a real disagreement. The more answers we get from the same person -- particularly across different tasks -- the easier it is to do these analyses.

If we didn't have usernames, it would be hard to figure out which answers all belong to the same person. This is particularly true if the same person comes back to the website from time to time.

People are coming back. At last check, we have ten folks who have answered over 500 questions and four who have answered over 1000. (You can see this by clicking "more" on the leader-board on the main page).

Still, it would be great if we had even more folks who have answered large numbers of questions. Our goal is for everyone in the top 20 to have answered at least 500 questions by the end of the month.

What makes a sentence ungrammatical?

This is the latest in a series of posts explaining the scientific motivations for the VerbCorner project.

There are many sentences that are grammatical but don't make much sense, including Chomsky's famous “colorless green ideas sleep furiously,” and sentences that seem perfectly interpretable but are ungrammatical, such as “John fell the vase” or “Sally laughed Mary” (where the first sentence means that John caused the vase to fall, and the second sentence means that Sally made Mary laugh). You can hit at a window or kick at a window but not shatter at a window or break at a window (unless you are the one shattering or breaking!).

Sentence frames

Notice that these are not agreement errors (“Sally laugh”) or other word-ending errors ("Sally runned to the store"), but instead have something to do with the structure of the sentence as a whole. Linguists often refer to these sentence structures as "frames". There is the transitive frame (NOUN VERB NOUN), the intransitive frame (NOUN VERB), the 'at' frame (NOUN VERB at NOUN), etc. And it seems that certain verbs can go in some frames but not others.

There are many sentence frames (there is disagreement about exactly how to count them, but there are at least a few dozen), and most verbs can appear in somewhere around a half dozen of them. For instance, "thump" can appear in at least eight frames:

NOUN VERB NOUN: John thumped the door.
NOUN VERB NOUN with NOUN: John thumped the door with a stick.
NOUN VERB NOUNs together: John thumped the sticks together.
NOUN VERB NOUN ADJECTIVE: John thumped the door open.
NOUN VERB NOUN ADJECTIVE with NOUN: John thumped the door open with a stick.
NOUN VERB NOUN to [STATE]: John thumped the door to pieces.
NOUN VERB NOUN to [STATE] with NOUN: John thumped the door to pieces with a stick.
NOUN VERB NOUN against NOUN: John thumped the stick against the door.

But there are a large number of frames "thump" can't appear in (at least, not without a lot of straining), such as:

NOUN VERB NOUN that SENTENCE: John thumped that Mary was angry.
NOUN VERB NOUN NOUN: John thumped Mary the book.
NOUN VERB easily: Books thump easily.
There VERB NOUN out of [LOCATION]: There thumped John out of the house.
NOUN VERB what INFINITIVE: John thumped what to do.
NOUN VERB INFINITIVE: John thumped to sing.
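One way to picture this kind of verb-frame lexicon is as a mapping from each verb to the set of frames it licenses. The following is a hypothetical sketch (the frame labels and verb entries are illustrative, not how VerbCorner actually stores its data):

```python
# Hypothetical sketch of a verb-frame lexicon: each verb maps to the
# set of sentence frames it can appear in.
FRAMES = {
    "thump": {
        "NOUN VERB NOUN",              # John thumped the door.
        "NOUN VERB NOUN with NOUN",    # John thumped the door with a stick.
        "NOUN VERB NOUN against NOUN", # John thumped the stick against the door.
    },
    "break": {
        "NOUN VERB NOUN",              # John broke the vase.
        "NOUN VERB",                   # The vase broke.
    },
    "hit": {
        "NOUN VERB NOUN",              # John hit the vase.
        "NOUN VERB at NOUN",           # John hit at the vase.
    },
}

def licenses(verb: str, frame: str) -> bool:
    """Return True if the lexicon says this verb can appear in this frame."""
    return frame in FRAMES.get(verb, set())

print(licenses("hit", "NOUN VERB at NOUN"))    # True
print(licenses("break", "NOUN VERB at NOUN"))  # False
```

The scientific question, of course, is whether such a table is an arbitrary list that must be memorized entry by entry, or whether it can be predicted from the verbs' meanings.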

Explaining language

Perhaps these are just funny facts that we must learn about the language we speak, with no rhyme or reason. This is probably true for some aspects of grammar, like which verbs are irregular (that the past tense of “sleep” is “slept” is a historical accident). But a lot of researchers have suspected that there is a reason why language is the way it is and why certain verbs can go into certain frames but not others.

Going back several decades, researchers noticed that when you sort sentences based on the kind of sentence frames they can fit into, you do not get incoherent jumbles of verbs, but rather groups of verbs that all seem to share something in common. So “shatter” and “break” can be used with the object that is shattering or breaking as the direct object ("John shattered/broke the vase") or as the subject ("The vase shattered/broke"). All the verbs that can do this seem to describe some caused change of state (the vase is changing). Verbs that do not describe some kind of caused change cannot appear in both of these forms (you can say “John hit/kicked the vase" but not "The vase hit/kicked" -- at least not without a very special vase!).

Causality might also explain why you can hit at a window or kick at a window but not shatter or break at a window: the addition of the preposition "at" suggests that the action was ineffectual (you tried hitting the window without doing much damage) which is simply nonsensical with words that by their very definition require success. How do you ineffectually shatter a window? You either shatter it or you don't.

So maybe which verbs can go in which frames is not so mysterious after all. Maybe it is a simple function of meaning. Certain verbs have the right meanings for certain sentence frames. No more explanation necessary.

The VerbCorner Contribution

When you group verbs based on the frames they can appear in, you get several hundred groups of verbs in English. Of these, only a handful have been studied in any detail. While it does look like those groups can be explained in terms of their meaning, you might wonder if perhaps these are unusual cases, and if researchers looked at the rest, we would find something different. In fact, a number of researchers have wondered just that.

The difficulty has always been that there are a lot of verbs and a lot of groups. Studying just one group can take a research team years. Studying all of them would take lifetimes.

This is why we decided to crowd-source the problem. Rather than have a few people spend a lifetime, if lots of people each contribute just a little, we can finish the project in a couple of years, if not sooner.

Contribute to the VerbCorner project at

Bad Evolutionary Arguments

The introductory psychology course I teach for is very heavy on evolutionary psychology. The danger with evolutionary explanations is that it's pretty easy to come up with bad ones. Here's the best illustration I've seen, from Saturday Morning Breakfast Cereal:

How do you tell a good evolutionary argument from a bad one? It's hard to test them with experiments, but that doesn't mean you can't get data. Nice supporting evidence would be finding another species that does the same thing. This hypothesis makes the clear -- and almost certainly false -- prediction that people are likely to adopt babies that fly in out of the blue. You would want to show that whatever reproductive advantage comes from having your genes spread widely (adopted children themselves have more children?) is not overwhelmed by the disadvantages of not being raised by your biological parents (there is data showing that, all else equal, step-parents invest less in step-children than biological parents invest in their biological children; I expect this generalizes to adoptive parents, but I'm not sure -- it might be confounded in the modern day by the rigorous screening of adoptive parents).

Etc. We try to teach our students to critically evaluate evolutionary hypotheses. Hopefully it has taken.

Citizen Science Project: Likely Events

VerbCorner was our first step towards opening up the rest of the process. I have just opened up a new segment of the website called “Experiment Creator”, which is our second endeavor.

Experiment Creator

One of the most important parts of language experiments is choosing the stimuli. For many types of research, such as many low-level or mid-level vision projects, the experimenter has free rein to design essentially whatever stimuli they like. Language researchers are constrained by the fact that some words exist and others don't, and each word that has the properties you want may also have other properties that you don't want along for the ride. For instance, you might want to compare nouns and verbs, which don't just differ in terms of part of speech but also frequency (there are many very low-frequency nouns) and length (in some languages, nouns will be systematically longer than verbs; in other languages, it will be the reverse).

Typically, we have to run one or more “norming” experiments to choose stimuli that are controlled for various nuisance factors. These are not really experiments. There is no hypothesis. The purpose of the experiment is indirect (it's an experiment to create another experiment). So I usually do not post them at, which recruits people who want to participate in experiments.

The new Experiment Creator project changes this. The tasks posted there will be meta-experiments, used to choose stimuli for other experiments. I just posted the first one, Likely Events.

Likely Events

One of the big discoveries about language in the last few decades is that when we are listening to someone talk or reading a passage, we are actively predicting what will come next. If you hear “John needed money, so he went to the…” you probably expect the next word to be “ATM," not “hibernate.” There are two reasons: 1) "the" is usually followed by a noun, not a verb, and 2) "hibernate" is a relatively rare word.

Much of this research has focused on word frequency and what words follow what other words. We are developing several projects to look more carefully at predictions based not on what word follows what word but on what event is likely to follow what event. In general, "the street" is a more common sequence of words than "the ATM" and "street" is more common than "ATM", but you probably didn't think that the example sentence above was likely to end with "street" for a simple reason: That's not (usually) where you go when you need money.
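The word-frequency half of this story can be sketched with a toy bigram model. The mini-corpus below is made up purely for illustration; the point is that raw frequency favors "street" after "the", which is exactly why event knowledge must be doing extra work when you predict "ATM":

```python
from collections import Counter

# Made-up mini-corpus (purely illustrative): which words follow "the"?
corpus = ("john needed money so he went to the atm . "
          "he walked down the street . he crossed the street . "
          "the street was empty . the atm was broken .").split()

bigram_counts = Counter(zip(corpus, corpus[1:]))
after_the = sum(n for (w1, _), n in bigram_counts.items() if w1 == "the")

def p_after_the(word: str) -> float:
    """P(word | previous word = 'the') under the toy corpus."""
    return bigram_counts[("the", word)] / after_the

print(p_after_the("street"))  # 0.6 -- more frequent overall...
print(p_after_the("atm"))     # 0.4 -- ...yet it's the ATM you predict
```

A model that only tracked word sequences would bet on "street"; a comprehender who knows what people do when they need money bets on "ATM".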

To do this research, we need to have sequences of events and vary how likely it is that the one event would follow the other, as well as how likely each event is to happen on its own. And we need many, many such sequences. If you would like to help us out, you can do so here.

On the theory that the people interested in these projects will be more committed, Likely Events takes a bit longer than our typical project (in order to make up for the smaller number of volunteers). I expect participation will take on the order of half an hour. We will see how this goes and how many people are interested. Feedback is welcome.

VerbCorner: A Citizen Science project to find out what verbs mean

Earlier this week, I blogged about our new VerbCorner project. At the end, I promised that there would be more info forthcoming about why we are doing this project, about its aims and expected outcomes, why it's necessary, etc. Here's the first installment in that series.

Computers and language

I just dictated the following note to Siri
Many of our best computer systems treat words as essentially meaningless symbols that need to be moved around.
Here's what she wrote
Many of our best computer system street words is essentially meaningless symbols that need to be moved around.
I rest my case.

The problem of meaning

I don't know for sure how Siri works, but her mistake is emblematic of how much language software works. "Computer systems treat" and "Computer system street" sound approximately the same, but that's not something most humans would notice, because the first interpretation makes sense and the second one doesn't.

Decades of research shows that human language comprehension is heavily guided by plausibility: when there are two possible interpretations of what you just heard, go for the one that makes sense. This happens in speech recognition, as in the example above, and it plays a key role in understanding ambiguous words. If you want to throw Google Translate for a loop, give it the following:
John was already in his swimsuit as we reached the watering hole. "I hope the tire swing is still there," John said as he headed to the bank.
Although the most plausible interpretation of bank here is the side of a river, Google Translate will translate it into the word for "financial institution" in whatever language you are translating into, because that's the most common meaning of the English word bank.

So what's the problem?

I assume that this limitation is not lost on the people at Google or at Apple. And, in fact, there are computer systems that try to incorporate meaning. The problem there is not so much the computer science as the linguistic science.** Dictionaries notwithstanding, scientists really do not know very much about what words mean, and it is hard to program the computer to know what the word means when you actually do not know.

(Dictionaries are useful, but as an exercise, pick* a definition from a dictionary and come up with a counterexample. It is not hard.)

One of the limitations is scope. Language is huge. There are a lot of words. So scientists will work on the meanings of a small number of words. This is helpful, but a computer that only knows a few words is pretty limited. We want to know the meanings of all words.

Solving the problem

We've launched a new section of the website, VerbCorner. There, you can answer questions about what verbs mean. Rather than try to work out the meaning of a word all at once, we have broken up the problem into a series of different questions, each of which tries to pinpoint a specific component of meaning. Of course, there are many nuances to meaning, but research has shown that certain aspects are more important than others, and we will be focusing on those.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

*Dragon Dictate originally transcribed this as "pickled", which I did not catch on proofreading. More evidence that we need computer programs that understand what words mean.
**Dragon Dictate made spaghetti out of this sentence, too.

Citizen Science at The VerbCorner Project

What do verbs mean? We'd like to know. For that reason, we just launched VerbCorner, a massive, crowd-sourced investigation into the meanings of verbs. 

Why do we need this project? Why not just look up what verbs mean in a dictionary? While dictionaries are enormously useful (I think I own something like 15), they are far from perfect. For one thing, it's usually very easy to find counter-examples even for what seem like straight-forward definitions. Take the following:
Bachelor: An unmarried man.
So is the Pope a bachelor? Is Neil Patrick Harris? How about a married man from a country in which men are allowed multiple wives?

At VerbCorner, rather than trying to work out the whole definition at once, we have broken meaning into many different components. At the site, you will find several different tasks. In each task, you will try to determine whether a particular verb has a particular component of meaning. 

If you are interested in what words mean and would like to help with this project, sign up for an account at Participation can be anonymous, but we are happy to recognize significant contributions from anyone who wishes it.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

A Critical Period for Learning Language?

If you bring adults and children into the lab and try teaching them a new language, adults will learn much more of the language much more rapidly than the children. This is odd, because probably one of the most famous facts about learning languages -- something known by just about everyone whether you are a scientist who studies language or not -- is that adults have a lot less success at learning language than children. So whatever it is that children do better, it's something that operates on a timescale too slow to see in the lab. 

This makes studying the differences between adult and child language learners tricky, and a lot less is known than we'd like. Even the shape of the change in language learning ability is not well known: is the drop-off in language learning ability gradual, or is there a sudden plummet at a particular age? Many researchers favor the latter possibility, but it has been hard to demonstrate simply because of the problem of collecting data. Perhaps the most comprehensive study comes from Kenji Hakuta, Ellen Bialystok, and Edward Wiley, who used U.S. Census data from 2,016,317 Spanish-speaking immigrants and 324,444 Chinese-speaking* immigrants to study English proficiency as a function of when the person began learning the language.

Their graph shows a very gradual decline in English proficiency as a function of when the person moved to the U.S.

Unfortunately, the measure of English proficiency wasn't very sophisticated. The Census simply asks people to say how well they speak English: "not at all", "not well", "well", "very well", and "speak only English". This is better than nothing, and the authors show that it correlates with a more sophisticated test of English proficiency, but it's possible that the reason the lines in the graphs look so smooth is that this five-point scale is simply too coarse to show anything more. The measure also collapses over vocabulary, grammar, accent, etc., and we know that these behave differently (your ability to learn a native-like accent goes first).

A New Test

This was something we had in mind when devising The Vocab Quiz. If we get enough non-native speakers of English, we could track English proficiency as a function of age ... at least as measured by vocabulary (we also have a grammar test in the works, but that's more difficult to put together and so may take us a while yet). I don't think we'll get two million participants, but even just a few thousand would be enough. If English is your second (or third or fourth, etc.) language, please participate. In addition to helping us with our research and helping advance the science of language in general, you will also be able to see how your vocabulary compares with the typical native English speaker who participates in the experiment.

Hakuta, K., Bialystok, E., & Wiley, E. (2003). Critical evidence: A test of the critical-period hypothesis for second-language acquisition. Psychological Science, 14(1), 31-38. DOI: 10.1111/1467-9280.01415

*Yes, I know: Chinese is a family of languages, not a single language. But the paper does not report a by-language breakdown for this group.

Living in an Imperfect World: Psycholinguistics Edition

You, sir, have tasted two whole worms. You have hissed all my mystery lectures and been caught fighting a liar in the quad. You will leave Oxford by the next town drain. -- Reverend Spooner.

There is an old tension in psycholinguistic (or linguistic) theory, which boils down to two ways of looking at language comprehension. When somebody says something to you, what do you do with that linguistic input? Is your goal to decode the sentence and figure out what the sentence means, or do you try to figure out what message the speaker intended to convey? The tension comes in because presumably we do a bit of both.

Suppose a young child says, "Look! A doggy!" while pointing to a cat. Most people will agree that technically, the child's sentence is about a dog. But most of us can still work out that probably the child meant to talk about the cat; she used the word doggy either due to lack of vocabulary, confusion about the distinction between dogs and cats, or a simple speech error. Similarly, if your friend says at 7pm, "Let's go have lunch," technically your friend is suggesting having the midday meal, but probably you charitably assume he is just very hungry and so made a mistake in saying "lunch" instead of "dinner".

For a variety of reasons, linguistics and psycholinguistics have focused mostly on decoding sentences rather than intended meanings. This is important work about an important problem, but -- as we saw above -- it's only half the story. PNAS just published a paper by Gibson, Bergen, and Piantadosi that addresses the second half. Gibson and Bergen are at M.I.T., and Piantadosi recently graduated from M.I.T., and like much of the work coming out of Eastern Cambridge lately, they take a Bayesian perspective on the problem, and point out that the probability that the speaker intended to convey a particular message m given that they said sentence s is proportional to the prior probability that the speaker might want to convey m times the probability that they would say sentence s when intending to convey m.
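Written out in symbols (notation mine, following the paper's description rather than its exact formulation), the listener's inference is a straightforward application of Bayes' rule:

```latex
% Noisy-channel inference over intended messages m given heard sentence s:
% a prior over messages times the probability of producing s when meaning m.
P(m \mid s) \;\propto\; P(m) \times P(s \mid m)
```

The prior P(m) captures plausibility (people rarely propose lunch at 7pm), while the likelihood P(s | m) captures the production process, including speech errors.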

This ends up accounting for the phenomenon brought up in Paragraph #2: If the literal meaning of the speaker's sentence isn't very likely to be what they intended to say ("Let's go have lunch", spoken at 7pm), but there is some other sentence that contains roughly the same words but has a more plausible meaning ("Let's go have dinner"), then you should infer that the intended message is the latter one and that the speaker made an error.

So far, this is not much more than a restatement of our intuitive theory in Paragraph #2. But as Gibson, Bergen, and Piantadosi point out, a few non-trivial predictions come out of this. One is that you should assume that deletions (dropping a word) are more likely than insertions (adding a word). The reason is that there are only so many words that can be dropped from a particular sentence, so even if the probability of accidentally dropping a word is low, the probability of accidentally dropping a particular word isn't all that much lower. So if the intended sentence was "The ball was kicked by the girl", and the speaker accidentally dropped two words, the probability that the speaker happened to drop "was" and "by", resulting in the grammatical but unlikely sentence "The ball kicked the girl", is not so bad. However, suppose the intended sentence was "The girl kicked the ball": what are the chances the speaker accidentally adds "was" and "by", resulting in the grammatical but unlikely sentence "The girl was kicked by the ball"? Pretty much zilch, since English contains hundreds of thousands of words: there is pretty much no chance that those particular words would be inserted in those particular locations.
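The deletion-versus-insertion asymmetry can be put in rough numbers. The sketch below uses invented probabilities (not the paper's actual parameterization): the key structural fact is that an accidental insertion must also land on the right word out of the whole vocabulary, while a deletion only has to pick among the words already in the sentence.

```python
# Toy noisy-channel sketch (all numbers invented): why an implausible
# sentence is more readily explained by deletions than by insertions.
P_DEL = 0.01          # chance of accidentally dropping any given word
P_INS = 0.01          # chance of accidentally inserting a word at a given slot
VOCAB_SIZE = 100_000  # how many words the inserted word could have been

def p_drop_specific_words(k: int) -> float:
    """Probability of dropping k specific words from the sentence."""
    return P_DEL ** k

def p_insert_specific_words(k: int) -> float:
    """Probability of inserting k specific words: each insertion must
    also land on the right word out of the whole vocabulary."""
    return (P_INS / VOCAB_SIZE) ** k

# "The ball was kicked by the girl" heard as "The ball kicked the girl":
drop = p_drop_specific_words(2)      # dropped "was" and "by"
# "The girl kicked the ball" heard as "The girl was kicked by the ball":
insert = p_insert_specific_words(2)  # inserted "was" and "by"
print(drop / insert)  # deletions come out many orders of magnitude likelier
```

With these toy numbers the two-deletion explanation is about ten billion times more probable than the two-insertion one, which is why a listener should "correct" the first sentence but take the second at face value.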

The authors present some data to back up these and some other predictions. For instance, if listeners are given reason to suspect that the speaker makes lots of speech errors, they are then even more likely to "correct" an unlikely sentence to a similar sentence with a more likely meaning.

There's plenty more work to be done. There are plenty of speech errors out there besides insertions and deletions, such as substitutions and the various phonological errors that made Rev. Spooner famous (see quote above). Work on phonological errors shows that speakers are more likely to make errors that result in real words (train->drain) than non-words (train->frain). Likely, the same is true of other types of errors. Building a full theory that incorporates all the complexity of speech processes is a ways off yet. But the work just published is an important proof of concept.

Gibson, E., Bergen, L., & Piantadosi, S. (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1216438110

Do You Speak Korean?

Learning new languages is hard for many reasons. One of those reasons is that the meaning of an individual word can have a lot of nuances, and the degree to which those nuances match up with the nuances of similar words in your first language can make learning the new language easier; the degree to which the nuances diverge can make learning the new language harder.

In a new experiment, we are looking at English-speakers learning Korean and Korean-speakers learning English. In particular, we are studying a specific set of words that previous research has suggested give foreign language learners a great deal of difficulty.

We are hoping that we will be able to track how knowledge of these words develops as you move from being a novice to a fluent speaker. For this, we will need to find lots of people who are learning Korean, as well as Korean-speakers who are learning English. If you are one, please participate.

The experiment is called "Trials of the Heart". You can find it here.

We do also need monolingual English speakers (people whose first and essentially only language is English) for comparison, so if that's you, you are welcome to participate, too!


Evolutionary Psychology, Proximate Causation, & Ultimate Causation

Evolutionary psychology has always been somewhat controversial in the media for reasons that generally confuse me (Wikipedia has a nice rundown of the usual complaints). For instance, the good folks at Slate are particularly hostile (here, here and here), which is odd because they are also generally hostile towards Creationism (here, here and here). 

Given the overwhelming evidence that nearly every aspect of the human mind and behavior is at least partly heritable (and so at least partially determined by our genes), the only way to deny the claim that our minds are at least partially a product of evolution is to deny that evolution affects our genes – that is, deny the basic tenets of evolutionary theory. (I suppose you could try to deny the evidence of genetic influence on mind and behavior, but that would require turning a blind eye to such a wealth of data as to make Global Warming Denialism seem like a warm-up activity.)

What's the matter with Evolutionary Psychology?

What is there to object to, anyway? Some of the problem seems definitional. Super-Science-Blogger Greg Laden acknowledges that applying evolutionary theory to the study of the human mind is a good idea, but that "evolutionary psychology" refers only to a very specific theory from Cosmides and Tooby, one with which he takes issue. And in general, a lot of the "critiques" I see in the media seem to involve equating the entire field with some specific hypothesis or set of hypotheses, particularly the more exotic ones. 

For instance, some years back Slate ran an article about "Evolutionary Psychology's Anti-Semite", a discussion of Kevin MacDonald, who has an idiosyncratic notion of Judaism as a "group evolution strategy" to maximize, through eugenics, intelligence (the article goes into some detail). It's a pretty nutty idea, gets basic historical facts wrong, and more importantly gets the science wrong. The article tries pretty hard to paint him as a mainstream Evolutionary Psychologist nonetheless. Interviewees aren't that helpful (they mostly dismiss the work as contradicting basic fundamentals of evolutionary theory), but the article author pulls up other evidence, like the fact that MacDonald acknowledged some mainstream researchers in one of his books. (For the record, I acknowledge Benicio del Toro as an inspiration, so you know he fully agrees with everything in this blog post. Oh, and Jenna-Louise Coleman, too.)

This spring, New York Times columnist John Tierney asserted that men must be innately more competitive than women since they monopolize the trophies in -- hold onto your vowels -- world Scrabble competitions. To bolster his case, Tierney turned to evolutionary psychology. In the distant past, he argued, a no-holds-barred desire to win would have been an adaptive advantage for many men, allowing them to get more girls, have more kids, and pass on their competitive genes to today's word-memorizing, vowel-hoarding Scrabble champs.
I will agree that this argument involves a bit of a stretch and is awfully hard to falsify (as the article goes on to point out). And sure, some claims made even by serious evolutionary psychologists are hard to falsify with current technology ... but then so is String Theory. And we do have many methods for testing evolutionary theory in general, and roughly the same ones work whether you are studying the mind and behavior or purely physical attributes of organisms. So, again, if you want to deny that claims about evolutionary psychology are testable, then you end up having to make roughly the same claim about evolutionary theory in general. 

Just common sense

It turns out that when you look at the biology, a good waist-hips ratio for a healthy woman is (roughly) .7, whereas the ideal for men is closer to .9. Now imagine we have a species of early hominids (Group A) that is genetically predisposed such that heterosexual men prefer women with a waist-hips ratio of .7 and heterosexual women prefer men with a waist-hips ratio of .9. Now let's say we have another species of early hominids (Group B) where the preferences are reversed, preferring men with ratios of .7 and women with ratios of .9. Since individuals of Group A prefer to mate with healthier partners than Group B does, which one do you think is going to have more surviving children?

Now compare to Group C, where there is no innate component to interest in waist-hips ratios; beauty has to be learned. Group C is still at a disadvantage to Group A, since some of the people in it will learn to prefer the wrong proportions and preferentially mate with less healthy individuals. In short, all else equal, you would expect evolution to lead to hominids that prefer to mate with hominids that have close-to-ideal proportions.

(If you don't like waist-hips ratios, consider that humans prefer individuals without deformities and gaping sores and boils, and then play the same game.)
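The Group A/B/C argument can be turned into a toy simulation. All the numbers below are invented purely for illustration (survival rates, preference accuracies, population size); the only point is the ordering of outcomes:

```python
import random

random.seed(0)  # reproducible toy run

# Invented survival rates for children of healthy vs. less healthy mates.
HEALTHY_SURVIVAL = 0.6
UNHEALTHY_SURVIVAL = 0.4

def surviving_children(p_pick_healthy: float, couples: int = 10_000) -> int:
    """One child per couple; each couple picks a healthy mate with
    probability p_pick_healthy, and the child survives accordingly."""
    survivors = 0
    for _ in range(couples):
        healthy_mate = random.random() < p_pick_healthy
        rate = HEALTHY_SURVIVAL if healthy_mate else UNHEALTHY_SURVIVAL
        if random.random() < rate:
            survivors += 1
    return survivors

group_a = surviving_children(0.9)  # innate preference for healthy mates
group_c = surviving_children(0.5)  # learned preference, often miscalibrated
group_b = surviving_children(0.1)  # reversed preference
print(group_a, group_c, group_b)   # A beats C beats B with these numbers
```

Run forward over generations, this kind of gap is exactly what selection acts on.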

Here is another example. Suppose that in Group A, individuals find babies cute, which leads them to want to protect and nourish the infants. In Group B, individuals find babies repulsive, and many actually have an irrational fear of babies (that is, treating babies something like how we treat spiders, snakes & slugs). Which one do you think has more children that survive to adulthood? Once again, it's better to have a love of cuteness hardwired in rather than something you have to learn from society, since all it takes is for a society to get a few crazy ideas about what cute looks like ("they look better decapitated!") and then the whole civilization is wiped out. 

(If you think that babies just *are* objectively cute and that there's no psychology involved, consider this: Which do you find cuter, a human baby or a skunk baby? Which do you think a mother skunk finds cuter?)

These are the kinds of issues that mainstream evolutionary psychology trucks in. And the theory does produce new predictions. For instance, you'd expect that in species where a .7 waist-hips ratio is not ideal for females (that is, pretty much any species other than our own), it wouldn't be favored (and it isn't). And the field is generally fairly sensible, which is not to say that all the predictions are right or that evolutionary theory doesn't grow and improve over time (I understand from a recent conversation that there is now some argument about whether an instinct for third-party punishment is required for sustainable altruism, which is something I had thought was a settled matter). 

Findings: The Role of World Knowledge in Pronoun Interpretation

A few months ago, I posted the results of That Kind of Person. This was the final experiment in a paper on pronoun interpretation, a paper which is now in press. You can find a PDF of the accepted version here.

How it Began

Isaac Asimov famously observed that "the most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" That quote describes this project fairly well. The project grew out of a norming study. Norming studies aren't really even real experiments -- they are mini experiments used to choose stimuli.

I was designing an ERP ("brain wave") study of pronoun processing. A group in Europe had published a paper using ERPs to look at a well-known phenomenon in pronoun interpretation, one which has been discussed a lot on this blog, in which pronoun interpretation clearly depends on context:

(1) Sally frightens Mary because she...
(2) Sally likes Mary because she...

Most people think that "she" refers to Sally in (1) but Mary in (2). This seems to be a function of the verbs in (1-2), since that's all that's different between the sentences, and in fact other verbs also affect pronoun interpretation. We wanted to follow up some of the previous ERP work, and we were just choosing sentences. You get nice big ERP effects (that is, big changes in the brain waves) when something is surprising, so people often compare sentences with unexpected words to those with expected words, which is what this previous group had done:

(3) Sally frightens Bill because she...
(4) Bill frightens Sally because she...

You should get the sense that the pronoun "she" is a bit more surprising in (4) than in (3). Comparing these sentences to (1-2) should make it clear why this is.

The Twist

A number of authors argued that what is going on is that these sentences (1-4) introduce an explanation ("because..."). As you are reading or listening to the sentence, you think through typical causes of the event in question (frightening, liking, etc.) and so come up with a guess as to who is going to be mentioned in the explanation. More good explanations of an instance of frightening involve the frightener than the frightenee, and more good explanations of an instance of liking involve the like-ee than the liker.

The authors supported the argument by pointing to studies showing that what you know about the participants in the event matters. In general, you might think that in any given event involving a king and a butler, kings are more likely to be responsible for the event simply because kings have more power. So in the following sentence, you might interpret the pronoun as referring to the king even though it goes against the "typical" pattern for frighten (which prefers explanations involving the frightener).

(5) The butler frightened the king because...

What got people particularly excited about this is that it all has to happen very fast. Studies have shown that you can interpret the pronoun in such sentences in a fraction of a second. If you can do this based on a complex inference about who is likely to do what, that's very impressive and puts strong constraints on our theory of language.

The Problem

I was in the process of designing an ERP experiment to follow up a previous one in Dutch that I wanted to replicate in English. I had created a number of sentences, and we were running a simple experiment in which people rate how "natural" the sentences sound. We were doing this just to make sure none of our sentences were weird, since weirdness -- as already mentioned -- can have big effects on the brain waves, which could swamp any effects of the pronoun. Again, we expected people to rate (4) as less natural than (3); what we wanted to make sure was that people didn't rate both (3) and (4) as pretty odd. We tested a couple hundred such sentences, from which we would pick the best for the study.

I was worried, though, because a number of previous studies had suggested that gender itself might matter. This follows from the claim that who the event participants are matters (e.g., kings vs. butlers). Specifically, a few studies had reported that in a story about a man and a woman, people expect the man to be talked about more than the woman, analogous to expecting references to the king rather than the butler in (5). Was this a confound?

I ran the study anyway, because we would be able to see in the data just how bad the problem was. To my surprise, there was no effect of gender at all. I started looking at the literature more carefully and noticed that several people had similarly failed to find such effects. One paper had found an effect, but it seemed to be present in only a small handful of sentences out of the large number they had tested. I looked into studies that had investigated sentences like (5) and discovered ... that they didn't exist! Rather, the studies researchers had been citing weren't about pronoun interpretation at all but something else. To be fair, some researchers had suggested that there might be a relationship between this other phenomenon and pronoun interpretation, but it had never been shown. I followed up with some experiments seeing whether the king/butler manipulation would affect pronoun interpretation, and it didn't. (For good measure, I also showed that there is little if any relationship between that other phenomenon and pronouns.)

A Different Problem

So it looked like the data upon which much recent work on pronouns is built was either un-replicable or apocryphal. However, the associated theory had become so entrenched that this was a difficult dataset to publish. I ultimately had to run around a dozen separate experiments to convince reviewers that these effects really don't exist (or mostly don't exist -- there does seem to be a tiny percentage of sentences, around 5%, where you can get reliable if very small effects of gender). (A typical paper has 1-4 experiments, so a dozen is a lot. Just to keep the paper from growing to an unmanageable length, I combined various experiments and reported each one as a separate condition of a larger experiment.)

Most of these experiments were run on Amazon Mechanical Turk, but the final one was run through this blog, where it was announced (read the results of that specific experiment here). The paper is now in press at Language & Cognitive Processes. You can read the final submitted version here.


So what does all this mean? In many ways, it's a correction to the literature. A lot of theoretical work was built around findings that turned out to be wrong or nonexistent -- in particular, the idea that pronoun interpretation involves a lot of very rapid inferences based on your general knowledge about the world. That's not quite the same thing as having a new theory, but we've been exploring some possibilities that no doubt will be discussed here in the future.

Joshua K. Hartshorne (2014). What is implicit causality? Language and Cognitive Processes