Field of Science

Does Global Warming Exist, and Other Questions We Want Answered

This week, I asked 101 people on Amazon Mechanical Turk both whether global temperatures have been increasing due to human activity AND what percentage of other people on Amazon Mechanical Turk would say yes to that first question. 78% said yes to the first question. Here are the answers to the second, broken down by whether the respondent did or did not believe in man-made global warming:

Question: How many other people on Amazon Mechanical Turk believe global temperatures have been increasing due to human activity?

                  Average      1st-3rd Quartile
Believers           72%            60%-84%
Denialists          58%            50%-74%
Correct answer      78%              ---

Notice that those who believe global warming is caused by human activity are much better at estimating how many other people will agree than are those who do not. Interestingly, the denialists' estimate is much closer to the average among all Americans than to the average among Turkers (who are mostly but not exclusively American, and are certainly a non-random sample).

So what?

Why should we care? More importantly, why did I do this experiment? A major problem in science/life/everything is that people disagree about the answers to questions, and we have to decide who to believe. A common-sense strategy is to go with whatever the majority of experts says. There are two problems, though: first, it's not always easy to identify an expert, and second, the majority of experts can be wrong.

For instance, you might ask a group of Americans what the capital of Illinois or New York is. Although in theory, Americans should be experts in such matters (it's usually part of the high school curriculum), in fact the majority answer in both cases is likely to be incorrect (Chicago and New York City, rather than Springfield and Albany). This was even true in a recent study of, for instance, MIT or Princeton undergraduates, who in theory are smart and well-educated.

Which of these guys should you believe?

So how should we decide which experts to listen to, if we can't just go with "majority rules"? A long chain of research suggests an option: ask each of the experts to predict what the other experts would say. It turns out that the people who are best at estimating other people's answers are also the most likely to be correct. (I'd love to cite papers here, but this introduction comes from a talk I attended earlier in the week, and I don't have the citations in my notes.) In essence, this is an old trick: ask people two questions, one of which you know the answer to and one of which you don't. Then trust the answers on the second question that come from the people who got the first question right.

This method has been tested on a number of questions and works well. It was actually tested on the state-capital problem described above, and it does much better than a simple "majority rules" approach. The speaker at the talk argued that this is because people who are better able to estimate the average answer simply know more and are thus more reliable. Another way of looking at it (which the speaker also mentioned) is that someone who thinks Chicago is the capital of Illinois likely isn't considering any other possibilities, so when asked what other people will say, guesses "Chicago." The person who knows that Springfield is in fact the capital probably also knows that many people will be misled by Chicago being the best-known city in Illinois, and thus correctly guesses that lots of people will say Chicago but that some will also say Springfield.
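The two-question trick lends itself to a short sketch. This is a toy illustration, not the researchers' actual method; the screening question, answers, and data below are all invented:

```python
from collections import Counter

def answer_from_vetted_crowd(responses):
    """Toy version of the two-question trick: keep only respondents who
    got the gradable question right, then take a majority vote among
    them on the ungradable question."""
    vetted = [unknown for known, unknown in responses
              if known == "Springfield"]  # graded against the known true answer
    if not vetted:  # nobody passed the screening question
        return None
    return Counter(vetted).most_common(1)[0][0]

# Hypothetical data: the overall majority answer to question 2 is "A",
# but the respondents who knew the capital agree on "B".
responses = [
    ("Chicago", "A"), ("Chicago", "A"), ("Chicago", "A"),
    ("Springfield", "B"), ("Springfield", "B"),
]
print(answer_from_vetted_crowd(responses))  # "B", not the naive majority "A"
```

Filtering by the graded question flips the answer away from the naive majority, which is the whole point of the method.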

Harder Questions

I wondered, then, how well it would work for a question where everybody knows there are two possible answers. So I surveyed Turkers about global warming. Believers were much better at estimating how many believers there are on Turk than were denialists.

Obviously, there are a few ways of interpreting this. Perhaps denialists underestimate both the proportion of climate scientists who believe in global warming (~100%) and the percentage of ordinary people who do, and thus they think the evidence is weaker than it is. Alternatively, perhaps denialists, not believing in global warming themselves, have trouble accepting that other people do, and thus lower their estimates. The latter proposal, though, would suggest that believers should over-estimate the percentage of people who believe in global warming, and that is not in fact the case.

Will this method work in general? In some cases, it won't. If you had asked expert physicists in 1530 about quantum mechanics, presumably none of them would have believed it, and all would have correctly predicted that none of the others would believe it either. In other cases, it's irrelevant (nearly 100% of climatologists believe in man-made global warming, and I expect they all know that they all believe in it). More importantly, the method may work well for some types of questions and not others. I heard in this talk that researchers have started using the method to predict product sales and outcomes of sports matches, and it actually does quite well. I haven't seen any of the data yet, though.

For more posts on science and politics, click here and here.

Did your genes make you liberal?

"The new issue of the Journal of Politics, published by Cambridge University, carries the study that says political ideology may be caused by genetic predisposition."

"Scientists find 'liberal gene.'"
 -- NBC San Diego

"Liberals may owe their political outlook partly to their genetic make-up, according to new research from the University of California, San Diego, and Harvard University.  Ideology is affected not just by social factors, but also by a dopamine receptor gene called DRD4."
 -- University press release

As in the case yesterday of the study about sisters making you happy, these statements are all technically true (ish -- read below) but deeply misleading. The study in question looks at the effects of number of friends and the DRD4 gene on political ideology. Specifically, they asked people to self-rate on a 5-point scale from very conservative to very liberal. They tested for the DRD4 gene. They also asked people to list up to 5 close friends.

The number of friends one listed did not significantly predict political ideology, nor did the presence or absence of the DRD4 gene. However, there was a significant (p=.02) interaction ... significant, but apparently tiny. The authors do not discuss effect size, but we can try to piece together the information by looking at the regression coefficients.

An estimated coefficient means that if you increase the value of the predictor by 1, the outcome variable increases by the size of the coefficient. So imagine the coefficient for the presence of the gene were 2. That would mean that, on average, people with the gene score 2 points higher (more liberal) on the 5-point political orientation scale.

The authors seem to be reporting standardized coefficients, which means that we're looking at increasing values by one standard deviation rather than by one point. The coefficient of the significant interaction was 0.04. This means, roughly, that as the interaction of number of friends and presence of the gene increases by one standard deviation, political orientation increases by 0.04 standard deviations. The information we'd need to interpret that precisely isn't given in the paper, but a reasonable estimate is that someone with one extra friend and the gene would score anywhere from .01 to .2 points higher on the scale (remember, 1=very conservative, 2=conservative, 3=moderate, 4=liberal, 5=very liberal).
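As a sanity check on that interpretation, here is a back-of-envelope conversion of a standardized coefficient to raw scale units. The 0.04 coefficient is from the paper; both standard deviations below are guesses of mine, since (as noted) the paper doesn't report them:

```python
# The 0.04 standardized interaction coefficient, converted to raw units.
# The standard deviations are assumptions, not values from the paper.

beta_std = 0.04       # standardized coefficient (reported in the paper)
sd_ideology = 1.0     # assumed SD of the 5-point ideology scale
sd_predictor = 1.5    # assumed SD of the friends-x-gene interaction term

# A one-SD increase in the predictor shifts ideology by beta_std SDs:
raw_shift_per_sd = beta_std * sd_ideology
print(raw_shift_per_sd)                 # 0.04 points on the 1-5 scale

# Per raw unit of the predictor, the slope is smaller still:
raw_slope = beta_std * sd_ideology / sd_predictor
print(round(raw_slope, 3))              # ~0.027 points
```

Whatever the true SDs are, an effect this size amounts to a few hundredths of a point on a 5-point scale.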

The authors give a little more information:
For people who have two copies of the [gene], an increase in number of friendships from 0 to 10 friends is associated with increasing ideology in the liberal direction by about 40% of a category on our five-category scale.
People with no copies of the gene were unaffected by the number of friends they had.

None of what I wrote above detracts from the theoretical importance of the paper. Identifying genes that influence behavior, even just a tiny bit, is important as it opens windows into the underlying mechanisms. And to their credit, the authors are very guarded and cautious in their discussion of the results. The media reports -- fed, no doubt, by the university press release -- have focused on the role of the gene in predicting behavior. It should be clear that the gene is next to useless in predicting, for instance, who somebody is going to vote for. Does that make it a gene for liberalism? Maybe.

I would point out one other worry about the study, which the authors themselves acknowledge. They tested a number of different possible predictors. The chance of getting a false positive increases with every statistical test you run, and they do not appear to have corrected for multiple comparisons. Even with 2,000 participants (which is a large sample), the p-value for the significant interaction was only p=.02, which is significant but not very strong, so the risk that this will not replicate is real. As the authors say, "the way forward is to seek replication in different populations and age groups."
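For concreteness, here is what a standard Bonferroni correction would look like. The number of tests (k = 6) is an assumption for illustration; I haven't tallied the exact number the authors ran:

```python
# Bonferroni: with k tests, require p < alpha / k from each one.
# k = 6 is an assumed number of tests, for illustration only.

alpha = 0.05
k = 6
threshold = alpha / k
print(round(threshold, 4))   # 0.0083
print(0.02 < threshold)      # False: the reported p = .02 would not survive
```

On that (assumed) count, the study's key p-value would miss the corrected threshold by a factor of more than two.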

Question: What are sisters good for?

Answer: increasing your score on a 13-question test of happiness by 1 unit on one of the 13 questions.

A recent study of the effect of sisters on happiness has been getting a lot of press since it was featured at the New York Times. It's just started hitting my corner of the blogosphere, since Mark Liberman filed an entry at Language Log early in the evening. On the whole, he was unimpressed. The paper didn't report data in sufficient detail to really get a sense of what was going on, so he tried to extrapolate based on what was in fact reported. His best estimate was that having a sister accounted for 0.4% of the variance in people's happiness.
This is a long way from the statement that "Adolescents with sisters feel less lonely, unloved, guilty, self-conscious and fearful", which is how ABC News characterized the study's findings, or "Statistical analyses showed that having a sister protected adolescents from feeling lonely, unloved, guilty, self-conscious and fearful", which is what the BYU press release said ... Such statements are true if you take "A's are X-er than B's" to mean simply that a statistical analysis showed that the mean value of a sample of A's was higher than the mean value of a sample of B's, by an amount that was unlikely to be the result of sampling error.
Only an hour later, the ever wide-eyed Jonah Lehrer wrote
There's a surprisingly robust literature on the emotional benefits of having sisters. It turns out that having at least one female sibling makes us happier and less prone to depression...
I think this demonstrates nicely the added value of blogging, particularly science blogging. Journalists (like Lehrer) are rarely in a position to pick apart the methods of a study, whereas scientist bloggers can. I know many people miss the old media world, but the new one is exciting.

For more thoughts on science blogging, check this and this.

A Frog at the Bottom of a Well

My college had a graduate admissions counselor, with whom I consulted about applying to graduate school. Unfortunately, different fields (math, chemistry, literature, psychology) use completely different methods of selecting graduate students (and, in some sense graduate school itself is a very different beast depending on the field). My counselor didn't know anything about psychology, so much of the information I was given was dead wrong.

My graduate school also provides a lot of support for applying for jobs. This week, there is a panel on "The View from the Search Committee," which includes as panelists professors from Sociology, Romance Languages & Literatures, and Organismic and Evolutionary Biology. That is, none of them are from Psychology. I do know that different fields recruit junior faculty in very different ways (for instance, linguistics practices a form of speed-dating at conferences as a first round of interviews, while psychology has no such system). Should I go? Keep in mind that I get lots of advice from faculty in my own department (and also from friends at other psych departments who have recently gone through the process). That is, how likely is it that the experience of these three professors will map onto the process I will actually go through? How likely is it that a one-hour panel can cover all the different variants of the process? How likely is it that there is information that would be relevant to anyone applying to any department that isn't obvious or something I already know?


The title of this post comes from an old proverb about a frog sitting at the bottom of a well, thinking that the patch of blue above is the whole world. Often (always?) we don't realize just how limited our own range of experience is.
photo: e_monk

Words and non-words

"...the modern non-word 'blogger'..." -- Dr. Royce Murray, editor of the journal Analytical Chemistry.

"209,000,000 results (0.21 seconds)" -- Google search for the "non-word" blogger.

There has been a lot of discussion about Royce Murray's bizarre attack on blogging in his latest Analytical Chemistry editorial (the key sentence: "I believe that the current phenomenon of 'bloggers' should be of serious concern to scientists").

Dr. Isis has posted a nice take-down of the piece focusing on the age-old testy relationship between scientists and journalists. My bigger concern with the editorial is that Murray clearly has no idea what a blog is, yet feels justified in writing an article about blogging. Here's a telling sentence:
Bloggers are entrepreneurs who sell “news” (more properly, opinion) to mass media: internet, radio, TV, and to some extent print news. In former days, these individuals would be referred to as “freelance writers”, which they still are; the creation of the modern non-word “blogger” does not change the purveyor.
Wrong! Wrong! Wrong! A freelance writer does sell articles to established media entities. Bloggers mostly write for their own blog (hence the "non-word" blog-ger). There are of course those who are hired to blog for major media outlets like Scientific American or Wired, but then they are essentially columnists (in fact, many of the columnists at The New York Times have NYTimes blogs at the request of the newspaper).
This magnifies, for the lay reader, the dual problems in assessing credibility: a) not having a single stable employer (like a newspaper, which can insist on credentials and/or education background) frees the blogger from the requirement of consistent information reliability ... Who are the fact-checkers now?
Wait, newspapers don't insist on credentials and don't fact-check the stories they get from freelancers? Why is Murray complaining about bloggers, then? In any case, it's not as if journals like Analytical Chemistry do a good job of fact-checking what they publish, or stop publishing papers by people whose results never replicate. Journal editors living in glass houses...

This focus on credentials is a bit odd -- I thought truth was the only credential a scientist needed -- and in any case seriously misplaced. I challenge Murray to find a popular science blog written by someone who is neither a fully-credentialed scientist writing about his/her area of expertise, nor a well-established science journalist working for a major media outlet.

Are there crack-pot bloggers out there? Sure. But most don't have much of an audience (certainly, their audience is smaller than the fact-checked, establishment media-approved Glenn Beck). Instead, we have a network of scientists and science enthusiasts discussing, analyzing and presenting science. What's to hate about that?

You're Wrong

John Ioannidis has been getting a lot of press lately. He reached the cover of the last issue of The Atlantic Monthly. David Dobbs wrote about him here (and a few years ago, here). This is the doctor known for his claim that around half of medical studies are false -- that is, about 80% of non-randomized trials and even 25% of randomized trials. These are not just dinky findings published in dinky journals: of the 49 most highly-regarded papers published over a 13-year period, 45 claimed to have found effective treatments; 34 of those had been retested, and 41% of the retests failed to replicate the original result.


Quoting the Atlantic Monthly:
Ioannidis initially thought the community might come out fighting. Instead, it seemed relieved, as if it had been guiltily waiting for someone to blow the whistle...
Well, it's not surprising. The appropriate analog in psychology is the randomized trial, of which in medicine 25% turn out to be false according to this research (which hopefully isn't itself false). As Ioannidis has detailed, the system is set up to reward false positives. Journals -- particularly glamour mags like Science -- preferentially accept surprising results, and the best way to have a surprising result is to have one that is wrong. Incorrect results happen: "statistically significant" means "has only a 5% probability of happening by random chance." This means (in theory) that 5% of all experiments published in journals should reach the wrong conclusions. If journals are biased in favor of accepting exactly those 5%, then the proportion should be higher.

There are other factors at work. Some scientists are sloppier than others, and many of the ways in which one can be sloppy lead to significant and/or surprising results. By chance alone, 5% of experiments will produce false positives. There are labs that will run the same experiment 6 times with minor tweaks. There is a (1-.95^6) * 100 = 26.5% chance that at least one of those runs will produce a significant result. The lab may then publish only that final experiment and not report the others. If sloppy results lead to high-impact publications, survival of the fittest dictates that sloppy labs will reap the accolades, get the grant money, tenure, etc.

Keep in mind that often many different labs are trying to do the same thing. For instance, in developmental psychology, one of the deep questions is: what is innate? So many labs are testing younger and younger infants, trying to find evidence that these younger infants can do X, Y or Z. If 10 labs all run the same experiment, there's a (1-.95^10) * 100 = 40.1% chance that at least one of them finds a significant result.
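The two calculations above can be spelled out in a few lines:

```python
def p_at_least_one_hit(k, alpha=0.05):
    """Probability of at least one 'significant' result in k independent
    tries when each has a false-positive rate of alpha."""
    return 1 - (1 - alpha) ** k

print(round(100 * p_at_least_one_hit(6), 1))   # 26.5 -- one lab, six tweaked runs
print(round(100 * p_at_least_one_hit(10), 1))  # 40.1 -- ten labs, same experiment
```

The probability climbs quickly: with 20 independent tries it is already about 64%.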

Countervailing Forces

Thus, there are many incentives to publish incorrect data. Meanwhile, there are very few disincentives to doing so. If you publish something that turns out not to replicate, it is very unlikely that anyone will publish the failure to replicate -- simply because it is very difficult to publish a failure to replicate. If someone does manage to publish such a paper, it will almost certainly be in a lower-profile journal (which is, incidentally, a disincentive to publishing such work to begin with).

Similarly, consider what happens when you run a study and get a surprising result. You could replicate it yourself to make sure you trust the result. That takes time, and there's a decent chance it won't replicate. If you do replicate it, you can't publish the replication (I tried to in a recent paper submission, and a reviewer insisted that I remove reference to the replication on account of it being "unnecessary"). If the replication works, you'll gain nothing. If it fails, you won't get to publish the paper. Either way, you'll have spent valuable time you could have spent working on a different study leading to a different paper. 

In short, there are good reasons to expect that 25% of studies -- particularly in the high-profile journals -- are un-replicable.

What to do?

Typically, solutions proposed involve changing attitudes. The Atlantic Monthly suggests:
We could solve much of the wrongness problem, Ioannidis says, if the world simply stopped expecting scientists to be right. That's because being wrong in science is fine, and even necessary ... But as long as careers remain contingent on producing a stream of research that's dressed up to seem more right than it is, scientists will keep delivering exactly that.
I've heard this idea expressed elsewhere. In the aftermath of Hausergate, a number of people suggested that a factor was the pressure-cooker that is the Harvard tenure process, and that Harvard needs to stop putting so much pressure on people to publish exciting results.

So the idea is that we should stop rewarding scientists for having interesting results, and instead reward the ones who have uninteresting results? Journals should publish only the most staid research, and universities should reward tenure not based on the number of highly-cited papers you have written, but based on how many papers you've written which have never been cited? I like that idea. I can run a boring study in a few hours and write it up in the afternoon: "Language Abilities in Cambridge Toddlers are Unaffected by Presence or Absence of Snow in Patagonia." That's boring and almost certainly true. And no one will ever cite it.

Seriously, though, public awareness campaigns telling people to be more responsible are great, and sometimes they even help, but I don't know how much can be done without changing the incentive structure itself.


I don't have a solution, but I think Ioannidis again points us towards one. He found that papers continue to be cited long after they have been convincingly and publicly refuted. I was discussing this issue with a colleague some time back and mentioned a well-known memory paper that nobody can replicate. Multiple failures-to-replicate have been published. Yet I still see it cited all the time. The colleague said, "Wow! I wish you had told me earlier. We just had a student spend two years trying to follow up on that paper, and the student just couldn't get the method to work."

Never mind that researchers rarely bother to replicate published work -- even if they did, we have no mechanism for tracking which papers have been successfully replicated and which papers can't be replicated.

Tenure is awarded partly on how often your work has been cited, and we have many nice, accessible databases that will tell you how often a paper has been cited. Journals are ranked by how often their papers are cited. What if we rewarded researchers and journals based on how well their papers hold up to replication? Maybe it would help, maybe it wouldn't, but without a mechanism for tracking this information, this is at best an intellectual exercise.
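To make the proposal concrete, here is a toy sketch of such a tracking mechanism. Everything here (the identifiers, the data, the scoring rule) is hypothetical:

```python
from collections import defaultdict

replications = defaultdict(list)  # paper id -> list of attempt outcomes

def record(paper_id, replicated):
    """Log one replication attempt (True = replicated, False = failed)."""
    replications[paper_id].append(replicated)

def replication_score(paper_id):
    """Fraction of known attempts that succeeded, or None if untested."""
    outcomes = replications[paper_id]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)

# Hypothetical entries for a paper with mostly failed replications:
record("doi:10.0000/memory-effect", False)
record("doi:10.0000/memory-effect", False)
record("doi:10.0000/memory-effect", True)
print(replication_score("doi:10.0000/memory-effect"))
```

A real database would need to weigh attempt quality and publication bias, but even this bare tally would distinguish "replicated", "refuted", and "never tested" -- which is more than citation counts tell us.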

Even if such a database wasn't ultimately useful in decreasing the number of wrong papers, at least we'd know which papers were wrong.

Darn You, Amazon

For a while now, my department has had problems with packages going missing. A suspiciously large number of them were sent by Amazon. A couple weeks ago, our building manager started to get suspicious. He emailed the department:
Today I received what is now the third complaint about problems with shipping of products at Amazon. I don't know which courier they were using, but the packages were left on the [unmanned] security desk in the 1st floor lobby ... In another recent case, the packages were dumped in front of the Center Office door while I was out. Interestingly, tracking showed that they were signed for me at a time that I was attending a meeting ... it's happened a few times. Usually the packages have simply been mis-delivered ... and turn up about a week later.
Figure 1. A prototypical, over-packaged Amazon box.

Some days later, he followed up with more information. Another department denizen noted that Amazon has started using various different couriers. She wrote "The other day I ordered 2 books and one came via FedEx and one came via UPS." The building manager noted that FedEx has started outsourcing delivery to UPS. He continued:
What's odd is that we get shipments via UPS and FedEx all the time. Usually, it's the same drivers ... We know some of them by name.
He concluded that perhaps Amazon (and UPS and FedEx) were starting to use a variety of subcontractors who don't understand how to deliver packages at large buildings (e.g., you can't just leave them in a random corner of the lobby).

Yesterday, we got a follow-up on the story. The building manager had ordered a package from Amazon to see what would happen. He was on his way to lunch when he spotted a van marked "package delivery" and an un-uniformed courier. The courier was leaving the building sans package, so the building manager knew the package had been incorrectly delivered (he obviously hadn't signed for it). He tried to explain the building's package policies to the courier, but
He was very polite, but did not speak much English, so I'm not sure just how much he took away from our little chat.
The building manager -- tired of dealing with lost and mis-delivered packages -- is on a mission to get someone from Amazon to care:
Calling them on the phone was unsatisfactory. Everyone in any position of authority is thoroughly insulated from public accountability.
Perhaps. But that's why blogs exist. Seriously, Amazon, do something about this.

photo: acordova

Universal Grammar is dead. Long live Universal Grammar.

Last year, in a commentary on Evans and Levinson's "The myth of language universals: Language diversity and its importance for cognitive science" in Behavioral and Brain Sciences (a journal that publishes one target paper and dozens of commentaries in each issue), Michael Tomasello wrote:
I am told that a number of supporters of universal grammar will be writing commentaries on this article. Though I have not seen them, here is what is certain. You will not be seeing arguments of the following type: I have systematically looked at a well-chosen sample of the world's languages, and I have discerned the following universals ... And you will not even be seeing specific hypotheses about what we might find in universal grammar if we followed such a procedure.
Hmmm. There are no specific proposals about what might be in UG... Clearly Tomasello doesn't read this blog much. Granted, for that he should probably be forgiven. But he also clearly hasn't read Chomsky lately. Here's the abstract of the well-known Hauser, Chomsky & Fitch (2002):
We submit that a distinction should be made between the faculty of language in the broad sense (FLB) and in the narrow sense (FLN). FLB includes a sensory-motor system, a conceptual-intentional system, and the computational mechanisms for recursion, providing the capacity to generate an infinite range of expressions from a finite set of elements. We hypothesize that FLN only includes recursion and is the only uniquely human component of the faculty of language.
Later on, HCF make it clear that FLN is another way of thinking about what elsewhere is called "universal grammar" -- that is, constraints on learning that allow the learning of language.

Tomasello's claim about the other commentaries (that they won't make specific claims about what is in UG) is also quickly falsified, and by the usual suspects. For instance, Steve Pinker and Ray Jackendoff devote much of their commentary to describing grammatical principles that could be -- but aren't -- instantiated in any language.

Tomasello's thinking is perhaps made clearer by a comment later in his commentary:
For sure, all of the world's languages have things in common, and [Evans and Levinson] document a number of them. But these commonalities come not from any universal grammar, but rather from universal aspects of human cognition, social interaction, and information processing...
Thus, it seems he agrees that there are constraints on language learning that shape which languages exist. This, for instance, is the usual counter-argument to Pinker and Jackendoff's nonexistent languages: those languages don't exist because they're really stupid languages to have. I doubt Pinker or Jackendoff are particularly fazed by those critiques, since they are interested in constraints on language learning, and this proposed Stupidity Constraint is still a constraint. Even Hauser, Chomsky and Fitch (2002) allow for constraints on language that are not specific to language (that's their FLB).

So perhaps Tomasello fundamentally agrees with the people who argue for Universal Grammar, and this is just a terminology war. They call fundamental cognitive constraints on language learning "Universal Grammar," while he uses the term to refer to something else: for instance, proposals about specific grammatical rules that we are born knowing. His claim, then, is that nobody has any proposals about such rules.

If that is what he is claiming, that is also quickly falsified (if it hasn't already been falsified by HCF's claims about recursion). Mark C. Baker, by the third paragraph of his commentary, is already quoting one of his well-known suggested language universals:
(1) The Verb-Object Constraint (VOC): A nominal that expresses the theme/patient of an event combines with the event-denoting verb before a nominal that expresses the agent/cause does.
And I could keep on picking examples. For those outside of the field, it's important to point out that there wasn't anything surprising in the Baker commentary or the Pinker and Jackendoff commentary. They were simply repeating well-known arguments they (and others) have made many times before. And these are not obscure arguments. Writing an article about Universal Grammar that fails to mention Chomsky, Pinker, Jackendoff or Baker would be like writing an article about major American cities without mentioning New York, Boston, San Francisco or Los Angeles.

Don't get me wrong. Tomasello has produced absurd numbers of high-quality studies and I am a big admirer of his work. But if he is going to make blanket statements about an entire literature, he might want to read one or two of the papers in that literature first.

Tomasello, M. (2009). Universal grammar is dead. Behavioral and Brain Sciences, 32(5). DOI: 10.1017/S0140525X09990744

Evans, N., & Levinson, S. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32(5). DOI: 10.1017/S0140525X0999094X

Hauser, M.D., Chomsky, N., & Fitch, W.T. (2002). The faculty of language: What is it, who has it, and how did it evolve? Science, 298(5598), 1569-1579. PMID: 12446899

Baker, M. (2009). Language universals: Abstract but not mythological. Behavioral and Brain Sciences, 32(5). DOI: 10.1017/S0140525X09990604

Pinker, S., & Jackendoff, R. (2009). The reality of a universal language faculty. Behavioral and Brain Sciences, 32(5). DOI: 10.1017/S0140525X09990720

Findings: Linguistic Universals in Pronoun Resolution

Unlike a proper name (Jane Austen), a pronoun (she) can refer to a different person just about every time it is uttered. While we occasionally get bogged down in conversation trying to interpret a pronoun (Wait! Who are you talking about?), for the most part we sail through sentences with pronouns, not even noticing the ambiguity.

I have been running a number of studies on pronoun understanding. One line of work looks at a peculiar contextual effect, originally discovered by Garvey and Caramazza in the mid-70s:

(1) Sally frightens Mary because she...
(2) Sally loves Mary because she...

Although the pronoun is ambiguous, most people guess that she refers to Sally in (1) but Mary in (2). That is, the verb used (frightens, loves) seems to affect pronoun resolution. Over the last 36 years, many thousands of undergraduates (and many more thousands of other participants) have been put through pronoun-interpretation experiments in an attempt to figure out what is going on. While this is a relatively small problem in the Big World of Pronouns -- it applies only to a small number of sentences in which pronouns appear -- it is also a thorn in the side of many broader theories of pronoun processing. And so the interest.

One open question has been whether the same verbs show the same pronoun biases across different languages. In English, frighten is subject-biased and love is object-biased: the presence of frightens in sentences like (1) causes people to resolve the pronoun to the subject, Sally, whereas the presence of loves pushes them towards the object, Mary. If the same verbs carried the same biases in other languages, it would suggest that something about the literal meaning of the verb is what gives rise to the pronoun bias.

(What else could be causing the pronoun bias, you ask? There are lots of other possibilities. For instance, it might be that verbs have some lexical feature tagging them as subject- or object-biased -- not an obvious solution to me, but no more unlikely than other proposals out there for other phenomena. Or people might have learned that certain verbs probabilistically predict that subsequent pronouns will be interpreted as referring to the previous subject or object -- that is, there is no real reason that frighten is subject-biased; it's a statistical fluke of our language, and we all learn to talk/listen that way because everyone else talks/listens that way.)

random cheetah picture
(couldn't find a picture about cross-linguistic studies of pronouns)

Over the last couple years, I ran a series of pronoun interpretation experiments in English, Russian and Mandarin. There is also a Japanese experiment, but the data for that one have been slow coming in. The English and Russian experiments were run through my website, and I ran the Mandarin one in Taiwan last Spring. I also analyzed Spanish data reported by Goikoetxea et al. (2008). Basically, in all the experiments participants were given sentences like (1) and (2) -- but in the relevant language -- and asked to identify who the pronoun referred to.

The results show a great deal of cross-linguistic regularity. Verbs that are subject-biased in one language are almost always subject-biased in the others, and the same is true for object-biased verbs. I am in the process of writing up the results (just finished Draft 3) and I will discuss these data in more detail in the future, answering questions like how I identify the same verb in different languages. For now, though, here is a little data.

Below is a table with four groups of related verbs and the percentage of people who interpreted the pronoun as referring to the subject of the previous verb. Not every verb appeared in all four experiments; where an experiment didn't include the relevant verb, I've put in an ellipsis.

Subject-Biases for Four Groups of Related Verbs in Four Languages

              Group 1           Group 2          Group 3          Group 4
English       convinces 57%     forgives 45%     remembers 24%    understands 60%
Spanish       …                 …                recordar 22%     comprender 63%
Russian       ubezhdala 74%     izvinjala 33%    pomnila 47%      ponimala 60%
Mandarin      shuofu 73%        baorong 37%      …                …

For some of these verbs, the numbers are closer than for others, but for all verbs, if the verb was subject-biased in one language (more than 50% of participants interpreted the pronoun as referring to the subject), it was subject-biased in all languages. If it was object-biased in one language, it was object-biased in the others.
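The classification rule is simple enough to state as code. Here's a minimal sketch in Python (the percentages are the ones from the table above; only the English and Russian rows are shown, and the labels are mine):

```python
# A verb counts as subject-biased in a language if more than 50% of
# participants resolved the pronoun to the subject; otherwise object-biased.
biases = {
    "English": {"convinces": 57, "forgives": 45, "remembers": 24, "understands": 60},
    "Russian": {"ubezhdala": 74, "izvinjala": 33, "pomnila": 47, "ponimala": 60},
}

def classify(pct_subject):
    """Label a verb by its pronoun bias, given % subject interpretations."""
    return "subject-biased" if pct_subject > 50 else "object-biased"

for language, verbs in biases.items():
    for verb, pct in verbs.items():
        print(f"{language:8} {verb:12} {classify(pct)}")
```

Running this on the table's numbers shows the cross-linguistic agreement directly: each column gets the same label in every language that includes the verb.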

For the most part, this is not how I analyze the data in the actual paper. In general, it is hard to identify translation-equivalent verbs (for instance, does the Russian nenavidet' mean hate, despise or detest?), so I employ some tricks to get around that. So this particular table actually just got jettisoned from Draft 3 of the paper, but I like it and feel it should get published somewhere. Now it is published on the blog.

BTW If anyone knows how to make bibliographies in Chrome without getting funky ampersands (see below), please let me know.
Garvey, C., & Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5, 459-464

Goikoetxea, E., Pascual, G., & Acha, J. (2008). Normative study of the implicit causality of 100 interpersonal verbs in Spanish Behavior Research Methods, 40 (3), 760-772 DOI: 10.3758/BRM.40.3.760

photo: Kevin Law

Lab Notebook: Verb Resources

It's good to be studying language now, and not a few decades ago. There are a number of invaluable resources freely available on the Web.

The resource I use the most -- and without which much of my research would have been impossible -- is Martha Palmer & co.'s VerbNet, a meticulous semantic analysis of several thousand English verbs. This is invaluable when choosing verbs for stimuli, since you can pick verbs that are similar to or differ from one another along particular dimensions. It's also useful for finding polysemous and nonpolysemous verbs, where polysemy is defined in a rigorous way.

Meichun Liu and her students at NCTU in Taiwan have been working on a similar project in Mandarin, Mandarin VerbNet. This resource has proved extremely valuable as I've been writing up some work I've been doing in Mandarin, and I only wish I had known about it when I constructed my stimuli.

I bring this up in case these resources are of use to anyone else. Mandarin VerbNet is particularly hard to find. I personally spent several months looking for it.

Findings: The Causality Implicit in Language

Finding Causes

Consider the following:

(1) Sally hates Mary.
a. How likely is this because Sally is the kind of person who hates people?
b. How likely is this because Mary is the kind of person whom people hate?

Sally hates Mary doesn't obviously supply the relevant information, but starting with work by Roger Brown and Deborah Fish in 1983, numerous studies have found that people nonetheless rate (b) as more likely than (a). In contrast, people find Sally frightens Mary more indicative of Sally than of Mary (the equivalent of rating (a) higher than (b)). Sentences like Sally hates Mary are called "object-biased," and sentences like Sally frightens Mary are called "subject-biased." There are many sentences of both types.

Brown and Fish, along with many of the researchers who followed them, explain this in terms of an inference from knowledge about how the world works:
Consider the two verbs flatter and slander… Just about everyone (most or all persons) can be flattered or slandered. There is no special prerequisite. It is always possible to be the object of slander or flattery … By sharp contrast, however, not everyone, by any means, not even most or, perhaps, many are disposed to flatter or to slander… [Thus] to know that one party to an interaction is disposed to flatter is to have some basis for predicting flattery whereas to know only that one party can be flattered is to know little more than that that party is human. (Brown and Fish 1983, p. 265)
Similar results are found by using other ways of asking about who is at fault:

(2) Sally hates Mary.
a. Who is most likely responsible?   Sally or Mary?

(The photo on the right came up on Flickr when I searched for pictures about "causes". It turns out Flickr is not a good place to look for pictures about "hating," "frightening," or "causes". But I liked this picture.)

Understanding Pronouns

Now consider:

(3) Sally hates Mary because she...
(4) Sally frightens Mary because she...

Most people think that "she" refers to Mary in (3) but to Sally in (4). This is a bias -- not absolute -- but it is robust and easy to replicate. Again, there are many verbs that are "object-biased" like hates and many that are "subject-biased" like frightens. Just as in the causal attribution effect above, this pronoun effect seems to be a systematic effect of (at least) the verb used. It was first discovered by Catherine Garvey and Alfonso Caramazza in the mid-70s and has been studied extensively since.

The typical explanation of the pronoun effect is that the word "because" implies that you are about to get an explanation of what just happened. Explanations usually refer to causes, so you expect the clause starting with she to refer to the cause of the first part of the sentence. Therefore, people must think that Mary caused Sally hates Mary but that Sally caused Sally frightens Mary.

Causes and Pronouns

Both effects are called "implicit causality," and researchers have generally assumed that the causal attribution effect and the pronoun effect are basically one and the same. An even stronger version of this claim would be that the pronoun effect relies on the causal attribution effect. People resolve the meaning of the pronouns in (3) and (4) based on who they think the cause of the first part of the sentence is. The causal attribution task in (1) and (2) is supposed to measure exactly that: who people think the cause is.

Although people have been doing this research for around three decades, nobody seems to have actually checked whether this is true -- that is, are verbs that are subject-biased in terms of causal attribution also subject-biased in terms of pronoun interpretation?

I recently ran a series of three studies on Amazon Mechanical Turk to answer this question. The answer is "no."

This figure shows the relationship between causal attribution biases (positive numbers mean the verb is subject-biased, negative mean it's object-biased) and pronoun biases (100 = completely subject-biased, 0 = completely object-biased). Though there is a trend line in the right direction, it's essentially artifactual. I tested four different types of verbs (the details of the verb classes take longer to explain than they are interesting), and it happens that none of them were subject-biased in terms of pronoun interpretation but object-biased in terms of causal attribution (a good thing, since otherwise I would have had nowhere to put the legend). There probably are some such verbs; I just tested only a few types.

I ran three different experiments using somewhat different methods, and all gave similar results (that's Experiment 2 above).

More evidence

A number of previous studies showed that causal attribution is affected by who the subject and object are. For instance, people are more object-biased in interpreting The employee hated the boss than The boss hated the employee. That is, they seem to think the boss is more likely to be the cause, whether the boss is the one hating or being hated. This makes some sense: bosses are in a better position to affect employees than vice versa.

I was able to find this effect in my causal attribution experiments, but there was no effect on pronoun resolution. That is, people thought "he" referred to the employee in (5) and the boss in (6) at pretty much the same rate.

(5) The boss hated the employee because he...
(6) The employee hated the boss because he...


This strongly suggests that these two effects are two different effects, due to different underlying mechanisms. I think this will come as a surprise to most people who have studied these effects in the past. It also is a surprise in terms of what we know about language processing. There is lots of evidence that people use any and all relevant information when they are interpreting language. Why aren't people using the conceptualization of the world as revealed by the causal attribution task when interpreting pronouns? And what are people doing when they interpret pronouns in these contexts?

I do have the beginnings of an answer to the latter question, but since the data in this experiment don't speak to it, that will have to wait for a future post.

Brown, R., & Fish, D. (1983). The psychological causality implicit in language Cognition, 14 (3), 237-273 DOI: 10.1016/0010-0277(83)90006-9

Garvey, C., & Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5, 459-464

Picture: Cobalt123.

Tables, Charts & Figures

APA format (required for most journals I read/publish in) stipulates that figures and tables should not be included in the parts of the manuscript where you actually discuss them; instead, they should all come at the end of the manuscript. I understand how this might be of use to the typesetter, but I find it a pain when actually trying to read a manuscript. I know I'm not the only one, because in some manuscripts I've submitted for review, I violated APA format and put the figures in-line, and the reviewers thanked me in their reviews and suggested that this should become journal policy. (The idea is that after acceptance, you resubmit with the figures and tables in APA format, but during the review process, you put them in-line.)

With that in mind, I left my figures in situ in my last journal submission. The staff at the journal promptly returned the manuscript without review, saying that they couldn't/wouldn't review a paper that didn't follow APA guidelines on tables and figures.

Obviously I reformatted and resubmitted (the customer/journal is always right), but I put this out to the blogosphere: does anyone actually like having the figures at the end of the manuscript?

New Grad School Rankings Don't Pass the Smell Test

The more I look at the new graduate school rankings, the more deeply confused I am. Just after publishing my last post, it dawned on me that something was seriously wrong with the publications-per-faculty data. Looking again at the Harvard data, the spreadsheet claims 2.5 publications per faculty for the period 2000-2006. I think this is supposed to be per faculty per year, though it's not entirely clear. As I show below, there's no way that number can be correct.

First, though, here's what the report says about how the number was calculated:
Data from the Thomson Reuters (formerly Institute for Scientific Information) list of publications were used to construct this variable. It is the average over seven years, 2000-2006, of the number of articles for each allocated faculty member divided by the total number of faculty allocated to the program. Data were obtained by matching faculty lists supplied by the programs to Thomson Reuters and cover publications extending back to 1981. For multi-authored articles, a publication is awarded for each author on the paper who is also on a faculty list. 
For computer science, refereed papers from conferences were used as well as articles. Data from résumés submitted by the humanities faculty were also used to construct this variable. They are made up of two measures: the number of published books and the number of articles published during the period 1986 to 2006 that were listed on the résumé. The calculated measure was the sum of five times the number of books plus the number of articles for each allocated faculty member divided by the faculty allocated to the program. In computing the allocated faculty to the program, only the allocations of the faculty who submitted résumés were added to get the allocation.
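For concreteness, the humanities formula described in that passage is easy to write down. Here's a sketch (all the book/article counts and the allocation number below are made up for illustration):

```python
def humanities_publication_measure(faculty, allocated):
    """Report's humanities measure: sum of (5 * books + articles) over
    resume-submitting faculty, divided by the faculty allocated to the program.
    faculty: list of (books, articles) tuples."""
    total = sum(5 * books + articles for books, articles in faculty)
    return total / allocated

# e.g., three faculty with (books, articles) records and an allocation of 3
print(humanities_publication_measure([(2, 10), (1, 4), (0, 7)], 3))
```

Note the last sentence of the quoted passage: the denominator counts only the allocations of faculty who submitted résumés.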
The actual data

I took a quick look through the CVs of a reasonable subset of faculty who were at Harvard during that time period. Here are their approximate publications per year (modulo any counting errors on my part -- I was scanning quickly). I should note that some faculty list book chapters separately on their CVs, but some do not. If we want to exclude book chapters, some of these numbers would go down, but only slightly.

Caramazza 10.8
*Hauser 13.6
Carey 4.7
Nakayama 5.9
Schacter 14.6
Kosslyn 10.3
Spelke 7.7
Snedeker 1.1
Wegner 2.3
Gilbert 4.0

One thing that pops out is that people doing work involving adult vision (Caramazza, Nakayama, Kosslyn) publish a lot more than the developmental folk (Carey, Spelke, Snedeker). The other thing is that publication rates are very high (except for my fabulous advisor, who was not a fast publisher in her early days but has been picking up speed since 2006, and Wegner, who for some reason didn't publish any papers in 2000-2002).

What on Earth is going on? I have a couple hypotheses. First, I know the report used weights when calculating composite scores for the rankings, so perhaps 2.5 reflects a weighted number, not an actual number of publications. That would make sense except that nothing I've found in the spreadsheet itself, the description of variables, or the methodology PDF supports that view.

Another possibility is that I sampled only about 1/4-1/3 of the faculty above; perhaps I'm over-counting power publishers. Perhaps. But unless the people I left off this list weren't publishing at all, it would be very hard to get down to an average of 2.5 publications per faculty per year. And I know I excluded some other power publishers (Cavanagh was around then, for instance).
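A quick back-of-the-envelope check makes the point. The ten faculty listed above average about 7.5 papers per year. Guessing at the total faculty count (the sample covers perhaps 1/4-1/3 of the department, so 30-40 is a reasonable range), we can ask what everyone else would have to average for the department-wide figure to come out at 2.5:

```python
# Publications per year for the ten faculty listed above.
sampled = [10.8, 13.6, 4.7, 5.9, 14.6, 10.3, 7.7, 1.1, 2.3, 4.0]
sample_mean = sum(sampled) / len(sampled)  # about 7.5

def required_rest_mean(total_faculty, target=2.5):
    """Average the unsampled faculty would need for the overall mean to
    equal the report's figure. total_faculty is a guess, not a known count."""
    rest = total_faculty - len(sampled)
    return (target * total_faculty - sum(sampled)) / rest

print(round(sample_mean, 2))
print(round(required_rest_mean(30), 2))  # with 30 total faculty: essentially zero
print(round(required_rest_mean(40), 2))  # with 40 total faculty: under one paper/year
```

Either way, the report's 2.5 is attainable only if the faculty I didn't sample published at close to zero papers per year.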

A possible explanation?

The best explanation I can think of is that the report actually is including a bunch of faculty who didn't publish at all. This is further supported by the fact that the report claims that only 78% of Harvard faculty had outside grants, whereas I'm pretty sure all professors in our department -- except perhaps brand new ones who are still on start-up funds -- have (multiple) outside grants.

But there are other faculty in our department who are not professors and do not (typically) do (much) research -- and thus do not publish or have outside grants. Right now our department lists 2 "lecturers" and 4 "college fellows." They're typically on short appointments (I think about 2 years). They're not tenure track, they don't have labs, they don't advise graduate students, and I'm not even sure they have offices. So in terms of ranking a graduate program, they're largely irrelevant. (Which isn't a slight against them -- I know two of the current fellows, and they're awesome folk.)

So of the 33 faculty listed this year, 6 are not professors with labs; they publish at much lower rates (if at all) and don't have outside grants. That leaves 27 of 33, or about 82% -- in the ballpark of the report's grant figure. I'm not sure it's enough to explain the discrepancy in publication rates, but it certainly gets us closer.
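The grant arithmetic, for the record (counts taken from the department listing described above):

```python
listed_faculty = 33
non_research = 6  # the lecturers and college fellows, who don't hold outside grants

# Share of listed faculty who plausibly hold outside grants
print(round((listed_faculty - non_research) / listed_faculty * 100, 1))  # 81.8
```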

Again, it is true that the lecturers and fellows are listed as faculty, and the report would be within its rights to count them ... but not if it wants to measure the right thing. The report purports to measure the quality and quantity of the research put out by the department, so counting non-research faculty is misleading at best.


Between this post and the last, I've found some serious problems in the National Academies' graduate school rankings report. Several of the easiest-to-quantify measures they include simply don't pass the smell test. They are either measuring the wrong thing, or they're complete bullshit. Either way, it's a problem.

(Or, as I said before, the numbers given have been transformed in some peculiar, undocumented way. Which I suppose would mean at least they were measuring the right thing, though reporting misleading numbers is still a problem.)
*Before anyone makes any Hauser jokes, he was on the faculty, so he would have been included in the National Academies' report, which is what we're discussing here. In any case, removing him would not drastically change the publications-per-faculty rate.

The Best Graduate Programs in Psychology

UPDATE * The report discussed below is even more problematic than I thought.

The National Academies just published an assessment of U.S. graduate research programs. Rather than compiling a single ranking, they rank programs in a number of different ways -- and they also published the data on the variables used to calculate those rankings -- so you can sort the data however you like. Another aspect to like: the methodology recognizes uncertainty and measurement error, so they estimate an upper and a lower bound on every ranking (what they call the 5th and 95th percentile rankings, respectively).

Ranked, Based on Research

So how do the data come out? Here are the top programs in terms of "research activity" (using the 5th percentile rankings):

1. University of Wisconsin-Madison, Psychology
2. Harvard University, Psychology
3. Princeton University, Psychology
4. San Diego State University-UC, Clinical Psychology
5. University of Rochester, Social-Personality Psychology
6. Stanford University, Psychology
7. University of Rochester, Brain & Cognitive Sciences
8. University of Pittsburgh-Pittsburgh Campus, Psychology
9. University of Colorado at Boulder, Psychology
10. Brown University, Cognitive and Linguistic Sciences: Cognitive Sciences

Yes, it's annoying that some schools have multiple psychology departments and thus each is ranked separately, leading to some apples v. oranges comparisons (e.g., vision researchers publish much faster than developmental researchers, partly because their data is orders of magnitude faster/easier to collect; a department with disproportionate numbers of vision researchers is going to have an advantage).

What is nice is that these numbers can be broken down in terms of the component variables. Here are rankings in terms of publications per faculty per year and citations per publication:

Publications per faculty per year

1. State University of New York at Albany, Biopsychology
2. University of Wisconsin-Madison, Psychology
3. Syracuse University Main Campus, Clinical Psychology
4. San Diego State University-UC, Clinical Psychology
5. Harvard University, Psychology
6. University of Pittsburgh-Pittsburgh Campus, Psychology
7. University of Rochester, Social-Personality Psychology
8. Florida State University, Psychology
9. University of Colorado-Boulder, Psychology
10. State University of New York-Albany, Clinical Psychology

Average Citations per Publication

1. University of Wisconsin-Madison, Psychology
2. Harvard University, Psychology
3. San Diego State University-UC, Clinical Psychology
4. Princeton University, Psychology
5. University of Rochester, Social-Personality Psychology
6. Johns Hopkins University, Psychological and Brain Sciences
7. University of Pittsburgh-Pittsburgh Campus, Psychology
8. University of Colorado-Boulder, Psychology
9. Yale University, Psychology
10. Duke University, Psychology

So what seems to be going on is that there are a lot of schools on the first list that publish large numbers of papers that nobody cites. If you combine the two lists to get the average number of citations per faculty per year, here are the rankings. I'm including numbers this time so you can see the distance between the top few and the rest: the #1 program nearly doubles the citation rate of the #10 program.

Average Citations per Faculty per Year

1. University of Wisconsin-Madison, Psychology - 13.4
2. Harvard University, Psychology - 12.7
3. San Diego State University-UC, Clinical Psychology - 11.0
4. Princeton University, Psychology - 10.6
5. University of Rochester, Social-Personality Psychology - 10.6
6. Johns Hopkins University, Psychological and Brain Sciences - 8.8
7. University of Pittsburgh-Pittsburgh Campus, Psychology - 8.3
8. University of Colorado-Boulder, Psychology - 8.0
9. Yale University, Psychology - 7.5
10. Duke University, Psychology - 6.9
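The combined measure above is just the product of the two component measures: publications per faculty per year times citations per publication gives citations per faculty per year. A sketch, with made-up component numbers (the report's spreadsheet has the real ones):

```python
def citations_per_faculty_per_year(pubs_per_faculty_year, cites_per_pub):
    """Citations accrued per faculty member per year."""
    return pubs_per_faculty_year * cites_per_pub

# Two hypothetical programs: one publishing many lightly-cited papers,
# one publishing fewer but better-cited papers.
print(citations_per_faculty_per_year(3.0, 2.0))  # high-volume program: 6.0
print(citations_per_faculty_per_year(1.5, 8.0))  # high-impact program: 12.0
```

This is why a program can top the publications-per-faculty list yet fall off the combined list: volume without citations doesn't help the product much.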

The biggest surprises for me on these lists are that the University of Pittsburgh appears (it's not a program I hear about often) and that Stanford does not.

Student Support

Never mind the research -- how do the students do? It's hard to say, partly because the variables measured aren't necessarily the ones I would measure, and partly because I don't believe their data. The student support & outcomes composite is built out of:

Percent of first year students with full financial support
Average completion rate within 6 years
Median time to degree
Percent of students with academic plans
Program collects data about post-graduation employment

That final variable is something that would only be included by the data-will-save-us-all crowd; it doesn't have any direct relationship to student support or outcomes. The exclusive focus on first-year funding is also odd. I think it's huge that my program guarantees 5 years of funding -- and for 3 of those we don't have to teach. Similarly, one might care whether funding is tied to faculty or given to the students directly, or whether there are grants to attend conferences, mini-grants to do research not supported by your advisor, etc.

But leaving aside whether they measured the right things, did they even measure those things correctly? The number that concerns me is "percent of students with academic plans," which is defined as the percentage who have lined up either a faculty position or a post-doctoral fellowship by graduation, and which is probably the most important variable on their list for measuring the success of a research program.

They find that no school has a rate over 55% (Princeton). Harvard is at 26%. Not to put too fine a point on it: that's absurd. Occasionally our department sends out a list of who's graduating and what they are doing next. Unfortunately, I haven't saved any of them, but typically all but 1 or 2 people are continuing on to academic positions (there's often someone doing consulting instead, and occasionally someone who just doesn't have a job lined up yet). So the number should be closer to 90-95% -- not just at Harvard, but presumably at peer institutions.

This makes me worried about their other numbers. In any case, since the "student support" ranking is so heavily dependent on this particular variable, and that variable is clearly measured incorrectly, I don't think there's much point in looking at the "student support" ranking closely.