Field of Science

Showing posts with label On changing one's mind. Show all posts

Findings: The Role of World Knowledge in Pronoun Interpretation

A few months ago, I posted the results of That Kind of Person. This was the final experiment in a paper on pronoun interpretation, a paper which is now in press. You can find a PDF of the accepted version here.

How it Began

Isaac Asimov famously observed that "the most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" That quote describes this project fairly well. The project grew out of a norming study. Norming studies aren't really experiments in their own right -- they are mini experiments used to choose stimuli.

I was designing an ERP ("brain wave") study of pronoun processing. A group in Europe had published a paper using ERPs to look at a well-known phenomenon in pronoun interpretation, one which has been discussed a lot on this blog, in which pronoun interpretation clearly depends on context:

(1) Sally frightens Mary because she...
(2) Sally likes Mary because she...

Most people think that "she" refers to Sally in (1) but Mary in (2). This seems to be a function of the verbs in (1-2), since that's all that's different between the sentences, and in fact other verbs also affect pronoun interpretation. We wanted to follow up some of the previous ERP work, and we were just choosing sentences. You get nice big ERP effects (that is, big changes in the brain waves) when something is surprising, so people often compare sentences with unexpected words to those with expected words, which is what this previous group had done:

(3) Sally frightens Bill because she...
(4) Bill frightens Sally because she...

You should get the sense that the pronoun "she" is a bit more surprising in (4) than in (3). Comparing these sentences to (1-2) should make it clear why this is.

The Twist

A number of authors argued that what is going on is that these sentences (1-4) introduce an explanation ("because..."). As you are reading or listening to the sentence, you think through typical causes of the event in question (frightening, liking, etc.) and so come up with a guess as to who is going to be mentioned in the explanation. More good explanations of an instance of frightening involve the frightener than the frightenee, and more good explanations of an instance of liking involve the like-ee than the liker.

The authors supported the argument by pointing to studies showing that what you know about the participants in the event matters. In general, you might think that in any given event involving a king and a butler, kings are more likely to be responsible for the event simply because kings have more power. So in the following sentence, you might interpret the pronoun as referring to the king even though it goes against the "typical" pattern for frighten (preferring explanations involving the frightener).

(5) The butler frightened the king because...

What got people particularly excited about this is that it all has to happen very fast. Studies have shown that you can interpret the pronoun in such sentences in a fraction of a second. If you can do this based on a complex inference about who is likely to do what, that's very impressive and puts strong constraints on our theory of language.

The Problem

I was in the process of designing an ERP experiment to follow up a previous one in Dutch that I wanted to replicate in English. I had created a number of sentences, and we were running a simple experiment in which people rate how "natural" the sentences sound. We were doing this just to make sure none of our sentences were weird, since that -- as already mentioned -- can have big effects on the brain waves, which could swamp any effects of the pronoun. Again, we expected people to rate (4) as less natural than (3); what we wanted to make sure was that people didn't rate both (3) and (4) as pretty odd. We tested a couple hundred such sentences, from which we would pick the best for the study.

I was worried, though, because a number of previous studies had suggested that gender itself might matter. This follows from the claim that who the event participants are matters (e.g., kings vs. butlers). Specifically, a few studies had reported that in a story about a man and a woman, people expect the man to be talked about more than the woman, analogous to expecting references to the king rather than the butler in (5). Was this a confound?

I ran the study anyway, because we would be able to see in the data just how bad the problem was. To my surprise, there was no effect of gender at all. I started looking at the literature more carefully and noticed that several people had similarly failed to find such effects. One paper had found an effect, but it seemed to be present in only a small handful of sentences out of the large number they had tested. I looked into studies that had investigated sentences like (5) and discovered ... that they didn't exist! Rather, the studies researchers had been citing weren't about pronoun interpretation at all but something else. To be fair, some researchers had suggested that there might be a relationship between this other phenomenon and pronoun interpretation, but it had never been shown. I followed up with some experiments seeing whether the king/butler manipulation would affect pronoun interpretation, and it didn't. (For good measure, I also showed that there is little if any relationship between that other phenomenon and pronouns.)

A Different Problem

So it looked like the data upon which much recent work on pronouns is built was either un-replicable or apocryphal. However, the associated theory had become so entrenched that this was a difficult dataset to publish. I ultimately had to run around a dozen separate experiments in order to convince reviewers that these effects really don't exist (or mostly don't exist -- there does seem to be a tiny percentage of sentences, around 5%, where you can get reliable if very small effects of gender). (A typical paper has 1-4 experiments, so a dozen is a lot. Just to keep the paper from growing to an unmanageable length, I combined various experiments together and reported each one as a separate condition of a larger experiment.)

Most of these experiments were run on Amazon Mechanical Turk, but the final one was run at GamesWithWords.org and was announced on this blog (read the results of that specific experiment here). The paper is now in press at Language & Cognitive Processes. You can read the final submitted version here.

Conclusion

So what does all this mean? In many ways, it's a correction to the literature. A lot of theoretical work was built around findings that turned out to be wrong or nonexistent -- in particular, the idea that pronoun interpretation involves a lot of very rapid inferences based on your general knowledge about the world. That's not quite the same thing as having a new theory, but we've been exploring some possibilities that no doubt will be talked about more here in the future.
----

Hartshorne, J. K. (2014). What is implicit causality? Language and Cognitive Processes.

Fractionating IQ

Near the dawn of the modern study of the mind, the great psychological pioneer Charles Spearman noticed that people who are good at one kind of mental activity tend to be good at most other mental activities. Thus the notion of g (for "general intelligence") was born: the idea that there is some underlying factor that determines -- all else equal -- how good someone is at any particular intelligent task. This of course fits folk psychology quite well: g is just another word for "smarts".

The whole idea has always been controversial, and many people have argued that there is more than one kind of smarts out there (verbal vs. numeric, logical vs. creative, etc.). Enter a recent paper by Hampshire and colleagues (Hampshire, Highfield, Parkin & Owen, 2012) which tries to bring both neuroimaging and large-scale Web-based testing to bear on the question.

In the neuroimaging component, they asked sixteen participants to carry out twelve difficult cognitive tasks while their brains were scanned and applied principal components analysis (PCA) to the results. PCA is a sophisticated statistical method for grouping things.

A side note on PCA

If you already know what PCA is, skip to the next section. Basically, PCA is a very sophisticated way of sorting things. Imagine you are sorting dogs. The simplest thing you could do is have a list of dog breeds and go through each dog and sort it according to its breed.

What if you didn't already have a dog breed manual? Well, German shepherds are more similar to one another than any given German shepherd is to a poodle. So by looking through the range of dogs you see, you could probably find a reasonable way of sorting them, "rediscovering" the various dog breeds in the process. (In more difficult cases, there are algorithms you could use to help out.)

That works great if you have purebreds. What if you have mutts? This is where PCA comes in. PCA assumes that there are some number of breeds and that each dog you see is a mixture of those breeds. So a given dog may be 25% German Shepherd, 25% border collie, and 50% poodle. PCA tries to "learn" how many breeds there are, the characteristics of those breeds, and the mixture of breeds that makes up each dog -- all at the same time. It's a very powerful technique (though not without its flaws).
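To make the analogy concrete, here is a minimal sketch in Python (using NumPy; the "breed" profiles and measurements are invented for illustration and come from nowhere in the paper). Each simulated dog is a blend of two underlying breed profiles, so the dogs vary along a single mixing dimension, and PCA should recover one dominant component:

```python
import numpy as np

# Hypothetical data: 100 "dogs", each described by 6 measurements.
# Each dog is a blend of two underlying breed profiles, plus noise.
rng = np.random.default_rng(0)
breed_a = np.array([1.0, 0.8, 0.1, 0.0, 0.2, 0.9])
breed_b = np.array([0.0, 0.1, 0.9, 1.0, 0.7, 0.1])
weights = rng.random(100)  # how much of "breed A" each dog is
dogs = np.outer(weights, breed_a) + np.outer(1 - weights, breed_b)
dogs += rng.normal(scale=0.05, size=dogs.shape)  # measurement noise

# PCA: center the data, then take the singular value decomposition.
centered = dogs - dogs.mean(axis=0)
_, singular_values, components = np.linalg.svd(centered, full_matrices=False)
explained = singular_values**2 / np.sum(singular_values**2)

print(explained.round(3))  # the first component dominates
```

Projecting each dog onto the recovered components gives its "mixture" -- the analogue of saying a given dog is 25% German shepherd, 25% border collie, and 50% poodle. (Strictly speaking, mixtures whose weights sum to one are closer to factor or mixture models; PCA is the simplest member of that family.)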

Neuroimaging intelligence

Analysis focused only on the "multiple demands" network previously identified as being related to IQ and shown in red in part A of the graph below. PCA discovered two underlying components that accounted for about 90% of the variance in the brain scans across the twelve tasks. One was particularly important for working memory tasks, so the authors called it MDwm (see part B of the graph below), and it involved mostly the IFO, SFS and ventral ACC/preSMA (see part A below for locations). The other was mostly involved in various reasoning tasks and involved more IFS, IPC and dorsal ACC/preSMA.


Notice that all tasks involved both factors, and some tasks (like the paired associates memory task) involved a roughly equal portion of each.

Sixteen subjects isn't very many

The authors put versions of those same twelve tasks on the Internet. They were able to get data from 44,600 people, which makes it one of the larger Internet studies I've seen. The authors then applied PCA to those data. This time they got three components, two of which were quite similar to the two components found in the neuroimaging study (they correlated at around r=.7, which is a very strong correlation in psychology). The third component seemed to be particularly involved in tasks requiring language. Most likely that did not show up in the neuroimaging study because the neuroimaging study focused on the "multiple demands" network, whereas language primarily involves other parts of the brain.

The factors dissociated in other ways as well. Whereas people's working memory and reasoning abilities start to decline about the time people reach the legal drinking age in the US (coincidence?), verbal skills remain largely undiminished until around age 50. People who suffer from anxiety had lower than average working memory abilities but average reasoning and verbal abilities. Several other demographic factors similarly had differing effects on working memory, reasoning, and verbal abilities.

Conclusions

The data in this paper are very pretty, and it was a particularly nice demonstration of converging behavioral and neuropsychological methods. I am curious what the impact will be. The authors are clearly arguing against a view on which there is some unitary notion of IQ/g. It occurred to me as I wrote this that while I've read many papers lately discussing the different components of IQ, I haven't read anything recent that endorses the idea of a unitary g. I wonder if there is anyone, and, if so, how they account for this kind of data. If I come across anything, I will post it here.


------
Hampshire, A., Highfield, R., Parkin, B., & Owen, A. (2012). Fractionating Human Intelligence. Neuron, 76(6), 1225-1237. DOI: 10.1016/j.neuron.2012.06.022

Maybe first-borns aren't smarter after all

Although it is conventional wisdom that your birth order affects your personality, it's a hotly-disputed topic among scientists, and in fact my sense is that, if anything, a majority of researchers doubt the existence of birth order effects. Findings have been slippery: one study suggests that, for instance, first-borns are risk-takers, whereas another suggests that they aren't.

Birth Order & Intelligence

One of the most-researched topics has been intelligence: A wide variety of studies have suggested that first-borns have higher IQ scores than later-borns. While not every study has shown this, Bjerkedal and colleagues published in 2007 what seemed to be the definitive proof. They looked at IQ tests for 250,000 Norwegian male conscripts born from 1967 to 1988 -- that's more than 80% of all Norwegian men born in that time period -- and found that first-born sons have IQs about 2.3 points higher than second-born sons.

Because of the size and completeness of this dataset, they were able to rule out various possible confounds in the data that have been sources of controversy in previous studies. For instance, because wealthy, well-educated families rarely have more than two children, simply being a middle child correlates with being less wealthy and having less access to quality education (and health care, etc.). So one might find that middle children have lower IQs, when in fact what you are measuring is not an effect of birth order, but of socio-economic status. Bjerkedal and colleagues were able to control for such factors.

The Flynn Effect

But, as Satoshi Kanazawa of the London School of Economics points out in a recent paper, there was one confound that they didn't consider: the Flynn Effect. Over the last hundred years -- and possibly longer -- the average person has been doing better and better on IQ tests. In fact, this is something that Bjerkedal and colleagues noticed in their own data, with IQ scores rising slightly from 1984 (the first year of their study) to the mid 1990s.

Because of this, IQ test manufacturers have been constantly raising the bar: you have to get more questions right to get an IQ of 100 now than you did fifty years ago. (What has caused the Flynn effect is one of the Big Questions in current research and a topic for a much longer post.) And Bjerkedal and colleagues did the same thing:
To minimize these variations, scores were standardized by calculating deviations from an overall mean score of 5.00 for each calendar year and age.
The idea is that your score is based not on how many questions you got right, but how many questions you got right compared with everyone else who took the IQ test in the same year. Kanazawa points out that this is a confound: The average performance was higher in the 1990s than in the 1980s. So if two people who took the test in 1985 and 1995 answered the exact same questions correctly, the one who took it in 1995 would have a lower IQ score than the one who took it in 1985. This means that if you compare two siblings, the older sibling will -- all else equal -- have a higher IQ score than the younger sibling.
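Kanazawa's confound can be seen in a toy calculation. The numbers below are made up for illustration (the real test used a 9-point scale, and the yearly means here are invented), but the arithmetic is the point: identical raw performance earns a lower standardized score in a later, higher-scoring year:

```python
# Made-up yearly cohort means: the cohort improves over time (Flynn effect).
yearly_mean = {1985: 5.00, 1995: 5.20}

def standardized(raw_score, year):
    # The adjustment described above: express each score as a deviation
    # from that calendar year's mean, re-centered on 5.00.
    return 5.00 + (raw_score - yearly_mean[year])

older_sibling = standardized(6.0, 1985)    # tested in 1985
younger_sibling = standardized(6.0, 1995)  # identical raw score, ten years later

print(older_sibling, younger_sibling)  # the younger sibling comes out lower
```

All else equal, then, the within-year standardization alone hands the earlier-tested (typically older) sibling a small apparent advantage.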

Caveats

There is one limitation to Kanazawa's story. While Bjerkedal and colleagues report that the average score did increase from 1985 through the early 1990s, they report that the scores then decreased back down to the original level between 1998 and 2002 (the study ended in 2004). Also, the increase was very small (about 1 IQ point) compared to the birth order effect that they reported (a drop of 1-2 IQ points for each older brother). So whether the Flynn effect is sufficient to explain away the Bjerkedal results is hard to say.*

Nonetheless, Kanazawa has one more card up his sleeve: his own study. Kanazawa looked at un-scaled data from IQ tests given to 17,419 children in the UK, finding no effect of birth order on intelligence.

That said, the statistical analyses are complicated, involving several transformations. While the transformations seem reasonable (mostly PCA), the transformations Bjerkedal used also seemed reasonable until we realized that they weren't. I'd like to see whether Kanazawa's null effect holds up on the truly raw data as well.

Conclusions

Birth order effects are interesting scientifically because they get at the following question: How does your home environment affect the person you become, if at all? Many of the leading minds today suspect that your home environment has little to no effect on you, at least not in the long term. Birth order effects are a very useful test case. Relatively little theoretical weight rides on whether oldest siblings are the smartest or youngest siblings are the smartest, but if you could show that birth order affected intelligence, that would be a proof-of-concept that home environment affects the adult you become.

[BTW Nobody doubts that home environment has a strong impact on future income, level of educational achievement, etc. The question is whether it affects your personality, making you introverted or extroverted, etc.]

If the intelligence data do not hold up, that leaves -- to my knowledge -- no direct measures of personality or cognitive function for which we have solid evidence that they are affected by birth order. There is one indirect measure that, to my knowledge, has never been challenged: people tend to be friends with and marry others of the same birth order (some of the evidence came from studies run at gameswithwords.org -- thank you to all who participated). Since we know that people marry others with similar personalities (on average), a plausible explanation is that people with similar birth order have similar personalities, leading them to marry one another. However, the fact that no one has thought of another explanation doesn't mean that there isn't one. Time will tell.

See also: My review of birth order effects for SciAm Mind from 2010.

*Bjerkedal and colleagues renormalized a 9-point scaled score. I cannot tell from the article whether that 9-point scale itself was based on standardized norms -- though most likely it was -- and whether those norms were re-standardized during the 21 years of the study.

------

Kanazawa, S. (2012). Intelligence, Birth Order, and Family Size. Personality and Social Psychology Bulletin, 38(9), 1157-1164. DOI: 10.1177/0146167212445911

NSF fellows can teach again

I reported last month that NSF was no longer allowing its graduate fellows to teach. According to an email I received earlier today, they have reconsidered the issue:


Each Fellow is expected to devote full time to advanced scientific study or work during tenure. However, because it is generally accepted that teaching or similar activity constitutes a valuable part of the education and training of many graduate students, a Fellow may undertake a reasonable amount of such activities, without NSF approval. It is expected that furtherance of the Fellow's educational objectives and the gain of substantive teaching or other experience, not service to the institution as such, will govern these activities. Compensation for such activities is permitted based on the affiliated institution’s policies and the general employment policies outlined in this document.