Field of Science

Is psychology a science, redux

Is psychology a science? I see this question asked a lot on message boards, and it's time to discuss it again here. I think the typical response by a researcher like myself is an annoyed "of course, you ignoramus." But a more subtle response is deserved, as the answer depends entirely on what you mean by "psychology" and what you mean by "science."

Two Psychologies

First, if by "psychology" you mean seeing clients (like in Good Will Hunting or Silence of the Lambs), then, no, it's probably not a science. But that's a bit like asking whether engineers or doctors are scientists. Scientists create knowledge. Client-visiting psychologists, doctors and engineers use knowledge. Of course, you could legitimately ask whether client-visiting psychologists base their interventions on good science. Many don't, but that's also true about some doctors and, I'd be willing to bet, engineers.

Helpfully, "engineering" and "physics" are given different names, while the research and application ends of psychology confusingly share the same name. (Yes, I'm aware that engineering is not the application of physics writ broadly -- what's the application of string theory? -- and one can be a chemical engineer, etc. I actually think that makes the analogy to the two psychologies even more apt.) It doesn't help that the only psychologists who show up in movies are the Good Will Hunting kind (though if paleoglaciologists get to save the world, I don't see why experimental psychologists shouldn't!), but the research kind does exist.

A friend of mine (a physicist) once claimed psychologists don't do experiments (he said this un-ironically over IM while I was killing time in a psychology research lab). My response now would be to invite him to participate in one of these experiments. Based on this Facebook group, I know I'm not the only one who has heard this.

Methods

There are also those, however, who are aware that psychologists do experiments, but deny that it's a true science. Some of this has to do with the belief that psychologists still use introspection (there are probably some somewhere, but I suspect there are also physicists who use voodoo dolls somewhere as well, along with mathematicians who play the lottery).

The more serious objection has to do with the statistics used in psychology. In the physical sciences, typically a reaction takes place or does not, or a neutrino is detected or is not. There is some uncertainty given the precision of the tools being used, but on the whole the results are fairly straightforward and the precision is pretty good (unless you study turbulence or something similar).

In psychology, however, the phenomena we study are noisy and the tools lack much precision. When studying a neutrino, you don't have to worry about whether it's hungry or sleepy or distracted. You don't have to worry about whether the neutrino you are studying is smarter than average, or maybe too tall for your testing booth, or maybe it's only participating in your experiment to get extra credit in class and isn't the least bit motivated. It does what it does according to fairly simple rules. Humans, on the other hand, are terrible test subjects. Psychology experiments require averaging over many, many observations in order to detect patterns within all that noise.
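The payoff of averaging is easy to see in a quick simulation (the effect size and noise level below are made up): a true effect far smaller than the measurement noise is invisible in any single observation but emerges clearly from the mean of many.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 5.0   # hypothetical small effect, in arbitrary units
NOISE_SD = 50.0     # each measurement is far noisier than the effect

def one_measurement():
    """A single, very noisy observation of the true effect."""
    return random.gauss(TRUE_EFFECT, NOISE_SD)

# One observation tells you almost nothing about the effect...
single = one_measurement()

# ...but the mean of many observations homes in on it.
mean_of_many = statistics.mean(one_measurement() for _ in range(100_000))

print(round(single, 1), round(mean_of_many, 1))
```

With 100,000 observations the standard error shrinks to about 0.16 units, so the mean lands within a fraction of a unit of the true effect even though individual measurements are off by about 50.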

Science is about predictions. In theory, we'd like to predict what an individual person will do in a particular instance. In practice, we're largely in the business of predicting what the average person will do in an average instance. Obviously we'd like to make more specific predictions (and there are those who can and do), but they're still testable (and tested) predictions. The alternative is to declare much of human and animal behavior outside the realm of science.

Significant differences

There are some who are on board so far but get off the bus when it comes to how statistics are done in psychology. Usually an experiment consists of determining statistically whether a particular result was likely to have occurred by chance alone. Richard Feynman famously thought this was nuts (the thought experiment being that it's unlikely to see a license plate reading CPT 349, yet you wouldn't want to conclude much from having seen one).

That's missing the point. The notion of a significant difference is really a measure of replicability. We're usually comparing a measurement across two populations. We may find population A is better than population B on some test. That could be because population A is underlyingly better at such tests. Alternatively, population A was lucky that day. A significant difference is essentially a prediction that if we test population A and population B again, we'll get the same result (better performance for population A). Ultimately, though, the statistical test is just a prediction (one that typically works pretty well) that the results will replicate. Ideally, all experiments would be replicated multiple times, but that's expensive and time-consuming, and -- to the extent that the statistical analysis was done correctly (a big if) -- largely unnecessary.
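The replicability reading of significance can be sketched in a quick simulation (all numbers hypothetical): among experiments that come out significant, an exact repeat almost always finds the same direction of difference.

```python
import random
import statistics

random.seed(1)

N = 30          # participants per group (hypothetical)
EFFECT = 0.5    # true advantage of population A, in SD units

def run_experiment():
    """One experiment: test both groups, return (significant?, A beat B?)."""
    a = [random.gauss(EFFECT, 1.0) for _ in range(N)]
    b = [random.gauss(0.0, 1.0) for _ in range(N)]
    diff = statistics.mean(a) - statistics.mean(b)
    se = ((statistics.variance(a) + statistics.variance(b)) / N) ** 0.5
    return abs(diff / se) > 2.0, diff > 0   # |t| > 2 as a rough cutoff

# Of the experiments that came out significant, how often does an
# exact repeat give the same direction (A better than B)?
significant_runs = 0
replications = 0
for _ in range(2000):
    sig, _ = run_experiment()
    if sig:
        significant_runs += 1
        _, same_direction = run_experiment()
        replications += same_direction

print(replications / significant_runs)
```

The replication rate among significant results comes out well above 90% in this setup, which is the sense in which a significant difference is a bet that the result will hold up on a rerun.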

So what do you think? Are the social sciences sciences? Comments are welcome.

Why is learning a language so darn hard (golden oldie)

I work in a toddler language lab, where we study small children who are breezing through the process of language acquisition. They don't go to class, use note cards or anything, yet they pick up English seemingly in their sleep (see my previous post on this).

Just a few years ago, I taught high school and college students (read some of my stories about it here) and the scene was completely different. They struggled to learn English. Anyone who has tried to learn a foreign language knows what I mean.

Although this is well-known, it's a bit of a mystery why. It's not the case that my Chinese students didn't have the right mouth shape for English (I've heard people -- not scientists -- seriously propose this explanation before). It's also not just that you can learn only one language. There are plenty of bilinguals out there. Jesse Snedeker (my PhD adviser as of Monday) and her students recently completed a study of cross-linguistic late-adoptees -- that is, children who were adopted between the ages of 2 and 7 into a family that spoke a different language from that of the child's original home or orphanage. In this case, all the children were from China. They followed the same pattern of linguistic development -- both in terms of vocabulary and grammar -- as native English speakers and in fact learned English faster than is typical (they steadily caught up with same-age English-speaking peers).

So why do we lose that ability? One model, posited by Michael Ullman at Georgetown University (full disclosure: I was once Dr. Ullman's research assistant), has to do with the underlying neural architecture of language. Dr. Ullman argues that basic language processes are divided into vocabulary and grammar (no big shock there) and that vocabulary and grammar are handled by different parts of the brain. Simplifying somewhat, vocabulary is tied to temporal lobe structures involved in declarative memory (memory for facts), while grammar is tied to procedural memory (memory for how to do things like ride a bicycle) structures including the prefrontal cortex, the basal ganglia and other areas.

As you get older, as we all know, it becomes harder to learn new skills (you can't teach an old dog new tricks). That is, procedural memory slowly loses the ability to learn new things. Declarative memory stays with us well into old age, declining much more slowly (unless you get Alzheimer's or other types of dementia). Based on Dr. Ullman's model, then, you retain the ability to learn new words but have more difficulty learning new grammar. And grammar does appear to be the typical stumbling block in learning new languages.

Of course, I haven't really answered my question. I just shifted it from mind to brain. The question is now: why do the procedural memory structures lose their plasticity? There are people studying the biological mechanisms of this loss, but that still doesn't answer the question we'd really like to ask, which is "why are our brains constructed this way?" After all, wouldn't it be ideal to be able to learn languages indefinitely?

I once put this question to Helen Neville, a professor at the University of Oregon and expert in the neuroscience of language. I'm working off of a 4-year-old memory (and memory isn't always reliable), but her answer was something like this:

Plasticity means that you can easily learn new things. The price is that you forget easily as well. For facts and words, this is a worthwhile trade-off. You need to be able to learn new facts for as long as you live. For skills, it's maybe not a worthwhile trade-off. Most of the things you need to be able to do you learn to do when you are relatively young. You don't want to forget how to ride a bicycle, how to walk, or how to put a verb into the past tense.

That's the best answer I've heard. But I'd still like to be able to learn languages without having to study them.

originally posted 9/12/07

Tenure, a dull roar

Slate ran an unfortunate, bizarre piece on tenure last week. FemaleScienceProfessor has a good take-down. Among other problems, it repeats the claim that the average tenured professor costs the average university around $11,000,000 across his/her career -- a number that is either misleading, miscalculated, or (most likely) an outright lie. But, as FemaleScienceProfessor points out, tenure itself costs next to nothing, so anyone who says eliminating tenure will save money really means cutting professor salaries will save money but doesn't want to be on the record saying so.

If this seems like deja vu, it is. I just wrote a post about a similarly confused feature in the New York Times. That post is still worth reading (imho).

Which raises the question of why tenure is under attack. I have two guesses: 1) it's a way of ignoring the progressive defunding of public universities, or 2) it's part of the broader war on science. There are possibly a few people who genuinely think tenure is a bad idea, but not because eliminating it will save money (it won't), nor because it'll soften the publish-or-perish ethos (yes, the claim has been made), nor because it'll refocus universities on teaching (absurd, irrelevant, and beside the point). Which leaves concerns about an inflexible workforce and the occasional dead-weight professor, but that's not on my list of top ten problems in education, and I don't think it should be on anyone else's -- there are bigger fish to fry.

Making data public

Lately, there have been a lot of voices (e.g., this one) calling for scientists to make raw data immediately available to the general public. In the interest of answering that call, here's some of my raw data:


female no English English no no yes United States 1148478773 312 0 helped 1 daxed 59 0 1 1
female no English English no no yes United States 1148478773 312 1 heard 2 blied 33 0 0 2
female no English English no no yes United States 1148478773 312 2 decelerated 2 lenked 45.4 1 0 2
female no English English no no yes United States 1148478773 312 3 startled 1 gamped 31.1 1 0 3
female no English English no no yes United States 1148478773 312 4 prompted 2 henterred 59 0 1 4
female no English English no no yes United States 1148478773 312 5 engrossed 2 nazored 31.1 0 1 5
female no English English no no yes United States 1148478773 312 6 obliged 1 ablined 59 1 0 6
female no English English no no yes United States 1148478773 312 7 tantalized 2 bosined 31.1 1 1 7
female no English English no no yes United States 1148478773 312 8 bled for 1 breened 31.3 1 1 8
female no English English no no yes United States 1148478773 312 9 loathed 2 gaubled 31.2 0 0 9
female no English English no no yes United States 1148478773 312 10 mourned for 1 ginked 31.3 1 1 10
female no English English no no yes United States 1148478773 312 11 wounded 2 jarined 31.1 0 0 10


Do you feel enlightened? Probably not. Raw data isn't all that useful if you don't know how it was collected, what the different numbers refer to, etc. Even if I told you this is data from this experiment, that probably wouldn't help much. Even showing you the header row for these data will help only so much:


sex subject_already nat_language prime_language autism dyslexia psychiatric country randomID startTime trial word choice conclusion wordClass whichLocation because totalCorrect


Some things are straightforward. Some are not. It's important to know that I record data with a separate row for every trial, so each participant has multiple trials. Also, I record all data, even data from participants who did not complete the experiment. If you're unaware of that, your data analyses would come out very wrong. Also, I have some codes I use to mark that the participant is an experimenter checking to make sure everything is running correctly. You'd need to know those. It's key to know how responses are coded (it's not simply "right" or "wrong" -- and in fact the column called totalCorrect does not record whether the participant got anything correct).
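To make the point concrete, here's a sketch of how such a file might be read in, assuming (hypothetically) that the columns are tab-delimited so that multi-word values like "United States" survive as single fields -- note that the header names alone still tell you nothing about the coding scheme:

```python
import csv
import io

# Hypothetical header and one data row, tab-delimited.
HEADER = ("sex subject_already nat_language prime_language autism dyslexia "
          "psychiatric country randomID startTime trial word choice "
          "conclusion wordClass whichLocation because totalCorrect").split()

raw = "\t".join(["female", "no", "English", "English", "no", "no", "yes",
                 "United States", "1148478773", "312", "0", "helped", "1",
                 "daxed", "59", "0", "1", "1"])

reader = csv.DictReader(io.StringIO(raw), fieldnames=HEADER, delimiter="\t")
trial = next(reader)

# One row per trial, keyed by the header -- but nothing here tells you
# that totalCorrect is NOT a count of correct answers, or which randomID
# values mark experimenter test runs that must be excluded.
print(trial["country"], trial["word"], trial["totalCorrect"])
```

Parsing is the easy part; without the codebook, the parsed fields remain uninterpretable.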

The truth is, even though I designed this study myself and wrote the program that outputs the data, every time I go back to data from a study I haven't worked with in a while, it takes me a few hours to orient myself -- and I'm actually relatively good about documenting my data.

So if a law were passed -- as some have advocated -- requiring that data be made public, one of two things would happen: either people would post uninterpretable data like my mini-spreadsheet above, or they would spend huge amounts of time preparing their data for others' consumption. The former will help no one. And the latter is expensive, and someone has to pay for that. And this all has to be balanced against the fact that there are very few data sets anyone would want to reanalyze.

There are important datasets that should be made available. And in fact there are already mechanisms for doing this (in my field, CHILDES is a good example). This kind of sharing should be encouraged, but mandated sharing is likely to cause more problems than it solves.

Anonymity

It seems that most science bloggers use pseudonyms. To an extent, I do this, though it's trivial for anyone who is checking to figure out who I am (I know, since I get emails sent to my work account from people who read the blog). This was a conscious choice, and I have my reasons.

1. I suppose one reason to choose anonymity is in case your blogging pisses off people who are in a position to hurt you. That would be mostly people in your own field. Honestly, I doubt it would take anyone in my field long to figure out what university I'm at. Like anyone, I write most about the topics my friends and colleagues are discussing, and that's a function of who my friends and colleagues are.

(In fact, a few years ago, someone I knew was able to guess what class I was taking, based on my blog topics.)

2. I write a lot about the field, graduate school, and the job market. But within academia, every field is different. For that matter, even if you just wanted to discuss graduate student admission policy within psychology, the fact is that there is a huge amount of variation from department to department. So I can really only write about my experiences. For you to be able to use that information, you have to have a sense of what kind of school I'm at (a large, private research university) and in what field (psychology).

I read a number of bloggers who write about research as an institution, about the job market, etc., but who refuse to say what field they're in. This makes it extremely difficult to know what to make of what they say.

For instance, take my recent disagreement with Prodigal Academic. Prodigal and some other bloggers were discussing the fact that few people considering graduate school in science know how low the odds of getting a tenure-track job are. I suggested that they actually aren't misinformed about academia per se, but about the difference between a top-tier school and even a mid-tier school. I pointed out that at a top-tier psychology program, just about everybody who graduates goes on to get a tenure-track job. Prodigal replied that in her field, at least, that's not true (and she suspects it's not true in my field, either).

The difference is that you can actually go to the websites of top psychology programs and check that I'm right. We can't do the same for Prodigal, because we have no idea what field she's in. We just have to take her word for it.

3. I suspect many people choose pseudonyms because they don't want to censor what they say. They don't want to piss anybody off. I think that to maintain my anonymity, I would have to censor a great deal of what I say. For one thing, I couldn't blog about the one thing I know best: my own work.

There is the risk of pissing people off. And trust me, I worry about it. But being careful about not pissing people off is probably a good thing, whether you're anonymous or not. Angry people rarely change their minds, and presumably we anger people precisely when we disagree with them.

--------

So why don't I actually blog under my name? I want people who Google me by name to find my academic website and my professional work first, not the blog.

Joining Twitter. Sigh.

The last few weeks I've been making some changes at this blog. One is to write fewer but higher-quality posts. Hopefully you noticed the latter and not just the former. At the same time, I have been finding more and more articles and posts that demand sharing, but about which I have little or nothing to say, except that you should read them. This has led me to add a Twitter feed above the posts. You can read it there or follow directly.

We'll see how it goes. Feedback is welcome. After all, I do this for the audience.

UPDATED


Another change: This blog is *relatively* new to FieldOfScience, but posts go back to 2007. Some of those older posts are worth revisiting, and I'll be reposting (occasionally with updates) a few of the better ones from time to time under the label "golden oldies". Again, if people have feelings about this, let me know.

1/3 of Americans can't speak?

A number of people have been blogging about a recent, still unpublished study suggesting that "a significant proportion of native English speakers are unable to understand some basic sentences." Language Log has a detailed explanation of the methods, but in essence participants were asked to match sentences to pictures. A good fraction made large numbers of mistakes, particularly those who had been high-school drop-outs.

What's going on here? To an extent, this shouldn't be that surprising. We all know there are people who regularly mangle language. But, as Mark Liberman at Language Log points out, at least some of these data are no doubt ascribable to the "paper airplane effect":
At one point we thought we had discovered that a certain fraction of the population is surprisingly deaf to certain fairly easy speech-perception distinctions; the effect, noted in a population of high-school-student subjects, was replicable; but observing one group of subjects more closely, we observed that a similar fraction spent the experiment surreptitiously launching paper airplanes and spitballs at one another.
It's worth remembering that, while many participants in an experiment take it seriously and are happy to help out the researcher, some are just there for the money they get paid. Since we're required to pay people whether they pay attention to the experiment or not, they really don't have any incentive to try hard. Does it surprise anyone that high-school drop-outs are particularly likely to be bad at/uninterested in taking tests?

It's probably relevant that the researchers involved in this study are linguists. There are some linguists who run fabulous experiments, but as a general rule, linguists don't have much training in doing experiments or much familiarity with what data looks like. So it's not surprising that the researchers in question -- and the people to whom they presented the data -- weren't aware of the paper airplane effect.

(I should say that psychology is by no means immune to this problem. Whenever a new method is adopted, it takes a while before there's a critical mass of people who really understand it, and in the meantime a lot of papers with spurious conclusions get written. I'm thinking of fMRI here.)

Honestly, Research Blogging, Get over yourself

A few years ago, science blog posts started decorating themselves with a simple green logo. This logo was meant to credential the blog post as being one about peer-reviewed research, and is supplied by Research Blogging. As ResearchBlogging.org explains:
ResearchBlogging.org is a system for identifying the best, most thoughtful blog posts about peer-reviewed research. Since many blogs combine serious posts with more personal or frivolous posts, our site offers a way to find only the most carefully-crafted posts about cutting-edge research, often written by experts in their respective fields.
That's a good goal and one I support. If you read further down, you see that this primarily amounts to the following: if the post is about a peer-reviewed paper, it's admitted to the network. If it's not, it isn't. I guess the assumption is that the latter is not carefully-crafted or about cutting-edge research. And that's where I get off the bus.

Peer Review is Not Magic

One result of the culture wars is that scientists have needed a way of distinguishing real data from fantasy. If you look around the Internet, no doubt half or even more than half of what is written suggests there's no global warming, that vaccines cause autism, etc. Luckily, fanatics rarely publish in peer-reviewed journals, so once we restrict the debate to what is in peer-reviewed journals, pretty much all the evidence suggests global warming, no autism-vaccine link, etc. So pointing to peer-review is a useful rhetorical strategy.

That, at least, is what I assume has motivated all the stink about peer-review in recent years, and ResearchBlogging.org's methods. But it's out of place in the realm of science blogs. It's useful to think about what peer review is.

A reviewer for a paper reads the paper. The reviewer does not (usually) attempt to replicate the experiment. The reviewer does not have access to the data and can't check that the analyses were done correctly. At best, the reviewer evaluates the conclusions the authors draw, and maybe criticizes the experimental protocol or the statistical analyses used (assuming the reviewers understand statistics, which in my field is certainly not always the case). But the reviewer can't check that the data weren't made up, that the experimental protocol was actually followed, that there were no errors in data analysis, etc.

In other words, the reviewer can do only and exactly what a good science blogger does. So good science blogging is, at its essence, a kind of peer review.

Drawbacks

Now, you might worry about the fact that the blogger could be anyone. There's something to that. Of course, ResearchBlogging.org has the same problem. Just because someone is blogging about a peer-reviewed paper doesn't mean they understand it (or that they aren't lying about it, which happens surprisingly often with the fluoride fanatics).

So while peer review might be a useful way of vetting the paper, it won't help us vet the blog. We still have to do that ourselves (and science bloggers seem to do a good job of vetting).

A weakness

Ultimately, I think it's risky to put all our cards on peer review. It's a good system, but it's possible to circumvent. We know that some set of scientists read the paper and thought it was worth publishing (with the caveats mentioned above). Of course, those scientists could be anybody, too -- it's up to the editor. So there's nothing really stopping autism-vaccine fanatics from establishing their own peer-reviewed journal, with reviewers who are all themselves autism-vaccine fanatics.

To an extent, that already happens. As long as there's a critical mass of scientists who think a particular way, they can establish their own journal, submit largely to that journal and review each other's submissions. Thus, papers that couldn't have gotten published at a more mainstream journal can get a home. I think anyone who has done a literature search recently knows there are a lot of bad papers out there (in my field, anyway, though I imagine the same is true in others).

Peer review is a helpful vetting process, and it does make papers better. But it doesn't determine fact. That is something we still have to find for ourselves.

****
Observant readers will have noticed that I use ResearchBlogging.org myself for its citation system. What can I say? It's useful.

Help! I need data!

Data collection keeps plugging along at GamesWithWords.org. Unfortunately, as usual, it's not the experiments for which I most need data that get the most traffic. Puntastic had around 200 participants in the last month. I'd like to get more than that, and I'd like to get more than that in all my experiments. But if I had to choose one to get 200 participants, it would be The Video Test, which only got 17.

The Video Test is the final experiment in a series that goes back to 2006. We submitted a paper in 2007, which was rejected. We did some follow-up experiments and resubmitted. More than once. Personally, I think we've simply had bad luck with reviewers, since the data are pretty compelling. Anyway, we're running one last monster experiment, replicating all our previous conditions every which way. It needs about 400 participants, though for really beautiful data I'd like about 800. We've got 140.

As I said, recruitment has been slow for this experiment.

So... if you have never done this experiment before (it involves watching a video and taking a memory test), please do. I'd love to get this project off my plate.

I liked "Salt," but...

What's with movies in which fMRI can be done remotely? In an early scene, the CIA does a remote brain scan of someone sitting in a room. And it's fully analyzed, too, with ROIs shown. I want that technology -- it would make my work so much easier!

UPDATE I'm not the only one with this complaint. Though Popular Mechanics goes a bit easy on the movie by saying fMRI is "not quite at the level Salt portrays." That's a bit like saying space travel is not quite at the level Star Trek portrays. There may someday be a remote brain scanner, but it won't be based on anything remotely like existing fMRI technology, which requires incredibly powerful, supercooled and loud magnets. Even if you solved the noise problems, there's nothing to be done about the fact that the knife embedded in the Russian spy's shoe (yes -- it is that kind of movie) would have gone flying to the center of the magnetic field, along with many of the other metal objects in the room.

What are the best cognitive science blogs?

If you look to your right, you'll see I've been doing some long-needed maintenance to my blog roll. As before, I'm limiting it to blogs that I actually read (though not all the blogs I read), and I have it organized by subject matter. As I did this, I noticed that the selection of cognitive science and language blogs is rather paltry. Most of the science blogs I read -- including many not included in the blog rolls -- are written by physical scientists.

Sure, there are more of them than us, but even so it seems there should be more good cognitive science and language blogs. So I'm going to crowd-source this and ask you, dear readers: who should I be reading that I'm not?

Language Games

Translation Party

Idea: type in a sentence in English. The site then queries Google Translator, translating into Japanese and then back again until it reaches "equilibrium," where the sentence you get out is the sentence you put in. Some sentences just never converge. Ten points to whoever finds the most interesting non-convergence.
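The equilibrium hunt is just fixed-point iteration. A sketch, with stand-in translation functions in place of the real Google Translate calls:

```python
def to_japanese(sentence: str) -> str:
    # Stand-in: a real implementation would call a translation API.
    # Here the "translation" simply lowercases and tags the sentence.
    return "JA:" + sentence.lower()

def to_english(sentence: str) -> str:
    # Stand-in reverse translation: strip the tag.
    return sentence[3:] if sentence.startswith("JA:") else sentence

def find_equilibrium(sentence: str, max_rounds: int = 20) -> str:
    """Round-trip translate until the English sentence stops changing."""
    for _ in range(max_rounds):
        round_trip = to_english(to_japanese(sentence))
        if round_trip == sentence:
            return sentence        # converged: a fixed point
        sentence = round_trip
    return sentence                # gave up: may never converge

print(find_equilibrium("Ten Points To Whoever Finds This"))
```

With real translation engines the round trip is lossy in much more interesting ways, which is why some sentences drift forever instead of converging.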

Sounds of Silence

My lament that, with regards to discussion of education reform, all trace of small liberal arts colleges has disappeared into the ether appears to have, itself, disappeared into the ether. Seriously, readers, I expected some response to that one. There are parts of my post even I disagree with.

No tenure, no way!

The New York Times is carrying an interesting but misguided discussion of tenure today. As usual, the first commentator warns that without tenure, academic freedom will die:
As at-will employees, adjunct faculty members can face dismissal or nonrenewal when students, parents, community members, administrators, or politicians are offended at what they say. If you can be fired tomorrow, you do not really have academic freedom. Self-censorship often results. 
Mark Taylor of Columbia replies, essentially, "oh yah?"
To those who say the abolition of tenure will make faculty reluctant to be demanding with students or express controversial views, I respond that in almost 40 years of teaching, I have not known a single person who has been more willing to speak out after tenure than before.
Instead, tenure induces stasis, a point to which Richard Vedder, an economist at Ohio University, agrees:
The fact is that tenured faculty members often use their power to stifle innovation and change.
Money

You might, reading through these discussions, almost think that universities have been slowly weakening the tenure system because they want to increase diversity, promote a flexible workforce, and reduce the power of crabby old professors. Maybe some administrators do feel that way. But lurking behind all of this discussion is money. Here's Taylor:
If you take the current average salary of an associate professor and assume this tenured faculty member remains an associate professor for five years and then becomes a full professor for 30 years, the total cost of salary and benefits alone is $12,198,578 at a private institution and $9,992,888 at a public institution.
I'm not sure where he's getting these numbers. The number at Harvard for the same period is $6,320,500 for salary alone. Assuming benefits cost as much as the salary alone gets us up to our $12,000,000, but that's for Harvard, not the average university. Perhaps Taylor is assuming the professor starts today and includes inflation in future salaries, but 35 years of inflation is a lot. I'm using present-day numbers and assuming real salaries remain constant.

In any case, money seems to be the real factor, mentioned by more or less all the contributors. Here's Vedder:
My academic department recently granted tenure to a young assistant professor. In so doing, it created a financial liability of over two million dollars, because it committed the institution to providing the individual lifetime employment. With nearly double digit unemployment and universities furloughing and laying off personnel, is tenure a luxury we can still afford?
Adrianna Kezar of USC notes that non-tenured faculty are often not given offices or supplies, which presumably also saves the university money.

Professors make choices, too.

So, the argument goes, universities would save a lot of money by eliminating tenure. And certainly universities need to find savings where they can. What none of the contributors to the discussion acknowledge, beyond an oblique aside by Vedder, is that tenure has a financial value to professors as well as universities. Removing tenure is in a sense a pay cut, and both present and potential academics will respond to that pay cut.

Becoming a professor is not a wise financial decision. The starting salary of a lawyer leaving a top law school is greater than what most PhDs from the same schools will make at the height of their careers should they stay in academia. And lawyers' salaries, as I'm often reminded, can be similarly dwarfed by those of people with no graduate education who go straight into finance.

Most of us who nonetheless go into academia do so because we love it. The point is that we have options. Making the university system less attractive will mean fewer people will want to go into it. It's really that simple.

Garbage in, Garbage out

While watching television, have you ever had a fatal heart attack?

If you answered "yes" to this question, you would have been marked as a "bad participant" in Experimental Turk's recent study. The charitable assumption would be that you weren't paying attention. Importantly for those interested in using Amazon Mechanical Turk for research, participants recruited through AMT were no more likely to answer "yes" than participants tested in a traditional lab-based setting (neither group was likely to say "yes").

It's a nice post, though I think that Experimental Turk's analysis is over-optimistic, for reasons that I'll explain below. More interesting, though, is that Experimental Turk apparently does not always include such catch trials in their experiments. In fact, they find the idea so novel that they actually cited a 2009 paper from the Journal of Experimental Social Psychology that "introduces" the technique -- which means the editors and reviewers at this journal were similarly impressed with the idea.

That's surprising.

Always include catch trials

Including catch trials is often taught as basic experimental method, and for good reason. As Experimental Turk points out, you never know if your participants are paying attention. Inevitably, some aren't -- participants are usually paid or given course credit for participation, so they aren't always very motivated. Identifying and excluding the apathetic participants can clean up your results. But that's not the most important reason to include catch trials.
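That exclusion step can be sketched in a few lines of Python. This is a minimal, hypothetical example -- the data layout and field names are invented for illustration, not taken from any actual study:

```python
# Hypothetical sketch of excluding participants who fail an obvious-answer
# catch trial (e.g., "While watching television, have you ever had a fatal
# heart attack?"). The field names below are invented for illustration.

def exclude_catch_failures(responses, catch_key="had_fatal_heart_attack", correct="no"):
    """Return (kept, n_excluded): keep participants whose catch answer is the obvious one."""
    kept = [r for r in responses if r.get(catch_key) == correct]
    return kept, len(responses) - len(kept)

responses = [
    {"subject": 1, "had_fatal_heart_attack": "no"},
    {"subject": 2, "had_fatal_heart_attack": "yes"},  # failed the catch trial
    {"subject": 3, "had_fatal_heart_attack": "no"},
]
kept, n_excluded = exclude_catch_failures(responses)
# kept contains subjects 1 and 3; n_excluded is 1
```

The point of keeping the exclusion in a single function is that the criterion gets applied uniformly, rather than eyeballed participant by participant.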

Even the best participant may not understand the instructions. I have certainly run experiments in which the majority of the participants interpreted the instructions differently from how I intended. A good catch trial is designed such that the correct answer can only be arrived at if you understand the instructions. It is also a good way of making sure you're analyzing your data correctly -- you'd be surprised how often a stray negative sign worms its way into analysis scripts.

Sometimes participants also forget instructions. In a recent study, I wasn't finding a difference between the control and experimental groups. I discovered in debriefing that most of the participants in the experimental group had forgotten the key instruction that made the experimental group the experimental group. No wonder there wasn't a difference! And good thing I asked. 

The catch trial -- the question with the obvious answer -- is just one tool in a whole kit of tricks used to validate one's results. There are other options, too. In reading studies, researchers often ask comprehension questions -- not because the answers themselves matter (the real interest is in what the participants do while reading), but simply to prove that the participants in fact did read and understand the material. 

Similar is the embedded experiment -- a mini experiment embedded into your larger experiment, the only purpose of which is to replicate a well-established result. For instance, in a recent experiment I included a vocabulary test (which you can also find in this experiment I'm running with Laura Germine at TestMyBrain.org). I also asked the participants for their SAT scores (these were undergraduates), not because I cared about their scores per se, but because their Verbal SAT scores correlated nicely with performance on the vocabulary test (Math SAT scores less so), helping to validate our vocab test. 
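At bottom, this kind of validation is just a correlation check. Here's a quick sketch; the scores below are fabricated purely to show the shape of the check, not data from the actual study:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Plain Pearson correlation -- enough for a quick validation check."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented numbers, purely for illustration.
verbal_sat = [540, 620, 710, 580, 690]
vocab_score = [21, 26, 33, 24, 30]
r = pearson_r(verbal_sat, vocab_score)
# A strongly positive r supports the vocab test as a valid measure.
```

In practice you'd want more than five participants and a proper significance test, but the logic is the same: if the embedded measure doesn't track an established one, something is wrong with the measure or with the participants.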


Beyond Surveys

Although I described catch trials mostly in terms of survey-format studies, the same techniques can be embedded into nearly any experiment. I've used them in reading-time, eye-tracking, and ERP experiments as well. The practice isn't even specific to psychology/cognitive science. During my brief sojourn in a wet lab in high school, my job was to help genotype knock-out mice to make sure that the genes in question really were missing from the relevant mice and not from the control mice. It probably wouldn't have occurred to the PIs in that lab to just assume the knock-out manipulation worked. Fail that check, and none of the rest of the experiment is interpretable. 

A version of the catch trial is even seen in debugging software, where the programmer inserts code that isn't relevant to the function of the program per se, but the output of which helps determine whether the code is doing what it's supposed to.
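The programmer's version is often just an assertion: a line that contributes nothing to the computation itself but fails loudly if the logic is wrong. A hypothetical example:

```python
def normalize(scores):
    """Rescale a list of scores to the 0-1 range."""
    lo, hi = min(scores), max(scores)
    result = [(s - lo) / (hi - lo) for s in scores]
    # Not needed for the computation itself, but, like a catch trial,
    # it fails loudly if the logic above is wrong.
    assert min(result) == 0.0 and max(result) == 1.0
    return result

print(normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```

The assertion costs almost nothing and catches the stray-negative-sign class of bug the moment it appears, rather than three analyses later.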

It is true that some experiments resist checks of this sort. I have certainly run experiments in which, by design, I couldn't easily confirm that the participants understood the task, were paying attention, etc. But that is better avoided when possible -- which is why, when I don't see such checks in an experimental write-up, I assume either (a) the checks were performed but deemed too unimportant or obvious to mention, or (b) they weren't performed at all.

An Odd Omission

If catch trials are a basic aspect of good experimental design, how is it that Experimental Turk and the Journal of Experimental Social Psychology didn't know about it? I'm not sure. Part of it may be due to how experimental design is taught. It's not something you look up in an almanac, and though there are classes on technique (at least in psychology departments), they aren't necessarily that helpful since there are hundreds of types of experiments out there, each of which has its own quirks, and a class can only cover a few.

At least in my experience, experimental design is learned through a combination of the apprenticeship method (working with professors -- or, more often, more experienced graduate students) and figuring it out for yourself. The authors at Experimental Turk, it turns out, come from fields relatively new to experimental design (business, management, and political science), so it's possible they had less access to such institutional knowledge. 

As for the Journal of Experimental Social Psychology... I'm not a social psychologist, and I hesitate to generalize about the field. A lot of social psychology uses questionnaires as instruments, and researchers go to a great deal of trouble to validate those questionnaires -- showing that they are predictive of results on other tests or questionnaires, that they have good test-retest reliability, and so on. Many of the techniques they use are ones I would like to learn better. But I haven't ever run across a questionnaire (again, in my limited experience) that actually includes catch trials. Which is itself interesting.

A clever idea 

I should add that while Experimental Turk cites said journal article for suggesting using questions with obvious answers, that's not actually what the paper suggests. Rather, it suggests using instructions telling participants to ignore certain questions. For instance: 
Sports Participation
Most modern theories of decision making recognize the fact that decisions do not take place in a vacuum. Individual preferences and knowledge, along with situational variables can greatly impact the decision process. In order to facilitate our research on decision making we are interested in knowing certain factors about you, the decision maker. Specifically, we are interested in whether you actually take the time to read the directions; if not, then some of our manipulations that rely on changes in the instructions will be ineffective. So, in order to demonstrate that you have read the instructions, please ignore the sports item below, as well as the continue button. Instead, simply click on the title at the top of this screen (i.e., "sports participation") to proceed to the next screen. Thank you very much.
That's a clever idea. One of my elementary school teachers actually wrote a whole test with instructions like that to teach the class a lesson about reading instructions carefully (and it worked -- I still do!). It's a good idea that I've never seen used in an experimental setting before, though that doesn't mean it hasn't been. In any case, the paper's discussion doesn't mention catch trials or other methods of validating data, so it's hard to know how thorough the authors' literature search was.

More training

A bad movie can still make entertaining watching. A bad experiment is irredeemable. If the participants didn't understand the instructions, nothing can be gleaned from the data. And there are so many ways to run bad experiments -- I know, because I've employed many of them myself. There are a lot of datasets out there in psychology that have proven, shall we say, resistant to replication. Some of this has to be due to the fact that experimental design is not as good as it could and should be. 

Addendum

As I mentioned higher up, I think Experimental Turk is overly optimistic about the quality of data from AMT. I've run a couple dozen experiments on AMT now, and the percentage of participants who fail the catch trials varies a great deal, from as few as 0% to as many as 20-30%. I haven't made a systematic study of it, but there seem to be a number of contributing factors, some general to all experimental venues (the length of the experiment, how interesting it is, how complicated the instructions are) and some specific to AMT (the more related HITs, the more attractive a target the experiment is to spammers). 

All the more reason to always include catch trials.


-----------
Oppenheimer, D. M., Meyvis, T., & Davidenko, N. (2009). Instructional manipulation checks: Detecting satisficing to increase statistical power. Journal of Experimental Social Psychology, 45, 867-872.