Field of Science

Another bunch of retractions

It appears that a series of papers written by a German business professor is being retracted. This particular scandal doesn't seem to involve data fabrication, though. Instead, he is accused of double-publishing (publishing the same work in multiple journals) and also of making errors in his analyses (this lengthy article -- already linked to above -- discusses the issues in detail).

It's possible that I was not paying attention before, but there seem to be more publication scandals lately than I remember. When working on my paper about replication, I actually had to look pretty hard to find examples of retracted papers in psychology. That wouldn't be so difficult at the moment, after Hauser, Smeesters and Sanna.

If there is an increase, it's hopefully due not to an increase in fraud but to an increase in vigilance, given the attention the issue has been getting lately.

Making up your data

Having finished reading the Simonsohn paper on detecting fraud, I have come to two conclusions:

1. Making up high-quality data is really hard. Part of the problem with making up data is that you have to introduce some randomness into it. If your study involves asking people how much they are willing to pay for a black t-shirt, you can't just write down that they all were willing to pay the average (say $12). You have to write down some variation ($12, $14, $7, $9, etc.).

The problem is that humans are notoriously bad at generating random number sequences. Simonsohn discusses this in terms of Tversky and Kahneman's famous, tongue-in-cheek paper "Belief in the law of small numbers." People think that random sequences should look roughly "average", even if the sample is small: flip a coin 4 times and you should get 2 heads and 2 tails. In fact, that exact split comes up only 3 times in 8, and getting 4 heads (1 chance in 16) isn't all that improbable.

So your best bet, if you are making up data, is to use a computer program to generate it from your favorite distribution (the normal distribution would be a good choice in most cases). The problem is that data can have funny idiosyncrasies. One of the problems with the string of numbers I suggested above ($12, $14, $7, $9, etc.) is that humans like round numbers. So when people say what they are willing to pay for a t-shirt, what you should see is a lot of $10s, $20s and maybe some $5s and $15s. The numbers in my list are relatively unlikely.

The paper goes on to describe other problems as well. What I get from this is that making up data in a way that is undetectable is a lot of work, and you might as well actually run the study. So even leaving aside other reasons you might not want to commit fraud (ethics, desire for / belief in importance of knowledge, etc.), it seems sheer laziness alone should steer you in the other direction.

2. The Dark Knight Rises is awesome. Seriously. Technically there was nothing about that in the paper, but I was thinking about the movie while reading the paper. Since I saw the show this morning, it's been hard to think of anything else. The most negative thing I can say about it is that it wasn't better than the last one, which is grading on a pretty steep curve.
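Back to the first conclusion for a moment: the two statistical points there are easy to check with a few lines of Python. This is only a rough sketch, not anything taken from Simonsohn's paper. The function names are my own, and the simulated t-shirt prices assume a normal distribution with a mean of $12 (the figure used above) and a standard deviation of $4 (a number I made up for illustration).

```python
import math
import random


def prob_heads(k, n):
    """Probability of exactly k heads in n flips of a fair coin."""
    return math.comb(n, k) / 2 ** n


def share_multiple_of_5(prices):
    """Fraction of whole-dollar prices that are multiples of 5."""
    return sum(1 for p in prices if p % 5 == 0) / len(prices)


# Coin flips: the "average-looking" split is less common than intuition says.
p_two_heads = prob_heads(2, 4)   # 6 of 16 sequences
p_four_heads = prob_heads(4, 4)  # 1 of 16 sequences

# Made-up prices drawn from a normal distribution.
# Assumptions: mean $12, standard deviation $4, rounded to whole dollars,
# with a floor of $1 so that no price is zero or negative.
rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = [max(1, round(rng.gauss(12, 4))) for _ in range(10_000)]
share = share_multiple_of_5(simulated)

print(f"P(exactly 2 heads in 4 flips) = {p_two_heads}")
print(f"P(4 heads in 4 flips)         = {p_four_heads}")
print(f"Share of simulated prices that are multiples of $5: {share:.2f}")
```

Computer-generated prices like these should land on a multiple of $5 only about one time in five, simply because one whole number in five is a multiple of 5. If real respondents favor $10s, $20s, $5s and $15s, their answers would show a much larger share, which is the kind of idiosyncrasy that a naive simulation misses.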

Detecting fraud

Uri Simonsohn has now posted a working paper describing how he detected those two recent cases of data fraud. Should my other writing projects progress fast enough, I'll write about it soon. I'll also post links to any interesting discussions I come across.

A visual depiction of vision

Filed here, so I can use it next time I teach intro psychology:

What did we do before XKCD?

Update on Dragon Dictate

I recently bought a new computer, and Dragon Dictate is working much better on it, if not perfectly. And this is despite the fact that I have trained the new copy much less than the old one. One annoying/funny problem that keeps coming up: Dictate always transcribes "resubmission" as "recent mission". So, "Here's the news from the resubmission" becomes "Here's the news from the recent mission."