Field of Science

Another bunch of retractions

It appears that a series of papers, written by a German business professor, are being retracted. This particular scandal doesn't seem to involve data fabrication, though. Instead, he is accused of double-publishing (publishing the same work in multiple journals) and also of making errors in his analyses (this lengthy article -- already linked to above -- discusses the issues in detail).

It's possible that I was not paying attention before, but there seem to be more publication scandals lately than I remember. When working on my paper about replication, I actually had to look pretty hard to find examples of retracted papers in psychology. That wouldn't be so difficult at the moment, after Hauser, Smeesters and Sanna.

If there is an increase, it's hopefully due not to an increase in fraud but an increase in vigilance, given the attention the issue has been getting lately.

Making up your data

Having finished reading the Simonsohn paper on detecting fraud, I have come to two conclusions:

1. Making up high-quality data is really hard. Part of the problem with making up data is that you have to introduce some randomness into it. If your study involves asking people how much they are willing to pay for a black t-shirt, you can't just write down that they all were willing to pay the average (say $12). You have to write down some variation ($12, $14, $7, $9, etc.).

The problem is that humans are notoriously bad at generating random number sequences. Simonsohn discusses this in terms of Tversky and Kahneman's famous, tongue-in-cheek paper "Belief in the law of small numbers." People think that random sequences should look roughly "average", even if the sample is small: Flip a coin 4 times, you should get 2 heads and 2 tails, when in fact getting 4 heads isn't all that improbable.

So your best bet, if you are making up data, is to use a computer program to generate it from your favorite distribution (the normal distribution would be a good choice in most cases). The problem is that data can have funny idiosyncrasies. One of the problems with the string of numbers I suggested above ($12, $14, $7, $9, etc.) is that humans like round numbers. So when people say what they are willing to pay for a t-shirt, what you should see is a lot of $10s, $20s and maybe some $5s and $15s. The numbers in my list are relatively unlikely.

The paper goes on to describe other problems as well. What I get from this is that making up data in a way that is undetectable is a lot of work, and you might as well actually run the study. So even leaving aside other reasons you might want to not commit fraud (ethics, desire for / belief in importance of knowledge, etc.), it seems sheer laziness alone should steer you the other direction.

2. The Dark Knight Rises is awesome. Seriously. Technically there was nothing about that in the paper, but I was thinking about the movie while reading the paper. Since I saw the show this morning, it's been hard to think of anything else. The most negative thing I can say about it is that it wasn't better than the last one, which is grading on a pretty steep curve.
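Going back to the first conclusion for a moment: the coin-flip arithmetic and the round-number problem are both easy to check. What follows is only a rough sketch, not anything taken from Simonsohn's paper. The helper names (prob_heads, share_multiples) are made up for illustration, and the mean of $12 and standard deviation of $4 used for the simulated prices are assumptions I picked to match the t-shirt example.

```python
import random
from math import comb


def prob_heads(n_flips, n_heads, p=0.5):
    """Binomial probability of exactly n_heads heads in n_flips flips."""
    return comb(n_flips, n_heads) * p**n_heads * (1 - p) ** (n_flips - n_heads)


def share_multiples(values, step=5):
    """Fraction of values that are whole multiples of `step`."""
    values = list(values)
    return sum(1 for v in values if v % step == 0) / len(values)


# Four fair flips: an even 2-2 split is the single most likely outcome,
# but it still happens less than half the time.
p_two_heads = prob_heads(4, 2)   # 6 of the 16 equally likely sequences
p_four_heads = prob_heads(4, 4)  # 1 of the 16 equally likely sequences

print(f"P(2 heads in 4 flips) = {p_two_heads:.4f}")
print(f"P(4 heads in 4 flips) = {p_four_heads:.4f}")

# The hand-invented prices from the post contain no multiples of $5 at all.
invented = [12, 14, 7, 9]
print(f"Invented prices, share of multiples of 5: {share_multiples(invented):.2f}")

# Prices drawn from a normal distribution and rounded to whole dollars.
# ASSUMPTION: mean 12 and standard deviation 4 are illustrative values only.
rng = random.Random(0)  # fixed seed so the run is repeatable
simulated = [round(rng.gauss(12, 4)) for _ in range(1000)]
simulated_share = share_multiples(simulated)
print(f"Simulated prices, share of multiples of 5: {simulated_share:.2f}")
```

If real answers pile up on $5, $10, $15 and $20, as suggested above, then a simulated share of multiples of 5 sitting near one in five would itself be a giveaway, which is the sense in which even computer-generated data can look wrong.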

Detecting fraud

Uri Simonsohn has now posted a working paper describing how he detected those two recent cases of data fraud. Should my other writing projects progress fast enough, I'll write about it soon. I'll also post links to any interesting discussions I come across.

A visual depiction of vision

Filed here, so I can use it next time I teach intro psychology:



What did we do before XKCD?

Update on Dragon Dictate


I recently bought a new computer, and Dragon Dictate is working much better on it, if not perfectly. And this is despite the fact that I have trained the new copy much less than the old one. One annoying/funny problem that keeps coming up: Dictate always transcribes "resubmission" as "recent mission". So, "Here's the news from the resubmission" becomes "Here's the news from the recent mission."

Since one can't be snarky in a response to a review...

I'll do it here. I am currently revising a paper for resubmission. On the whole, the reviews are fairly reasonable, with the exception of one cranky comment from a reviewer who complains that our literature review is woefully incomplete. This incompleteness seems to be our failure to cite one particular study. The reviewer writes:
"It is possible that this work is flawed, but it really should be discussed."
It does seem to be a relevant study and we would have cited it, had we known about it. Why didn't we know about it? Because it has never been published. It hasn't even been presented at any of the normal psycholinguistics conferences (though it has appeared at some linguistics conferences). Short of emailing every researcher who might be conducting a study that might be relevant, I'm not sure what this reviewer was expecting of us.

I'd also love to know what the folks who are obsessed with only citing studies published in peer-reviewed journals would say. (It's possible that some of these conferences it has been presented at have pretty thorough review procedures; I wouldn't know.)

The Psychologist on Replication

The Psychologist solicited opinions on the importance of replication from a number of researchers, including yours truly. See a preview here.

Eadweard J. Muybridge & Google Doodle

Today's Google Doodle is a fantastic tribute to Muybridge. I haven't found a permalink, but people looking after today can find it archived in a fashion on YouTube.

Point-light walkers

By far the best point-light walker demonstration I've seen is at biomotionlab.ca. I'm classifying this as an illusion (see post label) because, of course, point-light walkers aren't really walking people -- they are just a few white dots moving around the screen. Comparing the male and female versions is particularly fun if you've ever wondered what exactly it is that makes for a stereotypical male or female stride.

It also appears that there is an experiment you can participate in if you want to help with this kind of research.


Fair Use & FedEx


And now for something completely different: One private citizen's trials and travails trying to convince FedEx to print posters.

I have wanted a map of Hong Kong on my wall for some time. The Survey & Mapping office of the Hong Kong government helpfully provides some free maps for public use on their website. You will notice how the website helpfully includes a "free maps" logo, along with a copyright notice forbidding only commercial use of the map. Presumably they thought this was a good way of providing some publicity for the Special Administrative Region.

They did not take into account FedEx Office. I put this map on a USB stick and went to the FedEx Office at Government Center to have it printed. The manager there refused to print it as I didn't have proof of copyright ownership. I showed him the website (particularly where it says "free maps"). He said the fact that the map is free for public use was irrelevant; he needed a signed document from the copyright owner (the government of Hong Kong) stating that I, personally, had the right to print off the map.

His explanation for his refusal was simple: "I can't get between me and the copyright holder." I pointed out that he was getting between me -- who wants to print the map -- and the copyright owner -- who also wants me to print the map. He repeated that even so, he "can't get between me and the copyright holder." This was just repetition, so I pointed out again that the map is clearly labeled for public use. He said that was just "he said/she said" business; what he needed was a signed document.

I'm curious what he would do with a signed document in Chinese, and whether he would require a notarized translation. I realized as I was leaving that at the beginning, when the manager was trying to establish whether I had the right to print the map, he had asked me if I was a member of the organization that made the map -- that is, the Hong Kong government. I'm curious what would have happened if I had said yes.

The "copyright waiver"


This is not the first time that I've had a run-in with the copyright police at FedEx. Last year the Palo Alto FedEx refused to print a poster that I was supposed to present at a conference at Stanford. I study story comprehension in small children, and a common practice is to use stories about familiar characters. In this case, I had stories about Dora the Explorer and a few other cartoon characters. Because my poster showed an example of one of the pictures that we had drawn to go with the stories, FedEx initially refused to print the poster, saying that it violated copyright.

After a long discussion about fair use and noncommercial uses, one of the employees remembered that they have a “copyright release” form that they can use in these circumstances. Unfortunately, they couldn't find any blank copies. One enterprising employee simply wrote the words “copyright release” on a piece of paper and asked me to sign that piece of paper.

I wasn't sure about the wisdom of signing an essentially blank piece of paper (you can see a photo of it on the right), so they came up with another plan, which was to white out all the writing on a previously filled-out form, which they then copied (not waiting for the whiteout to dry and getting whiteout all over their copier in the process) and which I signed. Then they printed my poster and I went on to have an otherwise successful conference.

Copyright and FedEx

Clearly somebody has instilled the Fear of the Lord into the employees at FedEx with regard to copyright infringement. FedEx is understandably concerned about their liability, since unlike me, they have actual assets. I also realize that FedEx may not have the resources to have somebody on staff who has been adequately trained to deal with copyright issues ... but in that case, it suggests that maybe they do not have the resources to run a print shop. After all, it is not like they are not making determinations now. They are just doing it randomly and incorrectly.

Are you a Red Sox or Yankees fan?

If so, a colleague has a short survey for you. It seems she is trying to get together as much data as possible for a talk next week. Apparently there is also an opportunity to win a $50 gift card, though my motivation for participating was to help out with some interesting research.

Zeno

Many people are familiar with Zeno's paradox, though probably not in the form presented by XKCD:

(If you aren't familiar with it or need a refresher, just follow the link above.)

Perhaps this is widely known, but I only recently discovered what the point of Zeno's paradox was: he was trying to prove that motion is impossible. Nothing ever moves and nothing ever changes.

This probably sounds absurd, but it was the basis of a philosophical school of which Zeno was part. Zeno created a number of paradoxes, all of which were meant to demonstrate that if the idea that nothing ever moves or changes is absurd, well then it is no more absurd than the idea that things do move and do change. If motion were possible, you would end up, for instance, with Zeno's never-ending race.

This is just another demonstration that many famous philosophical ideas are often remembered now for reasons very different from the reason for which they were first put forth.

(Insight gleaned from Anthony Gottlieb's excellent The Dream of Reason).

Color illusion -- too cool to believe

By far the most striking visual illusion I've ever seen. A little bit of color after-effect turns a black-and-white photograph into a vivid color photograph. You may have to do it a few times to convince yourself it is real.

Results: Replication in Psychology


My paper with Adena Schachner on replication in psychology is now published. The paper contains 3 main sections: a reasonably thorough literature review on replication rates in psychology, a proposal as to how to improve replication rates (primarily through tracking replication rates), and the results of a survey of psychologists on replication practices (many thanks to all who participated). The main result of the survey was that while not nearly enough replications are attempted, there are actually more being attempted than we had guessed (or than many of our colleagues that we discussed this project with had guessed).
This paper is part of a larger collection of papers on reimagining the publication and review process, and as more of those papers are printed, I plan to discuss at least some of them.

Pilot data

I am back from a long semi-silence. I have been trying to finish up a number of projects, which gives me less time to write. Speaking of…

One of the focuses of my work is figuring out how children learn the meaning of verbs. This is made more complicated by the fact that we don't actually have completely solid and uncontroversial definitions of verbs. If we don't know what verbs mean, how can we tell when a child has successfully learned them?

I am working on a large-scale project to get better definitions of verbs. We are developing many different tasks, each of which gets at one specific aspect of meaning that is thought to be important for at least some verbs. The traditional method would be to have skilled linguists go through verbs one at a time and consult their own intuitions, and in fact a lot of very good work has been done this way (e.g., Jackendoff's Semantic Structures, among many others). However, there are certain advantages to having this work done by a larger number of people who are naïve to linguistic theory, not the least of which is that there are a very large number of verbs, and one person can't get through them all at any reasonable speed. The one disadvantage of working with naïve participants is that they do not understand linguistic terminology, so you have to find some other way to explain the task.

I have been developing some such tasks, and I could really use some pilot data to see how well they are working. If you have a little time to spare, I would really appreciate the help. There are 3 in particular I am currently working on:


There is a comments box at the end where you can leave any feedback and mention anything you noticed or which you found confusing. I do need data on all three, so please don't everyone just do the first one. 

Fair warning: These tasks take a bit longer than the ones on my website. My guess is that they will take 20-30 minutes each, but that is a wild guess. If somebody does one and wants to leave a comment about how long it took, that would be helpful for me and also for others who might want to do it.

Many thanks.