Field of Science

In praise of experiments

Today, the excellent Neuroskeptic writes about a new study investigating which US states are most suicidal. The interesting twist was the form of the data: Google searches. It's an interesting study and an interesting use of Google searches, but what struck me was Neuroskeptic's closing thoughts:
"Over the past couple of years there's been a flurry of studies based on analyzing Google and Twitter trends. What's interesting to me is that we're really in the early days of this, when you think about likely future technologies. What will happen when everyone's wearing a computer 24/7 that records their every word and move, and even what they see?
"Eventually, psychology and sociology might evolve (or degenerate) into no more than the analysis of such data..."
It's always dangerous to predict the future, but here's my prediction: Not a chance. It comes down to the distinction between observational studies and experiments. Observational studies (where you record what happens in the course of normal events) are useful, particularly when you care about questions like "What is the state of the world?" They are much less useful when you want to know "Why is the world the way it is?"

There are a couple of reasons.

Reason #1: The correlation fallacy

First, observational studies are really about studying correlations. To have much power to analyze interesting correlations, you need a lot of data. This is what makes Google and Twitter powerful: they provide a lot of data. But correlation, famously, doesn't always tell you much about causation.

For instance, it is now well known that you can use the number of pirates active in the world's oceans and seas to reasonably predict average global temperature (there's a strong correlation).

I did not know until recently that Google search data has now definitively shown a correlation between the amount of movie piracy and global warming as well.

In the case of real pirates vs. the temperature, the correlation runs the other way (temperature affects weather affects seafaring activities). I have no idea what causes the correlation between searches for free movies and searches about global warming; perhaps some third factor. To give another silly example, there is a lot more traffic on the roads during daylight than at night, but this isn't because cars are solar-powered!

The point is that experiments don't have this problem: you go out and manipulate the world to see what happens. Change the number of pirates and see if global temperatures change. Nobody has tried this (to my knowledge), but I'm willing to bet it won't work.
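Here is a minimal sketch of that difference, assuming a made-up toy world in which a hidden factor z drives both x and y, and x has no effect on y at all. The names (`simulate`, `intervene`) and the noise levels are my own assumptions for illustration, not anything taken from the studies discussed here.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def simulate(n, intervene, rng):
    """Return the correlation between x and y over n simulated units.

    Toy assumption: z causes both x and y; x never causes y.
    """
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        if intervene:
            x = rng.gauss(0, 1)        # the experimenter sets x, ignoring z
        else:
            x = z + rng.gauss(0, 0.5)  # x follows z in the normal course of events
        y = z + rng.gauss(0, 0.5)      # y depends on z only
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)

rng = random.Random(0)
observational_r = simulate(10_000, intervene=False, rng=rng)
experimental_r = simulate(10_000, intervene=True, rng=rng)
print(f"observational r = {observational_r:.2f}")
print(f"experimental r  = {experimental_r:.2f}")
```

In the observational run, x and y come out strongly correlated even though neither causes the other. Once x is set at random by the experimenter, the correlation should all but vanish, which is the signal that x was never doing the causal work.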

(Of course, there are natural experiments, which are a hybrid of observational studies and experiments: the experimenter doesn't manipulate the world herself but rather waits until somebody else, in the course of normal events, does it for her. A good example is tracking different states as they adopt bicycle helmet laws at different times and comparing that against head injury statistics in each state. These are rarely as well-controlled as an actual experiment, but have the advantage of ecological validity.)

Reason #2: Life's too short

The second reason is that observational studies are limited by what actually happens in the world. You won't, from an observational study, find out what the effect on US politics would be of every US senator taking up crack while every US representative takes up meth. (I hope not, anyway.)

That was an absurd example, but the problem is real. Language gives lots of great examples. Suppose you want to find out what sentences in any given language are grammatical and what sentences are not. You could do an observational study and see what sentences people say. Those are grammatical; sentences you haven't heard probably aren't.

The problem with this is that people are boring and repetitive. A small number of words (heck, a small number of sentence fragments) accounts for most of what people say and write. The vast majority of grammatical sentences will never appear in your observational sample no matter how long you wait, because there are actually an infinite number of grammatical English sentences. (In his impressive "Who's afraid of George Kingsley Zipf?", Charles Yang shows how a number of prominent language researchers went astray by paying too much attention to this kind of observational study.)
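To get a feel for how lopsided this is, here is a rough simulation. It assumes a finite set of sentence types whose frequencies follow Zipf's law with exponent 1; real language has no such finite inventory, so if anything this understates the problem. The function name and the specific numbers are hypothetical choices of mine.

```python
import random

def sample_usage(num_types, num_tokens, rng):
    """Draw tokens from a Zipf-like distribution over sentence types.

    Returns (share of types seen at least once,
             share of tokens taken up by the 100 most common types).
    """
    # The type at rank r gets weight proportional to 1/r.
    weights = [1 / rank for rank in range(1, num_types + 1)]
    tokens = rng.choices(range(num_types), weights=weights, k=num_tokens)
    seen_share = len(set(tokens)) / num_types
    # Index i holds the type of rank i + 1, so indices below 100 are the top 100.
    top_share = sum(1 for t in tokens if t < 100) / num_tokens
    return seen_share, top_share

rng = random.Random(0)
seen_share, top_share = sample_usage(num_types=100_000, num_tokens=100_000, rng=rng)
print(f"types observed at least once: {seen_share:.0%}")
print(f"tokens from the top 100 types: {top_share:.0%}")
```

Even with as many observations as there are types, only a minority of the types should ever show up, while a tiny handful of common ones soaks up a large share of the sample.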

The basic feature of the problem is that for building theories -- explaining why things are the way they are -- very often what you care about are the border cases. Human behavior is largely repetitive, and the border cases are quite rare. Experiments turn this around: by deliberately choosing the situations we put our participants in, we can focus on the informative test cases.

The experimental method: Here to stay

None of this should be taken as meaning that I don't think observational studies are useful. I conduct them myself. A prerequisite to asking the question "Why are things the way they are?" is knowing, in fact, what way things are. There is also the question of ecological validity. When we conduct laboratory experiments, we construct artificial situations and then try to generalize the results to real life. It's good to know something about real life in order to inform those generalizations.

But just as I can't imagine observational studies disappearing, I can't imagine them replacing experimentation, either.


Neuroskeptic said...

Hey - thanks for the comments. I think you're right, experiments are here to stay for the foreseeable future, but just to play devil's advocate:

The problem with experiments is that they take place under controlled conditions (whether laboratory or field - the conditions are unrealistic) and this, arguably, means they are not to be trusted a priori.

Whereas if we use real live data on how people are behaving, we don't have that problem. And it might be possible to avoid the use of experiments entirely, if we had enough data (astronomers make do without them).

The reason you need experiments is that if you spot a correlation between X and Y you can't assume that X causes Y because it might be that there's some other factor Z that also differs...

However, in theory, if you could measure everything relevant to the case at hand, you could avoid that problem by just looking for Z... and while this is impossible today, it may not be a million miles off.

Like I say, I only half believe that, but food for thought.

GamesWithWords said...

@Neuroskeptic. Thanks for joining in. I hope the devil is paying you by the hour, not on a contingency fee. Because I think I'm still going to win this one:)

What makes an experiment an experiment is not unrealistic/controlled conditions, but that the experimenter intervened in the world. To give an example, at one point, web designers might have looked around at different websites to try to figure out what the best websites had that the unsuccessful ones did not (this is an observational study). Now, many websites randomly show different visitors slightly different pages in order to see what works better (this is an experiment). This is no more or less "realistic" than the observational study -- it's just a lot more powerful.
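For concreteness, the website experiment might be sketched like this. The page variants "A" and "B" and their underlying conversion rates are invented for the example; a real site would not know the true rates, which is the whole reason to run the test.

```python
import random

def run_ab_test(num_visitors, true_rates, rng):
    """Randomly assign each visitor to a page variant and record conversions.

    true_rates maps variant name -> hypothetical underlying conversion rate.
    Returns the observed conversion rate for each variant.
    """
    variants = list(true_rates)
    shown = {v: 0 for v in variants}
    converted = {v: 0 for v in variants}
    for _ in range(num_visitors):
        variant = rng.choice(variants)  # random assignment is the intervention
        shown[variant] += 1
        if rng.random() < true_rates[variant]:
            converted[variant] += 1
    return {v: converted[v] / shown[v] for v in variants}

rng = random.Random(0)
rates = run_ab_test(20_000, {"A": 0.10, "B": 0.13}, rng)
for variant, rate in rates.items():
    print(f"page {variant}: observed conversion rate {rate:.3f}")
```

Because visitors are assigned at random, the two groups should differ only in which page they saw, so a gap in the observed rates can be attributed to the page itself rather than to who happened to visit.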

Post-election there has been a lot of discussion of the Democratic Party's adoption of experimental techniques. Republicans (esp. Rove) pioneered Big Data observational techniques (mine consumer data to figure out what correlates with being a Republican or Democratic voter). Democrats now do that AND conduct experiments: randomly assigning different markets to see different ads in order to see what does and doesn't work. (This was, if memory serves, actually pioneered by Rick Perry's campaign team, Republicans all, but in general has not been popular with Republicans.) This has been widely cited as a reason for Obama's successful re-election. Again, nothing "unrealistic" about these experiments that I see.

Another example: Some years back I blogged about a researcher who was interested in whether people like songs because they are popular or because they are good songs. So he set up a music downloading website (with Indie bands, I believe) and manipulated what information visitors were given. Sometimes he'd lie and say a song that had never been downloaded was downloaded lots, and vice versa. It turned out, he couldn't sell bad music: truly awful songs (as judged by a control group who weren't given download/likes information) weren't downloaded no matter how much he claimed other people had downloaded them. And really good songs were downloaded a lot whatever he said. But songs in the middle were heavily influenced by how "popular" they were claimed to be: the more "popular", the more downloads. Again, this study seems to be studying exactly the behavior we care about.

Those are experiments in the field, but lab experiments can be fine, too. It depends on what you are studying. If you are studying dating, probably introducing people in the lab and having them read from scripts isn't going to work well (yes, people have tried), though you could set up a speed dating event (also been done, works nicely). Other behaviors transfer to the lab quite nicely (like having people judge whether a sentence is grammatical or not).

In short: The fact that there are bad experiments doesn't make experiments bad. It's like saying that the problem with quarterbacks is that they can't throw a spiral. Tim Tebow can't throw a spiral. *Good* quarterbacks can.

As for getting causation out of Big Data: maybe you could in some cases (in my post, I gave examples where you can't, even in principle). The question is how efficient it would be. In some cases, it may be more efficient than an experiment, but I suspect not many.