Field of Science

Web Experiment Tutorial: Chapter 2, Methodological Considerations

Several years ago, I wrote a tutorial for my previous lab on how to create Web-based experiments in Flash. Over the next few months, I'll be posting that tutorial chapter by chapter.

1. Why do an experiment online?

Whenever you design an experiment, it is important to consider your methods. Should your measures be explicit or implicit? Is self-report sufficient? Ideally, there is a reason for every aspect of your methods.

So why put an experiment on the Web? This chapter discusses the pros and cons of Web-based research, as well as when to use it and when to skip it.

Note: I've added an additional section to the end of this chapter to discuss paid systems like Amazon Mechanical Turk.

2. Cost

Experiments cost both time and money – yours as well as your participants’. Here, Web-based experiments have an obvious edge. Participants are not paid, and you don’t have to spend time scheduling or testing them (however, recruiting them may be extremely time-consuming, especially in the beginning).

3. Reliability of the subjects

A common complaint about Web-based experiments is: who knows what your participants are actually doing?

It is true that people lie on the Internet, and there is a great deal of unreliable information online. That is probably where this worry comes from.

However, subjects who come to the lab also lie. Some don’t pay attention. Others don’t follow the directions. Some just generally give bad data. The question becomes whether this phenomenon is more likely on the Web or in the lab.

There are several reasons to favor the Web. Lab-based subjects are generally induced to participate by either cash or course credit, while Web-based participants are typically volunteers (although some labs do use lotteries as an incentive). If a lab-based subject gets bored, s/he is nonetheless stuck finishing the experiment; subjects have the right to quit, but doing so is quite rare. Web-based subjects can and do quit at any time, thus removing themselves and their boredom-influenced data from the analyses.

In addition, good experimental design includes built-in checks to ensure that subjects are not just randomly pressing buttons. Designing such checks is beyond the scope of this manual.

Finally, you might be concerned that the same subject is participating multiple times. If you have large numbers of subjects, this is probably just noise. Still, there are several ways to check. You can record IP addresses. This is imperfect: dynamically assigned (DHCP) addresses can change over time, and different people may use the same computer. Nonetheless, inspecting IP addresses can give you an idea of whether repeat participation is a pervasive problem.
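As a rough sketch of what such a check might look like (the log format and field names here are hypothetical, not from any particular system), you can simply count submissions per IP address and eyeball the repeats:

```python
from collections import Counter

def flag_repeat_ips(records, threshold=2):
    """Count submissions per IP address and flag any IP appearing
    `threshold` or more times. Because of dynamic IPs and shared
    computers, a flagged IP is a rough signal of possible repeat
    participation, not proof."""
    counts = Counter(r["ip"] for r in records)
    return {ip: n for ip, n in counts.items() if n >= threshold}

# Hypothetical submission log: each record carries the participant's IP.
log = [{"ip": "203.0.113.5"}, {"ip": "203.0.113.7"}, {"ip": "203.0.113.5"}]
print(flag_repeat_ips(log))  # {'203.0.113.5': 2}
```

If only a handful of IPs come up and you have thousands of subjects, repeat participation is unlikely to be driving your results.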

You can also require subjects to create usernames and passwords, though this adds a great deal of complexity to your programming and will likely turn away many people. Also, some people (like me) frequently forget their passwords and just create a new account every time they visit a website.

Another option is to collect initials and birthdates, since two people are unlikely to share both. Here, though, there is a particularly high risk that subjects will lie. You can reduce that risk by asking only for the day of the month, for instance.
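A minimal sketch of this idea (the key format is my own invention, not a standard): combine initials with day of the month into a coarse pseudonymous key, then look for duplicates the same way as with IP addresses.

```python
def participant_key(initials, birth_day):
    """Build a coarse pseudonymous key from initials plus day of the
    month (1-31). Collisions between different people are possible,
    so matching keys suggest, rather than prove, a repeat subject."""
    return f"{initials.strip().upper()}-{int(birth_day):02d}"

print(participant_key("jd", 7))    # JD-07
print(participant_key(" ab ", 31))  # AB-31
```

The upside over full birthdates is that a day of the month feels less identifying, so subjects are less tempted to lie.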

Another worry is bots: programs that scour the Web, some of which are designed to fill out surveys. If you are using HTML forms to put together a survey, this is a definite risk, and you should include some way of verifying that the participant is in fact a human. The most typical approach is a CAPTCHA: an image of letters and numbers that the participant must type back in. Signing up for a free email address, for instance, always requires one.
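Real CAPTCHAs render distorted images, which requires a graphics library, but even a trivial plain-text challenge will filter out naive form-filling bots. The sketch below is illustrative only, not a production defense:

```python
import random

def make_challenge():
    """Generate a trivial arithmetic challenge. This plain-text
    version only stops the most naive bots, but it shows the shape
    of the server-side check: issue a question, remember the answer."""
    a, b = random.randint(1, 9), random.randint(1, 9)
    return f"What is {a} + {b}?", a + b

def is_human(answer, expected):
    """Accept the response only if it parses to the expected number."""
    try:
        return int(answer) == expected
    except ValueError:
        return False

question, expected = make_challenge()
print(question)
print(is_human(str(expected), expected))  # True
```

A determined attacker can of course solve arithmetic; the point is only to raise the bar above "submit the form blindly."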

To my knowledge, bots do not interface well with the types of Flash applications described in this book. I have not run across any evidence that bots are doing my experiments. But the day may come, so this is something to consider. The only real worry is that a single bot will give you large amounts of spurious data, masquerading as many different participants. Many of the safeguards described above to protect against uncooperative subjects will also help you with this potential problem.

Several studies have actually compared Web-based and lab-based data, and Web-based data typically produces results of equivalent or even better quality.

4. Reliability of the data

Here, lab-based experiments have a clear edge. Online, you cannot control the size of the stimulus display. You cannot control the display timing. You cannot control the distance the subject sits from the screen. You cannot control the ambient sound or lighting. If you need these things to be controlled, you should do the experiment in the lab.

Similarly, reaction times are not likely to be very precise. If your effect is large enough, you can probably get good results. But a 5 millisecond effect may be challenging.

That said, it may be worth trying anyway. Unless you think that participants’ environments or computer displays will systematically bias their data, it’s just additional noise. The question is whether your effect will be completely masked by this noise or not.

The one way in which the Web has an advantage here is that you may be able to avoid fatigue (by making your experiment very short) or order effects. By “order effect,” I mean that processing one stimulus may affect the processing of subsequent stimuli. Online, you can give some subjects one stimulus and other subjects the other, and simply compensate by recruiting larger numbers of subjects. Another example is experiments that require surprise trials (e.g., inattentional blindness studies): you can only surprise the same subject so many times.

5. Numbers of subjects

Here, Web-based experiments are the run-away winner. If you wanted to test 25,000 people in the lab, that would be essentially impossible. Some Web-based labs have successfully tested that number online.

6. Length of the experiment

If you have a long experiment, do it in the lab. The good Samaritans online are simply not going to spend 2 hours on your experiment out of the goodness of their hearts. Unless you make it really interesting to do!

If you can make your Web experiment less than 2 minutes long, do it. 5 minutes is a good number to shoot for. At 15 minutes, it becomes very difficult (though still possible) to recruit subjects. The longer the experiment, the fewer participants you will get. This does interact with how interesting the experiment is: some popular sites frequently run experiments of 15 minutes or more and still get many, many participants.

That said, if you have a long experiment, consider why it is so long. Probably, you could shorten it. In the lab, you may be used to having the same subject respond to the same stimulus 50 times. But what about having 50 subjects respond only once?

Suppose you have 200 stimuli that you need rated by subjects. Consider giving each subject only 20 to rate, and get 10x as many subjects.
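As a sketch of that design (item names here are made up for illustration), each subject gets a random subset of the stimuli, so across many subjects every item accumulates roughly equal numbers of ratings:

```python
import random

def assign_stimuli(stimuli, per_subject=20, seed=None):
    """Draw a random subset of stimuli for one subject. Over many
    subjects, each stimulus is rated roughly equally often, without
    any single person having to sit through the full set."""
    rng = random.Random(seed)
    return rng.sample(stimuli, per_subject)

# Hypothetical stimulus list of 200 items.
all_stimuli = [f"item_{i:03d}" for i in range(200)]
subset = assign_stimuli(all_stimuli, per_subject=20, seed=1)
print(len(subset))  # 20
```

Passing a per-subject seed (or just omitting it) keeps assignments independent across participants.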

The only time you may run into difficulty is if you need to insert a long pause into the middle of the experiment: for instance, a long-term memory experiment in which you need to retest subjects after a 1-hour delay. I have run several experiments with short delays (2-10 minutes), filling the delay with some of the Web's most popular videos in the hopes of keeping subjects around. Even so, the experiment with the 10-minute delay attracts very few subjects. So this is something to consider.

7. Individual differences

Generally, your pool of subjects online will be far more diverse than those you can bring into the lab. This in and of itself may be a reason for favoring the method, but it also brings certain advantages.

Laughably large numbers of subjects allow you to test exploratory questions. Are you curious how age affects performance on the task? Ask subjects their ages. It adds only a few seconds to the experiment but may give you valuable data. I added this to a study of visual short-term memory and was able to generate a fascinating plot of VSTM capacity against age.

Similarly, Hauser and colleagues compared moral intuitions of people from different countries and religious backgrounds. This may not have been the original purpose of the experiment, but it was by far the most interesting result, and all it required was adding a single question to the experiment.

You could also compare native and non-native English speakers, for instance, just to see if it matters.

Online, you may have an easier time recruiting difficult-to-find subjects. One researcher I know wanted to survey parents of children with a very rare disorder. There simply aren’t many in his community, but he was able to find many via the Internet. Maybe you want to study people with minimal exposure to English. You are unlikely to find them on your own campus, but there are literally millions online.

8. I have an experiment. Which should I pick?

This is of course up to you. Here are the guidelines I use:

Pick lab-based if:
The experiment must be long
The experiment requires careful controls of the stimuli and the environment

Pick Web-based if:
The experiment requires very large numbers of subjects
You don’t have many stimuli and don’t want to repeat them
The experiment is very short
You want to avoid order effects
You want to look at individual differences
You want to study a rare population
You want to save money

Note that for most experiments, you could go either way.

9. Amazon Mechanical Turk

Since I originally wrote this tutorial, a number of people have started using Amazon Mechanical Turk. Turk was designed for all kinds of outsourcing. You have a project? Post it on the Turk site, say how much you are willing to pay, and someone will do it. It didn't take long for researchers to realize this was a good way of getting data (I don't know who thought of this first, but I heard about it from the Gibson lab at MIT, who seem to be the pioneers in Cambridge, at least).

Turk has a few advantages over lab-based experiments: it's a lot cheaper (people typically pay around $2/hour) and it's fast (you can run a whole experiment in an afternoon).

Comparing it with running your own website like I do is tricky. First, since participants are paid, it obviously costs more than running a website. So if you do experiments involving very, very large numbers of participants, it may still be too expensive. On the flip side, recruitment is much easier. I have spent hundreds of hours over the last few years recruiting people to my website.

Second, your subject pool is more restricted, as participants must have an Amazon Payments account. There may be some restrictions on who can get an account, and in any case, people who might otherwise do your experiment may not feel like making an account.

One interesting feature of Turk is that you can refuse to pay people who give you bad data (for instance, get all the catch trials wrong). Participants know this and thus may be motivated to give you better data.
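A simple way to make that decision systematic (the response format below is hypothetical) is to score each subject's catch trials and reject submissions that fall below some accuracy cutoff:

```python
def catch_trial_accuracy(responses, catch_items):
    """Fraction of catch trials answered correctly. `responses` maps
    item id -> the subject's answer; `catch_items` maps item id -> the
    obviously correct answer. The data layout is illustrative only."""
    correct = sum(responses.get(item) == ans for item, ans in catch_items.items())
    return correct / len(catch_items)

resp = {"c1": "yes", "c2": "no", "c3": "yes"}
catch = {"c1": "yes", "c2": "yes", "c3": "yes"}
print(catch_trial_accuracy(resp, catch))  # 0.6666666666666666
```

You might, say, approve any submission scoring above 0.8 and review the rest by hand before rejecting, since rejections affect workers' reputations.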

Currently, I am using Turk for norming studies. Norming studies typically require a small, set number of participants, making them ideal for Turk. They are also boring, which makes it harder to recruit volunteers. I do not expect to switch over to using Turk exclusively, as I think the method described in this tutorial still has some advantages.

Amazon has a pretty decent tutorial on their site for how to do surveys, so I won't cover that here. More complex experiments involving animations, contingent responses, etc., should in theory be possible, but I don't know anybody doing such work at this time.


Anonymous said...

Thanks for sharing your tutorial with us. I've always wanted to try making online experiments. If I may ask, how steep is the learning curve?

I'm looking forward to the rest of the chapters.

GamesWithWords said...

It depends a lot on how much experience you have with programming. If you are familiar with one programming language, learning another isn't too bad.

I should add that, having now started using Mechanical Turk a bit, if you're doing relatively simple things like surveys, Turk offers a very simple way of setting them up (very quickly you run into limitations, though, and doing anything even mildly complicated requires serious programming).