Field of Science

Survey on Replication

Are you a researcher working in psychology or related domains (neuroscience, linguistics, etc.)? A colleague and I are conducting a survey on replication in these fields, for inclusion in an upcoming special issue of Frontiers in Computational Neuroscience. You can fill out the survey here.

The pace of review

One of my manuscripts will shortly enter its 7th month of being under review. Apparently one of the three reviewers keeps promising to send in a review but never does. Now the 4+ months a different manuscript languished under review seem speedy.

Ray Kurzweil is convinced that the pace of science is increasing exponentially and will continue to do so. I think he's neglected one rate-limiting step in the process.

What the Best College Teachers Do: A Review of a Vexing Book

What the Best College Teachers Do is not a bad book. It is engaging and reasonably well-written. The topic is both evergreen and timely, and certainly of interest to college teachers at the very least (as well as to people who rate college quality and to people who use those ratings to decide where to go to school). My issue with this book is that it is incapable of answering the question it sets out for itself.

A problem of comparison

The book is based primarily on extensive research by the author, Ken Bain, and his colleagues. The appendix spells out in detail how they identified good college teachers (a combination of student evaluations, examples of student work, department examinations, etc.) and how they collected information about those gifted individuals (interviews, taped class sessions, course materials, etc.). They analyzed these data to determine what these best college teachers did.

Even assuming that (a) their methods successfully identified superior teachers, and (b) they collected the right information about those teachers' practices, this is only half of a study. Without even looking at their data, I can easily rattle off some things all these teachers had in common:

1. They were all human beings.
2. They were all taller than 17 inches.
3. They all spoke English, at least to some degree (the study was conducted in the USA).
4. Most were either male or female.

Commonalities are not limited to attributes of the teachers, but also to what they do in the classroom:

5. Most showed up to at least half of the class periods for a given course.
6. None of them habitually sat, silent and unmoving, at the front of the classroom for the duration of class.
7. They did not assign arbitrary grades to their students (e.g., by rolling dice).
8. Very few spoke entirely in blank verse.

While these statements are almost certainly true of good college teachers, they do not distinguish the good teachers from the bad ones. Since Bain and colleagues did not include a comparison group of bad teachers, we cannot know if their findings distinguish the good teachers from the bad ones.

Science -- like teaching -- requires training

A good test of teaching ability should pick out all the good teachers. It should also pick out only the good teachers. (A somewhat different cut of the issues is to consider test reliability and test validity.) What the Best College Teachers Do focuses entirely on the first issue. As my reductio ad absurdum above shows, half of a good test is not a test that is 50% right; it's a useless test.
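To make the two requirements concrete, here is a minimal Python sketch. The four teachers and their heights are invented for illustration, and the labels "sensitivity" (share of good teachers the criterion picks out) and "specificity" (share of bad teachers it rejects) are my gloss, not Bain's terms. The criterion is item 2 from the list above.

```python
def sensitivity_and_specificity(teachers, criterion):
    """Return (share of good teachers flagged, share of bad teachers rejected)."""
    good = [t for t in teachers if t["good"]]
    bad = [t for t in teachers if not t["good"]]
    sensitivity = sum(criterion(t) for t in good) / len(good)
    specificity = sum(not criterion(t) for t in bad) / len(bad)
    return sensitivity, specificity


# Hypothetical data: two good teachers and two bad ones.
teachers = [
    {"good": True, "height_in": 66},
    {"good": True, "height_in": 70},
    {"good": False, "height_in": 64},
    {"good": False, "height_in": 72},
]


def taller_than_17_inches(teacher):
    return teacher["height_in"] > 17


result = sensitivity_and_specificity(teachers, taller_than_17_inches)
print(result)  # (1.0, 0.0)
```

The criterion finds every good teacher and rejects not a single bad one. Note that the second number cannot even be computed without the bad teachers in the sample, which is exactly what a study with no comparison group is missing.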

It's unfortunate that Bain and his colleagues failed in this basic and fundamental aspect of scientific inquiry. Although Bain is now the director of the Center for Teaching Excellence at New York University, he was trained as a historian. This comes out in the discussion of the study methods: "Like any good historians who might employ oral history research techniques, we subsequently sought corroborating evidence, usually in the form of something on paper..." (p. 187).

I would hope that any good historian doing comparative work would know to include a comparison group, but designing a scientific study of human behavior is hard. Even psychologists screw it up. And that's the focus of our training, whereas historians are mostly learning things other than experimental design (I assume).

Circular Definitions

Of course, failing to include a control group is not the only way to ruin a study. You can also make it circular.

Chapter 3 focuses on how excellent teachers prepare for their courses:

"At the core of most professors' ideas about teaching is a focus on what the teacher does rather than on what the students are supposed to learn. In that standard conception, teaching is something that instructors do to students, usually by delivering truths about the discipline. It is what some writers call a 'transmission model.' ...

"In contrast, the best educators thought of teaching as anything they might do to help and encourage students to learn. Teaching is engaging students, engineering an environment in which they learn."

Here is what the appendix says about how the teachers were chosen for inclusion in the study:

"All candidates entered the study on probation until we had sufficient evidence that their approaches fostered remarkable learning. Ultimately, the judgment to include someone in the study was based on careful consideration of his or her learning objectives, success in helping students achieve those objectives, and ability to stimulate students to have highly positive attitudes toward their studies."

It seems that perhaps teachers were included as being "excellent teachers" if they focused on student learning and on motivating students. The researchers then "found" that excellent teachers focus on student learning and on motivating students.

Vagueness and Ambiguity

Or maybe not. I'm still not entirely sure what it means to -- in the first quote -- focus on "what the teacher does" rather than on "what the students are supposed to learn." For instance, Bain poses the following thought problem on page 52:

"How will I help students who have difficulty understanding the questions and using evidence and reason to answer them."

Is that focusing on what the teacher does or focusing on what the students are supposed to learn? How can we tell? By what metric?

My confusion here may merely mark me as one of those people expecting "a simple list of do's and don'ts" who are "greatly disappointed." Bain adds (p. 15), "The ideas here require careful and sophisticated thinking, deep professional learning, and often fundamental conceptual shifts." That's fine. But if there is no metric I can use to find out whether I'm following these best practices or not, what good does this book do me?

(Also, without knowing what exactly Bain means by these vague statements, there is no way to ensure that his study wasn't circular, as described in the previous section. I gave only one example, but the general problem is clear: Bain defined great teachers by one set of criteria and then analyzed their behavior in order to extract a second set of criteria. If both sets of criteria are loosely and vaguely defined, there's no way even in principle to know whether he is just measuring the same thing both times.)

Credible Reviews

So if we don't trust Bain's study, is there anything else in the book worth reading? Maybe. What the Best College Teachers Do is not myopically focused on Bain's own research. He reviews the literature, citing the conclusions from other studies of teaching quality, broadening the scope of the framework outlined in the book. However, this raises its own problem.

In writing a review, the reviewer is supposed to survey the literature, find all the relevant research, determine what the best research is, and then synthesize everything into a coherent whole (or at least, into something as coherent as the current state of the field allows). The reviewer generally does not describe the studies in sufficient detail to allow the reader to evaluate them directly; only a brief overview is provided, with a focus on the conclusions.

If you trust the reviewer, this is fine. That's why reviews from the most respected researchers in the field are typically highly valued, so much so that publishers and editors often solicit reviews from these researchers. Obviously, a review of the latest research on underwater basket weaving by a fifth-grader would not be so highly prized, because (a) we don't believe the fifth-grader did a particularly thorough review, and (b) we don't trust the fifth-grader's ability to sort the wheat from the chaff -- that is, identify which studies are flawed and which are to be believed.

Bain is clearly very smart. He has clearly read a lot. But I do not trust his ability to read scientific literature critically. The only evidence I have of his abilities is in the design of his own study, which is deeply flawed, as described above. If he can't design a study, why should I trust his analysis of other people's studies?

Building a better mousetrap

Criticizing a study is easy, but it's not much of a critique if you can't identify what a better study would look like. Clearly from my discussion above, I would want (a) clear criteria for defining good teaching, (b) clearly-defined measures of teacher behavior, and (c) a group of good teachers and a group of bad teachers for comparison, and probably a group of average teachers as well (otherwise, any differences between good and bad teachers could be driven by bad habits of the bad teachers rather than good habits of the good teachers).

After a set of behaviors that are typical of good teachers -- and which are less frequent or absent in average or bad teachers -- has been identified, one would then identify a new group of good, average, and bad teachers and replicate the results. (The risk otherwise is one of over-fitting the data: the differences you found between good teachers and the rest may just be the result of random chance. This happens quite a lot more than many people realize.)
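A back-of-the-envelope calculation shows why the replication step matters. The sketch below rests on assumptions of my own that appear nowhere in Bain's book: 20 candidate behaviors are compared, none of them truly differs between good teachers and the rest, the comparisons are independent, and each uses the conventional 0.05 significance threshold.

```python
def chance_of_at_least_one_fluke(n_behaviors, alpha=0.05):
    """Probability that at least one of n independent null comparisons looks significant."""
    return 1 - (1 - alpha) ** n_behaviors


def expected_flukes(n_behaviors, alpha=0.05, n_samples=1):
    """Expected number of null behaviors that look significant in every one of n_samples independent samples."""
    return n_behaviors * alpha ** n_samples


# Assumed setup: 20 behaviors, none of which truly matters.
print(round(chance_of_at_least_one_fluke(20), 2))  # 0.64
print(round(expected_flukes(20, n_samples=1), 2))  # 1.0
print(round(expected_flukes(20, n_samples=2), 2))  # 0.05
```

Under these assumptions, a single sample is more likely than not to turn up at least one "habit of good teachers" that is pure noise, while a finding that survives a second, independent sample is far less likely to be a fluke.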

At the end of this process, we should have a set of behaviors that really are particular to the best teachers, assuming that the criteria we used to define good teachers are valid (not an assumption to be taken lightly).

Becoming a good teacher

Whether or not this information would be of any use to those aspiring to be good teachers is unclear. To find that out, we'd actually need to do a controlled study, assigning one set of teachers to emulate this behavior and another set to emulate behavior typical of average teachers. Ideally, we'd find that the first group ended up teaching better. I'm unsure whether that's particularly likely to happen, for a number of reasons.

First, consider Bain's summary of the habits of the best teachers (summarizing, with some direct quotations, from pp. 15-20):

1. Outstanding teachers know their subjects extremely well.
2. Exceptional teachers treat their lectures, discussion sections, problem-based sessions, and other elements of teaching as serious intellectual endeavors as intellectually demanding and important as their research and scholarship.
3. They avoid objectives that are arbitrarily tied to the course and favor those that embody the kind of thinking and acting expected for life.
4. The best teachers try to create an environment in which people learn by confronting intriguing, beautiful, or important problems, authentic tasks that will challenge them to grapple with ideas, rethink their assumptions, and examine their mental models of reality.
5. Highly effective teachers tend to reflect a strong trust in students.
6. They have some systematic program to assess their own efforts and to make appropriate changes.

Much of this list looks like a combination of intelligence and discipline. That is clearly true for #1, and probably true for #2 and #3. To the extent that #4 is hard to do, it probably takes intelligence. And #6 is just a good idea, more likely to occur to smart people and only pulled off by disciplined people. I'm not sure what #5 really means.

If the key to being a good teacher is to be smart and disciplined, this news will be of little help to teachers who are neither (though it may be helpful to people who are trying to select good teachers). In other words, even if we determine what makes a good teacher, that doesn't mean we can make good teachers.

The best teachers

Of course, even if the strategies that good teachers use are ones you can use yourself, that doesn't mean you can use them correctly.

There is an old parable about two young women. One was exceptionally beautiful. She used to sit at her window and gaze out over the field, looking forlorn and sighing with melancholy. Villagers passing by would stop and stare, struck by her heavenly beauty. One such villager was another young woman, who was the opposite of beautiful. Nonetheless, on seeing this example, she went home, sat at her own window, gazed out over the field and sighed. Someone walked by, saw her, and promptly vomited.

Objectification of female beauty and strange fetishization of melancholy aside, the point of this parable is that just because something works for someone else doesn't mean it'll work for you. When I think about the very best teachers I've known, one thing that stands out is how idiosyncratic their methods and abilities have been. One is a high-energy lecturer who runs and jumps during his lectures (yes, math lectures), who is somehow able to turn linear algebra into a discussion class. Another, in contrast, faded into the background. He rarely lectured, preferring to have students work (in groups or individually) on carefully-crafted questions. A third is a gifted lecturer and the master of the anecdote. While others use funny anecdotes merely to keep a lecture lively, when he uses an anecdote, it is because it illustrates the point at hand better than anything else. Over at the law school, there are a number of revered professors famous for their tendency to humiliate students. This humiliation serves a purpose: to show the students how much they have to learn. The students, rather than being alienated, strive to win their professors' approval.

These methods work for each, but I can't imagine them swapping styles round-robin. Their teaching styles are outgrowths of their personalities. Many are high-risk strategies, which if they fail, fail disastrously (don't humiliate your students unless you have the right kind of charisma first).

Are there strategies that will work for everyone? Is there a way of determining which strategies will work for you, with your unique set of strengths and weaknesses? I'd love to find out. But it won't be by reading What the Best College Teachers Do.

The missing linking hypothesis

Science just published a paper on language evolution to much fanfare. The paper, by Quentin Atkinson, presents analysis suggesting that language was "invented" just one time in Africa. That language first appeared in Africa would be of little surprise, since that's where we evolved. That there was only one point at which it evolved is somewhat more controversial, and also trivially false if one includes sign languages, at least some of which have appeared de novo in modern times (and one could make a case for including spoken creoles in the list of de novo languages).

What still boggles my mind is the analysis that supports these conclusions. In many ways, it seems brilliant -- but I can't escape the feeling that there is something amiss with the argument. The problem, as we'll see, is a series of missing linking hypotheses.

The Data

The primary finding is that the further you go from Africa (very roughly following plausible migration paths), the fewer phonemes the local language has. Hawai'ian -- the language spoken farthest from our African point of origin -- has only 13 phonemes. Some languages in Africa have more than 100.

To support the claim that this demonstrates that language evolved in Africa, one must add some additional data and hypotheses. One datum is that languages spoken by more people have more phonemes. Atkinson argues that whenever a new population migrated away from the parent population, it would necessarily be a smaller group ... and thus their language would have fewer phonemes than the parent group. Keep this up and over time, you end up with just a few phonemes left.

Population genetics

This argument seems to derive a lot of its plausibility from well-known phenomena in population genetics. Whenever a new population branches off (migrates away), it will almost by definition have less genetic diversity than the mother population. And in fact Africa has greater genetic diversity than other continents.

Atkinson tries to apply the same reasoning to phonemes:

"Where individuals copy phoneme distinctions made by the most proficient speakers (with some loss), small population size will reduce phoneme diversity. De Boer models the evolution of vowel inventories using a different approach, in which individuals copy any members of their group with some error, and finds the same population size effect."

I see the logic, but then phonemes aren't genes. When ten people leave home to start a new village, they can only take ten sets of genes with them, and even some of that diversity may be lost because of vagaries of reproduction. Those alleles, once gone, are not easily reconstructed.

As far as I can tell, to apply the same logic to phonemes we have to assume a fair percentage of children fail to learn all the phonemic contrasts in their native language. For some reason, this does not prevent them from communicating successfully. In a large population, the fact that many people lack this or that phonemic contrast doesn't matter, as on average, most people know any given phonemic contrast, and thus it is transmitted across the generations. When a small group leaves home, however, it's quite possible that by accident there will be a phonemic contrast that few (or none) of them use. The next generation is then unlikely to use that contrast.
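The arithmetic behind that intuition can be sketched with a toy binomial model. Everything in it is my assumption rather than anything in Atkinson's paper: each speaker independently uses a given contrast with probability 0.6, and the contrast is "at risk" in a group when fewer than half of its members use it.

```python
from math import ceil, comb


def prob_minority_use(n_founders, p_use, threshold=0.5):
    """P(fewer than `threshold` of the founders use the contrast) under a binomial model."""
    cutoff = ceil(n_founders * threshold)  # smallest count that reaches the threshold share
    return sum(
        comb(n_founders, k) * p_use**k * (1 - p_use) ** (n_founders - k)
        for k in range(cutoff)
    )


# Assumed numbers: 60% of speakers use the contrast.
small_group = prob_minority_use(10, 0.6)
large_group = prob_minority_use(500, 0.6)
print(round(small_group, 3))  # 0.166
print(large_group < 1e-4)     # True
```

In this toy model a band of ten emigrants has roughly a one-in-six chance of carrying the contrast only as a minority habit, while for a community of 500 the chance is negligible. Whether real speakers behave anything like these independent coin flips is, of course, exactly the open empirical question.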

This may be true, but I don't find its plausibility so overwhelming that I'm willing to accept it at face value. I'd actually like to see data showing that many or most speakers of a given language do not use all the phonemic contrasts (beyond the fact that of course some dialects are missing certain phonemes, as in the fact that Californians do not distinguish between cot and caught; dialectal variation probably cannot support Atkinson's argument, but I leave the proof to the reader ... or to the comment section).

Phonemes and Population Size

Atkinson reports being inspired by the relatively recent finding that languages spoken by more people have more phonemes. Interestingly, the authors of that paper note that "we do not have well-developed theoretical arguments to offer about why this should be." It seems to me that Atkinson's analyses depend crucially on the answer to this puzzle, though as I mentioned at the outset, I haven't been able to quite work out all the details yet.

Atkinson's analysis crucially depends on (among other things) the following supposition: the current population size of any language community is roughly predicted by the number of branching points (migrations) since the original language (which arose somewhere between 50,000 and 100,000 years ago). I'm still on the fence as to whether this is a preposterous claim or a very reasonable one.

It is certainly very easy to construct scenarios on which this supposition would be false. Civilizations expand and contract rapidly (consider that English was confined to only one part of Great Britain half a millennium ago, or that Celtic languages were spoken across Europe only 2,000 years ago). Relative population size today seems to be driven more by poverty, access to birth control and education, etc., than anything else. Atkinson only needs there to be a mid-sized correlation, but 50,000 years is a very, very long time.

Atkinson also needs it to be the case that the further from Africa a language is spoken, the more branching points there have been. The problem we have is that there is a lot of migration within already-settled areas (Indo-European expansion, Mandarin expansion, Bantu expansion, etc.). So we need it to be the case that most of the branching of language groups happened going into new, unsettled areas, and relatively little of it is a result of invading already-populated areas. That may be true, but consider that all of Africa, Europe, Asia and the Americas were settled by 10,000 years ago, which leaves a lot of time for language communities to move around.


Atkinson put together a very interesting dataset that needs to be explained. His explanation may well be the right one. However, his explanation requires making a number of conjectures for which he offers little support. They may all be true, but this is a dangerous way to make theories. It's a little like playing Six Degrees of Kevin Bacon where you are allowed to conjecture the existence of movies and co-stars. It should be obvious that with those rules, you can connect Kevin Bacon to anyone, including yourself.


The super-lame New Yorker review of the recent Broadway revival of Stoppard's "Arcadia" moved me to do a rare thing: write a letter to the editor. They didn't publish it, despite the fact that -- and I think I'm being objective here -- my letter was considerably better than the review. Reviews are no longer free on the New Yorker website (you can see a synopsis here), but I think my letter covers the main points. Here it is below:

Hilton Als ("Brainstorm", Mar 28) writes about the recent revival of "Arcadia" that Stoppard's "aim is not to show us people but to talk about ideas." Elsewhere, Als calls the show unmoving and writes that Stoppard does better with tragicomedies.
"Arcadia" is not a show about ideas. It is about the relationship people have with ideas, particularly their discovery. Anyone who has spent any amount of time around academics would instantly recognize the characters as people, lovingly and realistically depicted. (Als singles out Billy Crudup's "amped-up characterization of the British historian Bernard Nightingale" as particularly mysterious. As Ben Brantley wrote in the New York Times review, "If you've spent any time on a college campus of late, you've met this [man].")
As an academic, I found the production a mirror on my own life and the people around me. Not everyone will have that experience. The beauty of theater (and literature) is that it gives us a peek into the inner lives of folk very different from ourselves. It is a shame Als was unable to take advantage of this opportunity.
Where the play focuses most closely on ideas is the theme of an idea (Thomasina's) stillborn before its time. If one feels no pathos for an idea that came too soon, translate "idea" into "art" and "scientist" into "artist" and consider the tragedies of artists unappreciated in their time and quickly forgotten. Even a theater critic can find the tragedy in that.

Graduate School Rankings

There have been a number of interesting posts in the last few days about getting tenure (1, 2, 3). One thing that popped out at me was the use of the National Research Council graduate school rankings in this post. I am surprised that these continue to be cited, due to the deep flaws in the numbers. Notice I said "numbers", not "methodology". I actually kind of like their methodology. Unfortunately, the raw numbers that they use to determine rankings are so error-ridden as to make the rankings useless.

For those who didn't see my original posts on the subject, cataloging the errors, see here and here.

Annoyed about taxes

It's not that I mind paying taxes per se. In fact, I consider it everyone's patriotic duty to pay taxes. I just wish it wasn't so damn complicated.

The primary confusion I have to deal with every year is that Harvard provides a series of mini-grants for graduate students, which they issue as scholarships. Scholarships are taxable as income. Scholarships which are used to pay for tuition or required course supplies are not taxable, however. I'm a graduate student, which means that the four courses I take every semester are "independent research," and obviously doing research is required. On the other hand, the IRS regulations specifically state that any scholarships used to pay for research are taxable. So if I use the mini-grant to pay for my research, is it taxable or not?

I actually asked an IRS representative a few years ago, and she replied that something counts as "required for coursework" only if everyone else taking that course has to buy it. If "everyone else" includes everyone else in the department doing independent research, then it's trivially the case that they are not required to do my research (though that would be really nice!), nor are they actually required to spend anything at all (some people's research costs more than others). If "everyone else" is only me -- this is independent research after all -- then the mini-grant is not taxable. This of course all hinges on whether or not "independent research" is actually a class. My understanding is that the federal government periodically brings action against Harvard, claiming that independent research is not a class.

Some people occasionally deduct the mini-grant expenditures as business expenses. This is not correct. According to the IRS, graduate students are not employees and have no business, and thus we have no business expenses (this reasoning also helps prevent graduate student unions -- you can't form a union if you aren't employed). And in any case, as I mentioned, we are specifically forbidden to write off the cost of doing research.

It's not just that the rules are confusing; they don't make sense. Why does the government want to tax students for the right to do research? How is that a good idea? Research benefits the public at large, and comes at a high opportunity cost for the researcher already (one could make more doing just about anything else). Why make us pay for it?

(It probably should be pointed out that Harvard could cough up the taxes itself, or they could administer the mini-grants as grants rather than as scholarships, though that would cost them more in terms of administrative overhead. Instead, Harvard specifically forbids using any portion of the mini-grant to pay for the incurred taxes. Though since they don't ask for any accounting, it's quite possible nobody pays any attention to that rule.)

Feds to College Students: "We don't want your professors to know how to teach"

The National Science Foundation just changed the rules for their 3-year graduate fellowships: no teaching is allowed. Ostensibly, this is to ensure that fellows are spending their time doing research. This is different from the National Defense Science & Engineering graduate fellowship Vow of Poverty: you can teach as much as you want, so long as you don't earn money from it.*

Consider that, ideally, PhD programs take 5 years, and the final year is spent on (a) writing the dissertation, and (b) applying for jobs. This means that NSF graduate fellows may have as little as one year in which to get some teaching experience.

Presumably, NSF was thinking one of three things:

1) They're trying to make it harder for their fellows to get jobs at universities that care about teaching.
2) They honestly don't believe teaching experience is important.
3) They weren't thinking at all.

I'm curious what will happen at universities that require all students to teach, regardless of whether they have outside fellowships or not. Will they change that rule, or will they forbid students to have NSF fellowships? Given the current financial situation, I'm guessing they'll go with the former, but it's hard to say.

*The exact NDSEG rule is that your total income for any year should be no more than $5,000 in addition to the fellowship itself. Depending on the university, this can be less than what one would get paid for teaching a single class.