Field of Science

Broken but not yet Dead

I became fairly ill on my last trip to Russia in August. The disease itself was fairly nasty if generally treatable, though it came with a not insignificant chance of developing fatal complications. Meanwhile, it took me a day to convince any of my friends that I was sick enough that I needed to see a doctor (they all wanted me to take various berries or herbs instead). Having gotten one friend on board, it took him a day to find a hospital that was open (one was closed because of a power outage, and several were open but all the doctors were on vacation). I eventually got to a doctor who gave me the necessary meds. Within a few days my fever was low enough I could get around reasonably well, and though I still felt like shit for a few weeks after that, I was able to fly home on schedule.

I was reminded of this story by Dr. Isis's harrowing account of her recent, nasty bout of mosquito-borne infection. Her story is much more compelling than mine (one reason I never gave mine a full post) and worth reading in its own right. What I picked up on in particular was the following:
Health care in the United States might be broken, but at least we have health care.  I spent the last two weeks teaching medical school in a country where much of the population doesn't have access to running water and access to fresh food is limited.  41% of children under four are iron deficient.  There are 60 times more low birth weight infants per capita than in the United States.  There is a hospital in the capitol city, but no CT, MRI, or dialysis. It has two intensive care beds. Nine ambulances service the entire country.  Medical record keeping is problematic and there is a shortage of technicians, doctors, and nurses.
That's absolutely true. It's also a reminder, though, that broken things -- if left unrepaired too long -- eventually decay away. Right now it is nice that our (American) health care system is still better than that of the developing world ... but it's worrisome that it's not as good as that of the rest of the developed world. If we wait long enough without fixing it, we may wake up one day and find that we are no longer in the developed world.

If this seems far-fetched, consider that among developed nations, we're in the middle or back of the pack in health care, primary education, income equality, and especially Internet infrastructure. In most of these areas (perhaps not primary education) we've been steadily losing ground for decades (we're also losing ground in fields where we're still technically ahead, like science). If that continues, we will eventually be left behind.

Apply to Graduate School?

Each year around this time, I try to post information that would be of use to prospective graduate students, just in case any are reading this blog (BTW, are there any undergraduates reading this blog? Post in the comments!).

This year, I've been swamped. I've been focusing on getting a few papers published, and most of my time for blogging has gone to the Scientific-American-Mind-article-that-will-not-die, which, should I ever finish it, will probably come out early next year.

Luckily, Female Science Professor has written a comprehensive essay in The Chronicle of Higher Education about one of the most confusing parts of the application process: the pre-application email to a potential advisor. Everyone tells applicants to send such emails, but nobody gives much information about what should be in them. Find the essay here.

I would add one comment to what she wrote. She points out that you should check the website to see what kind of research the professor does rather than just asking, "Can you tell me more about your research?", which comes across as lazy. She also suggests that you say in your email whether you are interested in a terminal master's. Read the website before you do that, though, since not all programs offer terminal master's degrees (none of the programs I applied to did). Do your homework. Professors are much, much busier than you are; if you demonstrate that you are too lazy to look things up on the Web, why should they spend time answering your email?

---
For past posts on graduate school and applying to graduate school, click here.

Understanding and Curing Myopic Voting

The abstract from a recent talk by Gabriel Lenz of MIT:
Retrospective voting is central to theorizing about democracy. Given voters’ ignorance about politics and public policy, some argue that it is democracy's best defense. This defense, however, assumes citizens are competent evaluators of incumbent politicians' performance. Although little research has investigated this assumption, voters' retrospective assessments in a key domain, the economy, appear flawed. They overweight election-year income growth in presidential elections, ignoring cumulative growth under the incumbent. In this paper, I present evidence that this myopia arises from a more general “end bias” in retrospective assessments. Using a three-year panel survey, I show that citizens' memories of the past economy are inconsistent with their actual experience of the economy as they reported it in earlier interviews. They fail to remember the past correctly in part because the present shapes their perceptions of the past. I then show similar behavior in the lab. When participants evaluate economic and crime data, I again find that election-year performance shapes perceptions of overall performance, even under conditions where the election year should not be more informative. Finally, I search for and appear to find a cure. Presenting participants with cumulative information on performance (e.g., total income growth or total rise in murders during incumbents’ terms) cures this myopia. On one hand, these results are troubling for democracy because they confirm citizens’ incompetence at retrospection. On the other hand, they point to a remedy, one that candidates and the news media could adopt.
That's a remedy as long as the candidates and news media don't simply lie about the facts. Good luck with that one.

Bad News for Science Funding

NIH expects to have to cut the percentage of grant applications that are funded from 20% to a historic low of 10%. Note for the moment that 20% was not very high to begin with, and 10% is brutal. The expected outcome is that some labs will close, and those that don't will have to do less research, if for no other reason than that they will spend more time writing grants and less time doing real work (guess who pays researchers' salaries while they write grants: NIH. So this also means that less of the money in the remaining grants will go to actual research).

The reason for the expected cutback is the Republican vow to cut discretionary civilian spending to 2008 levels. I understand living within one's means. I have a fairly frugal household (in graduate school, my wife attended a university-sponsored seminar on how to manage on a graduate student budget, only to discover that the recommended "austerity" budget was considerably more lavish than ours; we promptly started eating out more). But focusing on discretionary spending seems like someone $100,000 in debt clipping coupons: it's maybe good PR but as a solution to the problem, it's hopeless. This graph says it all:



Go ahead and cut all discretionary spending: you get a 16% reduction in the budget (which is in the neighborhood of our current deficit) at considerable cost. So maybe the coupon example isn't the right one. This is someone who, with a $100,000 debt, lets his teeth rot in his mouth because he's saving money on toothpaste.

New Language Experiment for Bilinguals

I'm not sure I've ever blogged about a conference past the first day. I'm usually too tired by the second day. BUCLD is particularly grueling, running over 12 hours on the first day and near 12 hours on the second. Plus the parties.

I do want to point folks to one thing: Thomas Roeper, Barbara Zurer Pearson and Margaret Grace, all of the University of Massachusetts, are running an interesting study on quantifiers (words like all, some, each, and most). Notably, while language researchers very often exclude non-native speakers and bilinguals, these researchers are specifically interested in comparing results from native and non-native speakers of English. Right now, they're looking for people who learned some language other than English prior to learning English.

The study is here. They are particularly interested right now in getting data from non-native English speakers. There is a raffle that participants can win (details are on the site).

Boston University Conference on Language Development: Day 1

BUCLD is one of my favorite conferences, not least because it takes place every year just across the river. This year has been shaping up to be a particularly good one, if the first day is any indication.

Ben Ambridge (w/Julien Pine & Caroline Rowland) gave an excellent talk on learning semantic restrictions on verb alternations. Of all the work Steve Pinker has done, I think his verb alternation work is the least well-known, but it's also probably my favorite, and it's nice to see someone systematically revisiting these issues; I think Ambridge is making some important contributions.

Kenny Smith (w/Elizabeth Wonnacott) presented a really neat proof-of-concept involving language evolution, showing that you can get robust regularization of linguistic systems in a community of speakers even if none of the individual learners/speakers have strong biases to regularize the input. This was a really fun talk; one of those talks that makes one reconsider one's life choices ("should I be studying language evolution?").

Dea Hunsicker (w/Susan Goldin-Meadow) presented new analyses of an old home-sign corpus, looking at evidence that this particular home sign had noun phrases. Home-sign, for those who don't know it, is an ad-hoc mini sign language often developed by deaf children who don't have exposure to a developed sign language.

If I had to pick a best talk, I'd pick Erin Conwell's (w/Tim O'Donnell & Jesse Snedeker) on the dative alternation, in which she sketched an explanation of why, although double-object constructions are overall more frequent than prepositional-object constructions, the latter seem to be more productive in early child language. But I may be biased here, in that Erin is a post-doc in the same lab as me.

There were a number of other good talks today that I saw -- and many that I didn't -- which deserve mention. I'd write more, but it's late, and there's another full day coming up tomorrow.

Seriously ambiguous pronouns

The intro to Terminator: The Sarah Connor Chronicles goes something like this:
In the future, my son will lead humanity in the war against Skynet, a computer system programmed to destroy the world. It has sent machines back through time, some to kill him, one to protect him.
The only reading I get is that "it" refers to Skynet, and thus Skynet has sent machines back both to kill John Connor and to protect him. I'm only a few episodes into Season 1 on Netflix Instant, so perhaps I'm about to find out that Skynet is playing some weird kind of Robert Jordan game, but I suspect the writers wanted "it" to refer to "the war." I can get that reading if I squint, but it seems incredibly unnatural.

Vote!

The best thing I can say about the last two years is that Democrats have made real investments in science. After eight years of stagnant or falling funding, it was like a breath of fresh air.

Luckily, Republicans are back to suck the air (and life) out of us again. After the complete clusterfuck that was the Bush administration, I don't know why anyone would be willing to call themselves a Republican, much less vote for one. But if I knew everything about human nature, I wouldn't have to run experiments. 

I wish Obama and the Dems had been doing more to fix up the wreckage left behind by Bush, but at least they don't seem hell-bent on destroying the economy. I hope you all enjoyed the respite.

In the meantime, vote. Just in case.

----
For previous posts and more details on Republican and Democratic science policies, read this, this, this and this, among others.

Does Global Warming Exist, and Other Questions We Want Answered

This week, I asked 101 people on Amazon Mechanical Turk both whether global temperatures have been increasing due to human activity AND what percentage of other people on Amazon Mechanical Turk would say yes to the first question. 78% answered yes to the first question. Here are the answers to the second, broken down by whether the respondent did or did not believe in man-made global warming:

Question: How many other people on Amazon Mechanical Turk believe global temperatures have been increasing due to human activity?

                   Average     1st-3rd Quartile
Believers            72%          60%-84%
Denialists           58%          50%-74%
Correct answer       78%            ---

Notice that those who believe global warming is caused by human activity are much better at estimating how many other people will agree than are those who do not. Interestingly, the denialists' answer is much closer to the rate of belief among all Americans than to the rate among Turkers (who are mostly but not exclusively American, and are certainly a non-random sample).

So what?

Why should we care? More importantly, why did I do this experiment? A major problem in science/life/everything is that people disagree about the answers to questions, and we have to decide who to believe. A common-sense strategy is to go with whatever the majority of experts says. There are two problems, though: first, it's not always easy to identify an expert, and second, the majority of experts can be wrong.

For instance, you might ask a group of Americans what the capital of Illinois or New York is. Although in theory, Americans should be experts in such matters (it's usually part of the high school curriculum), in fact the majority answer in both cases is likely to be incorrect (Chicago and New York City, rather than Springfield and Albany). This was even true in a recent study of, for instance, MIT or Princeton undergraduates, who in theory are smart and well-educated.

Which of these guys should you believe?

So how should we decide which experts to listen to, if we can't just go with "majority rules"? A long chain of research suggests an option: ask each of the experts to predict what the other experts would say. It turns out that the people who are best at estimating what other people's answers will be are also the most likely to be correct. (I'd love to cite papers, but this introduction comes from a talk I attended earlier in the week, and I don't have the citations in my notes.) In essence, this is an old trick: ask people two questions, one of which you know the answer to and one of which you don't. Then trust the answers on the second question that come from the people who got the first question right.

This method has been tested on a number of questions and works well. It was actually tested on the state-capital problem described above, and it does much better than a simple "majority rules" approach. The speaker at the talk I went to argued that this is because people who are better able to estimate the average answer simply know more and are thus more reliable. Another way of looking at it, though (which the speaker mentioned), is that someone who thinks Chicago is the capital of Illinois likely isn't considering any other possibilities, so when asked what other people will say, guesses "Chicago." The person who knows that Springfield is in fact the capital probably nonetheless knows that many people will be fooled by Chicago being the best-known city in Illinois, and thus will correctly guess that lots of people will say Chicago but that some will also say Springfield.
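To make the two-question trick concrete, here is a toy sketch of the idea: weight each respondent's answer to the question of interest by how well they did on a question whose answer is known -- both their own answer and their prediction of the crowd. This is my own illustration of the general idea, not the published algorithm; all function and variable names are mine.

```python
# Toy sketch of the two-question trick. A respondent who got the known
# (calibration) question right AND accurately predicted the crowd's
# answers gets more weight on the question we actually care about.

def weighted_answer(known_truth, known_answers, known_predictions,
                    target_answers):
    """known_answers[i]: respondent i's True/False answer to the
    calibration question; known_predictions[i]: their guess at the
    fraction of respondents answering True; target_answers[i]: their
    True/False answer to the question of interest. Returns the weighted
    fraction answering True on the question of interest."""
    actual_rate = sum(known_answers) / len(known_answers)
    weights = []
    for answer, prediction in zip(known_answers, known_predictions):
        # Getting the calibration question right earns full credit;
        # predicting the crowd accurately earns additional weight.
        accuracy = 1.0 if answer == known_truth else 0.5
        calibration = 1.0 - abs(prediction - actual_rate)
        weights.append(accuracy * calibration)
    total = sum(weights)
    return sum(w * a for w, a in zip(weights, target_answers)) / total
```

With three respondents, two of whom got the calibration question right and predicted the crowd well, the weighted estimate is pulled toward those two respondents' answers on the target question.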

Harder Questions

I wondered, then, how well this would work for a question where everybody knows there are two possible answers. So I surveyed Turkers about global warming. Believers were much better at estimating how many believers there are on Turk than denialists were.

Obviously, there are a few ways of interpreting this. Perhaps denialists underestimate both the proportion of climate scientists who believe in global warming (~100%) and the percentage of ordinary people who do, and so they think the evidence is weaker than it is. Alternatively, denialists don't believe in global warming themselves, have trouble accepting that other people do, and lower their estimates accordingly. The latter proposal, though, would suggest that believers should over-estimate the percentage of people who believe in global warming, which is not in fact the case.

Will this method work in general? In some cases, it won't. If you asked expert physicists in 1530 about quantum mechanics, presumably none of them would believe it and all would correctly predict that none of the others would believe it. In other cases, it's irrelevant (nearly 100% of climatologists believe in man-made global warming, and I expect they all know that they all believe in it). More importantly, the method may work well for some types of questions and not others. I heard in this talk that researchers have started using the method to predict product sales and outcomes of sports matches, and it actually does quite well. I haven't seen any of the data yet, though.


------
For more posts on science and politics, click here and here.

Did your genes make you liberal?

"The new issue of the Journal of Politics, published by Cambridge University, carries the study that says political ideology may be caused by genetic predisposition."
  --- RightPundits.com

"Scientists find 'liberal gene.'"
  --- NBC San Diego

"Liberals may owe their political outlook partly to their genetic make-up, according to new research from the University of California, San Diego, and Harvard University.  Ideology is affected not just by social factors, but also by a dopamine receptor gene called DRD4."
 -- University press release

As with yesterday's study about sisters making you happy, these statements are all technically true (ish -- read below) but deeply misleading. The study in question looks at the effects of number of friends and the DRD4 gene on political ideology. Specifically, the researchers asked people to self-rate on a 5-point scale from very conservative to very liberal. They tested for the DRD4 gene. They also asked people to list up to 5 close friends.

The number of friends one listed did not significantly predict political ideology, nor did the presence or absence of the DRD4 gene. However, there was a significant (p=.02) interaction ... significant, but apparently tiny. The authors do not discuss effect size, but we can try to piece together the information by looking at the regression coefficients.

An estimated coefficient means that if you increase the value of the predictor by 1, the outcome variable increases by the size of the coefficient. So imagine the coefficient between the presence of the gene and political orientation was 2. That would mean that, on average, people with the gene score 2 points higher (more liberal) on the 5-point political orientation scale.

The authors seem to be reporting standardized coefficients, which means that we're looking at increases of one standard deviation rather than one point. The coefficient of the significant interaction is 0.04. Roughly, this means that as the number of friends and the presence of the gene increase by one standard deviation, political orientation scores increase by 0.04 standard deviations. The information we'd need to interpret that precisely isn't given in the paper, but a reasonable estimate is that someone with one extra friend and the gene would score anywhere from .01 to .2 points higher on the scale (remember, 1=very conservative, 2=conservative, 3=moderate, 4=liberal, 5=very liberal).
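To make the arithmetic concrete, here is a minimal sketch of how one converts a standardized coefficient back to raw units. The 0.04 figure is from the paper as described above; the standard deviations are placeholders I made up, since the paper doesn't report the numbers needed for the conversion.

```python
# Converting a standardized regression coefficient back to raw units:
# multiply by the outcome's SD and divide by the predictor's SD.
beta_std = 0.04      # standardized interaction coefficient (from the paper)
sd_outcome = 1.0     # assumed SD of the 5-point ideology scale (placeholder)
sd_predictor = 1.5   # assumed SD of the friends-by-gene term (placeholder)

# A one-unit (raw) change in the predictor shifts the outcome by:
beta_raw = beta_std * sd_outcome / sd_predictor
print(round(beta_raw, 3))  # 0.027 points on the 1-5 scale
```

Under these (made-up) spreads, the effect is a few hundredths of a point on a five-point scale, which is why "tiny" is the right word even though the interaction is statistically significant.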

The authors give a little more information:
For people who have two copies of the [gene], an increase in number of friendships from 0 to 10 friends is associated with increasing ideology in the liberal direction by about 40% of a category on our five-category scale.
People with no copies of the gene were unaffected by the number of friends they had.

None of what I wrote above detracts from the theoretical importance of the paper. Identifying genes that influence behavior, even just a tiny bit, is important as it opens windows into the underlying mechanisms. And to their credit, the authors are very guarded and cautious in their discussion of the results. The media reports -- fed, no doubt, by the university press release -- have focused on the role of the gene in predicting behavior. It should be clear that the gene is next to useless in predicting, for instance, who somebody is going to vote for. Does that make it a gene for liberalism? Maybe.

There is one other worry about the study, which even the authors acknowledge. They tested a number of different possible predictors. The chance of getting a false positive increases with every statistical test you run, and they do not appear to have corrected for multiple comparisons. Even with 2,000 participants (which is a large sample), the p-value for the significant interaction was only p=.02, which is significant but not very strong, so the risk that this will not replicate is real. As the authors say, "the way forward is to seek replication in different populations and age groups."
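To see why uncorrected multiple tests matter, here is a toy Bonferroni check. The number of tests below is hypothetical, since the paper doesn't say exactly how many predictors were tried; the point is only that p=.02 stops clearing the bar once you divide the threshold among several tests.

```python
# A toy Bonferroni correction: with n tests, divide the significance
# threshold by n. The number of tests here is hypothetical.
alpha = 0.05
n_tests = 5                      # hypothetical number of predictors tested
bonferroni_threshold = alpha / n_tests

p_observed = 0.02                # the interaction reported in the paper
print(round(bonferroni_threshold, 4))      # 0.01
print(p_observed <= bonferroni_threshold)  # False: no longer significant
```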

Question: What are sisters good for?

Answer: increasing your score on a 13-question test of happiness by 1 unit on one of the 13 questions.

A recent study of the effect of sisters on happiness has been getting a lot of press since it was featured at the New York Times. It's just started hitting my corner of the blogosphere, with Mark Liberman filing an entry at Language Log early in the evening. On the whole, he was unimpressed. The paper didn't report data in sufficient detail to really get a sense of what was going on, so he tried to extrapolate based on what was in fact reported. His best estimate was that having a sister accounted for 0.4% of the variance in people's happiness:
This is a long way from the statement that "Adolescents with sisters feel less lonely, unloved, guilty, self-conscious and fearful", which is how ABC News characterized the study's findings, or "Statistical analyses showed that having a sister protected adolescents from feeling lonely, unloved, guilty, self-conscious and fearful", which is what the BYU press release said ... Such statements are true if you take "A's are X-er than B's" to mean simply that a statistical analysis showed that the mean value of a sample of A's was higher than the mean value of a sample of B's, by an amount that was unlikely to be the result of sampling error.
Only an hour later, the ever wide-eyed Jonah Lehrer wrote:
There's a surprisingly robust literature on the emotional benefits of having sisters. It turns out that having at least one female sibling makes us happier and less prone to depression...
I think this demonstrates nicely the added value of blogging, particularly science blogging. Journalists (like Lehrer) are rarely in a position to pick apart the methods of a study, whereas scientist bloggers can. I know many people miss the old media world, but the new one is exciting.

------
For more thoughts on science blogging, check this and this.

A Frog at the Bottom of a Well

My college had a graduate admissions counselor, with whom I consulted about applying to graduate school. Unfortunately, different fields (math, chemistry, literature, psychology) use completely different methods of selecting graduate students (and, in some sense graduate school itself is a very different beast depending on the field). My counselor didn't know anything about psychology, so much of the information I was given was dead wrong.

My graduate school also provides a lot of support for applying for jobs. This week, there is a panel on "The View from the Search Committee," which includes as panelists professors from Sociology, Romance Languages & Literatures, and Organismic and Evolutionary Biology. That is, none of them are from Psychology. I do know that different fields recruit junior faculty in very different ways (for instance, linguistics practices a form of speed-dating at conferences as a first round of interviews, while psychology has no such system).

So...do I go? Keep in mind that I get lots of advice from faculty in my own department (and also from friends at other psych departments who have recently gone through the process). That is, how likely is it that the experience of these three professors will map on to the process I will actually go through? How likely is it that a one-hour panel can cover all the different variants of the process? How likely is it that there is information that would be relevant to anyone applying to any department that isn't obvious or something I am likely to already know?

Thoughts?

--------
The title of this post comes from an old proverb about a frog sitting at the bottom of a well, thinking that the patch of blue above is the whole world. Often (always?) we don't realize just how limited our own range of experience is.
photo: e_monk

Words and non-words

"...the modern non-word 'blogger'..." -- Dr. Royce Murray, editor of the journal Analytic Chemistry.

"209,000,000 results (0.21 seconds)" -- Google search for the "non-word" blogger.


------------
There has been a lot of discussion about Royce Murray's bizarre attack on blogging in the latest JAC editorial (the key sentence: "I believe that the current phenomenon of 'bloggers' should be of serious concern to scientists").

Dr. Isis has posted a nice take-down of the piece focusing on the age old testy relationship between scientists and journalists. My bigger concern with the editorial is that it is clear that Murray has no idea what a blog is, yet feels justified in writing an article about blogging. Here's a telling sentence:
Bloggers are entrepreneurs who sell “news” (more properly, opinion) to mass media: internet, radio, TV, and to some extent print news. In former days, these individuals would be referred to as “freelance writers”, which they still are; the creation of the modern non-word “blogger” does not change the purveyor.
Wrong! Wrong! Wrong! A freelance writer does sell articles to established media entities. Bloggers mostly write for their own blog (hence the "non-word" blog-ger). There are of course those who are hired to blog for major media outlets like Scientific American or Wired, but then they are essentially columnists (in fact, many of the columnists at The New York Times have NYTimes blogs at the request of the newspaper).
This magnifies, for the lay reader, the dual problems in assessing credibility: a) not having a single stable employer (like a newspaper, which can insist on credentials and/or education background) frees the blogger from the requirement of consistent information reliability ... Who are the fact-checkers now?
Wait, newspapers don't insist on credentials and don't fact-check the stories they get from freelancers? Why is Murray complaining about bloggers, then? In any case, it's not like journals like Analytic Chemistry do a good job of fact-checking what they publish or that they stop publishing papers by people whose results never replicate. Journal editors living in glass houses...

This focus on credentials is a bit odd -- I thought truth was the only credential a scientist needed -- and in any case seriously misplaced. I challenge Murray to find a popular science blog written by someone who is neither a fully-credentialed scientist writing about his/her area of expertise, nor a well-established science journalist working for a major media outlet.

Are there crack-pot bloggers out there? Sure. But most don't have much of an audience (certainly, their audience is smaller than the fact-checked, establishment media-approved Glenn Beck). Instead, we have a network of scientists and science enthusiasts discussing, analyzing and presenting science. What's to hate about that?

You're Wrong

John Ioannidis has been getting a lot of press lately. He reached the cover of the last issue of The Atlantic Monthly. David Dobbs wrote about him here (and a few years ago, here). This is the doctor known for his claim that around half of medical studies are false -- about 80% of non-randomized trials and even 25% of randomized trials. These are not just dinky findings published in dinky journals: of 49 of the most highly-regarded papers published over a 13-year period, 45 claimed to have found effective treatments; 34 of those had been retested, and 41% of the retests failed to replicate the original result.

Surprised?

Quoting the Atlantic Monthly:
Ioannidis initially thought the community might come out fighting. Instead, it seemed relieved, as if it had been guiltily waiting for someone to blow the whistle...
Well, it's not surprising. The appropriate analog in psychology is the randomized trial, of which in medicine 25% turn out to be false, according to this research (which hopefully isn't itself false). As Ioannidis has detailed, the system is set up to reward false positives. Journals -- particularly glamour mags like Science -- preferentially accept surprising results, and the best way to have a surprising result is to have one that is wrong. Incorrect results happen: "statistically significant" means only that a result would occur by chance no more than 5% of the time if there were no real effect. This means (in theory) that up to 5% of all experiments published in journals should reach the wrong conclusions. If journals are biased in favor of accepting exactly those experiments, then the proportion should be higher.

There are other factors at work. Some scientists are sloppier than others, and many of the ways in which one can be sloppy lead to significant and/or surprising results. For instance, 5% of experiments produce false positives. There are labs that will run the same experiment 6 times with minor tweaks. There is a (1 - .95^6) * 100 = 26.5% chance that at least one of those runs will produce a significant result. The lab may then publish only that final experiment and not report the others. If sloppy results lead to high-impact publications, survival of the fittest dictates that sloppy labs will reap the accolades, get the grant money, tenure, etc.

Keep in mind that often many different labs are trying to do the same thing. For instance, in developmental psychology, one of the deep questions is: what is innate? So many labs are testing younger and younger infants, trying to find evidence that these younger infants can do X, Y, or Z. If 10 labs all run the same experiment, there's a (1 - .95^10) * 100 = 40.1% chance of at least one lab finding a significant result.
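The arithmetic above is easy to verify, both analytically and by simulation. The sketch below is my own sanity check of the numbers, not taken from any of the cited work: each experiment tests a true null (no real effect), so each has a 5% chance of a false positive, and with k tries the chance that at least one comes up "significant" is 1 - 0.95^k.

```python
import random

# Analytic probability that at least one of k null experiments
# crosses the alpha = .05 significance threshold by chance.
def chance_of_false_positive(k, alpha=0.05):
    return 1 - (1 - alpha) ** k

# Monte Carlo check: simulate many "labs," each running k null
# experiments, and count how often at least one is "significant."
def simulate(k, trials=100_000, alpha=0.05, seed=0):
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(trials)
    )
    return hits / trials

print(round(chance_of_false_positive(6) * 100, 1))   # 26.5
print(round(chance_of_false_positive(10) * 100, 1))  # 40.1
```

The simulation converges on the same percentages as the closed-form expression, which is the point: nothing about this inflation requires misconduct, just repetition.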

Countervailing Forces

Thus, there are many incentives to publish incorrect data. Meanwhile, there are very few disincentives to doing so. If you publish something that turns out not to replicate, it is very unlikely that anyone will publish a failure to replicate -- simply because it is very difficult to publish a failure to replicate. If someone does manage to publish such a paper, it will certainly be in a lower-profile journal (which is, incidentally, a disincentive to publishing such work to begin with).


Similarly, consider what happens when you run a study and get a surprising result. You could replicate it yourself to make sure you trust the result. That takes time, and there's a decent chance it won't replicate. If you do replicate it, you can't publish the replication (I tried to in a recent paper submission, and a reviewer insisted that I remove reference to the replication on account of it being "unnecessary"). If the replication works, you'll gain nothing. If it fails, you won't get to publish the paper. Either way, you'll have spent valuable time you could have spent working on a different study leading to a different paper. 



In short, there are good reasons to expect that 25% of studies -- particularly in the high-profile journals -- are un-replicable.


What to do?

The solutions typically proposed involve changing attitudes. The Atlantic Monthly suggests:
We could solve much of the wrongness problem, Ioannidis says, if the world simply stopped expecting scientists to be right. That's because being wrong in science is fine, and even necessary ... But as long as careers remain contingent on producing a stream of research that's dressed up to seem more right than it is, scientists will keep delivering exactly that.
I've heard this idea expressed elsewhere. In the aftermath of Hausergate, a number of people suggested that a factor was the pressure-cooker that is the Harvard tenure process, and that Harvard needs to stop putting so much pressure on people to publish exciting results.

So the idea is that we should stop rewarding scientists for having interesting results, and instead reward the ones who have uninteresting results? Journals should publish only the most staid research, and universities should award tenure based not on the number of highly-cited papers you have written, but on how many papers you've written that have never been cited? I like that idea. I can run a boring study in a few hours and write it up in the afternoon: "Language Abilities in Cambridge Toddlers are Unaffected by Presence or Absence of Snow in Patagonia." That's boring and almost certainly true. And no one will ever cite it.

Seriously, though, public awareness campaigns telling people to be more responsible are great, and sometimes they even help, but I don't know how much can be done without changing the incentive structure itself.


Reputation

I don't have a solution, but I think Ioannidis again points us towards one. He found that papers continue to be cited long after they have been convincingly and publicly refuted. I was discussing this issue with a colleague some time back and mentioned a well-known memory paper that nobody can replicate. Multiple failures-to-replicate have been published, yet I still see it cited all the time. The colleague said, "Wow! I wish you had told me earlier. We just had a student spend two years trying to follow up that paper, and the student just couldn't get the method to work."

Never mind that researchers rarely bother to replicate published work -- even if they did, we have no mechanism for tracking which papers have been successfully replicated and which papers can't be replicated.

Tenure is awarded partly on how often your work has been cited, and we have many nice, accessible databases that will tell you how often a paper has been cited. Journals are ranked by how often their papers are cited. What if we rewarded researchers and journals based on how well their papers hold up to replication? Maybe it would help, maybe it wouldn't, but without a mechanism for tracking this information, this is at best an intellectual enterprise.
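To make the idea concrete, here is a purely hypothetical sketch of the kind of record such a database might keep -- nothing like this exists, and the paper name is invented:

```python
from collections import defaultdict

# Toy replication-tracking "database" (hypothetical): for each paper,
# store the outcomes of replication attempts, from which a simple
# replication rate follows.
attempts = defaultdict(list)  # paper id -> list of True/False outcomes

def record(paper, replicated):
    """Log one replication attempt and whether it succeeded."""
    attempts[paper].append(replicated)

def replication_rate(paper):
    """Fraction of attempts that replicated, or None if never attempted."""
    results = attempts[paper]
    return sum(results) / len(results) if results else None

# The memory paper from the anecdote above (invented id): two published
# failures and one success would yield a rate of one in three.
record("memory-paper-1999", False)
record("memory-paper-1999", False)
record("memory-paper-1999", True)
print(f"{replication_rate('memory-paper-1999'):.2f}")
```

The point isn't the code, of course -- it's that the bookkeeping is trivial. The hard part is getting anyone to report the attempts.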

Even if such a database weren't ultimately useful in decreasing the number of wrong papers, at least we'd know which papers were wrong.

Darn You, Amazon

For a while now, my department has had problems with packages going missing. A suspiciously large number of them had been sent by Amazon. A couple of weeks ago, our building manager took notice. He emailed the department:
Today I received what is now the third complaint about problems with shipping of products at Amazon. I don't know which courier they were using, but the packages were left on the [unmanned] security desk in the 1st floor lobby ... In another recent case, the packages were dumped in front of the Center Office door while I was out. Interestingly, tracking showed that they were signed for me at a time that I was attending a meeting ... it's happened a few times. Usually the packages have simply been mis-delivered ... and turn up about a week later.
Figure 1. A prototypical, over-packaged Amazon box.


Some days later, he followed up with more information. Another department denizen noted that Amazon had started using a variety of couriers. She wrote, "The other day I ordered 2 books and one came via FedEx and one came via UPS." The building manager noted that FedEx has started outsourcing delivery to UPS. He continued:
What's odd is that we get shipments via UPS and FedEx all the time. Usually, it's the same drivers ... We know some of them by name.
He concluded that perhaps Amazon (and UPS and FedEx) were starting to use a variety of subcontractors who don't understand how to deliver packages at large buildings (e.g., you can't just leave them in a random corner of the lobby).

Yesterday, we got a follow-up on the story. The building manager had ordered a package from Amazon to see what would happen. He was on his way to lunch when he spotted a van marked "package delivery" and an un-uniformed courier. The courier was leaving the building sans package, so the building manager knew the package had been incorrectly delivered (he obviously hadn't signed for it!). He tried to explain the building's package policies to the courier, but
He was very polite, but did not speak much English, so I'm not sure just how much he took away from our little chat.
The building manager -- tired of dealing with lost and mis-delivered packages -- is on a mission to get someone from Amazon to care:
Calling them on the phone was unsatisfactory. Everyone in any position of authority is thoroughly insulated from public accountability.
Perhaps. But that's why blogs exist. Seriously, Amazon, do something about this.

photo: acordova