
Predicting my h-index

A new article in Nature presents a model for predicting a neuroscientist's future h-index from current output. For those who don't know, the h-index is the largest N such that you have N papers, each of which has at least N citations. The model makes its prediction from your current h-index, your total number of published articles, the number of years since your first article, the total number of distinct journals you have published in, and the number of papers in Nature, Science, Nature Neuroscience, PNAS, and Neuron.
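For the curious, here is a minimal sketch of how the h-index itself is computed from a list of per-paper citation counts. The citation counts in the example are made up for illustration, not my actual record.

```python
def h_index(citations):
    """Largest N such that N papers each have at least N citations."""
    counts = sorted(citations, reverse=True)
    h = 0
    for rank, c in enumerate(counts, start=1):
        if c >= rank:
            h = rank   # this paper still clears the bar
        else:
            break      # every later paper has even fewer citations
    return h

# Hypothetical citation counts, highest first after sorting
print(h_index([25, 14, 9, 6, 3, 1, 0]))   # prints 4
```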

I'm not a neuroscientist (though I *am* in Neurotree, upon which the data are based), so I figured I'd see what it predicts for me. I ran into a problem right away, though: how do we count papers? Does my book chapter count? How about two editorials that were recently published (that is, non-empirical papers appearing in empirical journals)? What about two papers that are in press but not yet published?

If we are conservative and count only empirical papers and only papers currently in print, my predicted h-index in 2022 is 12:

If we count everything I've published -- that is, including papers in press, book chapters, and non-empirical papers -- things improve somewhat (apparently it's a good thing that almost everything I publish is in different outlets):


Interestingly, if I pretend that I currently have an h-index of 9 (that is, that all my papers have been cited at least 9 times), it doesn't help as much as you might expect. I've increased my current h-index by 6 but my predicted 2022 h-index by only 3:

I guess the model has discovered regression to the mean: judging from those numbers, each extra point of current h-index buys only about half a point of predicted 2022 h-index.

(BTW I've noticed that neuroscientists really like h-index, probably because they cite each other so much. H-indexes in other fields, such as -- to take a random example -- psychology, tend to be much lower.)


Strange language fact of the day

Apparently in some languages/cultures, it is common to call an infant "Mommy". Even a boy infant. I am told by reliable sources that this is true of Bengali and of Tsez. Reportedly, Bangladeshi immigrants try to import this into English and get weird looks at the daycare.

I am told that this is actually relatively common and appears in many languages. And there are some phenomena in English that aren't so different. You can, for instance, say the pot is boiling when you in fact mean that the water in the pot is boiling, not the pot itself. You can ask for someone's hand in marriage, even though you probably want the entire person, not just the hand. So words can sometimes stand in for other words.

It still blows my mind, though. And I'd love to hear what a Whorfian would have to say about it.

Around the Internet: What you missed last week (9/17/2012 edition)

Chomsky
OK, not technically last week, but here's a longish post critiquing Chomsky and a much longer, heated discussion in the comments, from BishopBlog.

Replication
A nice editorial on the importance of replication as a way of dealing with fraud.

The New Yorker Still Hates Science (esp. Evolution)
When I first heard the claim that the New Yorker was fundamentally anti-science, it came as a surprise. Then I thought back through what they publish, and it became less surprising. Now, reading this out-of-touch, anti-evolution tirade isn't surprising at all (my favorite part is where Gottlieb writes that understanding evolution is superfluous and a waste of time).

Pricing conundrum

Before I went to Riva del Garda for this year's AMLaP, I picked up a travel guide on my Kindle. (If only such things had existed the years I backpacked in Eurasia. My strongest memories are of how much my backpack weighed. Too many books.)

Oddly, the Lonely Planet Italian Lakes Region guide is the exact same price as the whole Italy guide. These local guides tend to be glorified excerpts of the country book, and since they both weigh the same...

Estimating replication rates in psychology

The Open Science Collaboration's interim report, which will come out shortly in Perspectives on Psychological Science, is available. We nearly pulled off the physics trick of having a paper where the author list is longer than the paper itself. I think there are nearly 70 of us (if you scroll down, you'll find me in the H's).

The abstract says it all:
Reproducibility is a defining feature of science. However, because of strong incentives for innovation and weak incentives for confirmation, direct replication is rarely practiced or published. The Reproducibility Project is an open, large-scale, collaborative effort to systematically examine the rate and predictors of reproducibility in psychological science. So far, 72 volunteer researchers from 41 institutions have organized to openly and transparently replicate studies published in three prominent psychological journals from 2008. Multiple methods will be used to evaluate the findings, calculate an empirical rate of replication, and investigate factors that predict reproducibility. Whatever the result, a better understanding of reproducibility will ultimately improve confidence in scientific methodology and findings.
If you are interested in participating, there is still time. Go to the website for more information. 

Around the Internet - 8/31


Publication
A warning about the perils of preprint repositories.

Statistical evidence that writing book chapters isn't worth the effort. (Though caveat: the author also doesn't find evidence of higher citation rates for review papers in journals, which I had thought was well-established.)

One person who finds things to like in the publication process (I know, I don't link to these often).

Neuroskeptic argues that we don't necessarily want to increase replication, just replicability. (Agreed, but how do we know if replicability rates are high enough without conducting replications?)

Language
Did Chris Christie really talk about himself too much in Tampa? 

Other Cognitive Science
Cognitive load disrupts implicit theory of mind processing. So maybe the reason young children succeed at implicit tasks isn't that those tasks require no executive processing (whether they require less is still up for grabs).

Lying with statistics

One of the most concise explanations of why your units of measurement matter, courtesy of XKCD:


Revision, Revision, Revision

I have finally been going through the papers in the Frontiers Special Topic on publication and peer review in which my paper on replication came out. One of the arguments that appears in many of these papers (like this one)* -- and in many discussions of the review process -- is that when papers are published, they should be published along with the reviews.

My experience with the process -- which I admit is limited -- is that you submit a paper, reviewers raise concerns, and you only get published if you can revise the manuscript to address those concerns (which may include new analyses or even new experiments). At that stage, the reviews are a historical document, commenting on a paper that no longer exists. This may be useful to historians of science, but I don't understand how it helps the scientific process (other than, I suppose, that transparency is a good thing).

So these proposals only make sense to me if it is assumed that papers are *not* typically revised in any meaningful way based on review. That is, reviews are more like book reviews: comments on a finished product. Of my own published work, three papers were accepted more-or-less as is (and frankly I think the papers would have benefited from more substantial feedback from the reviewers). So there, the reviews are at least referring to a manuscript very similar to the one that appeared in print (though they did ask me to clarify a few things in the text, which I did).

Other papers went through more substantial revision. One remained pretty similar in content, though we added a whole slew of confirmatory analyses that were requested by reviewers. The most recent paper actually changed substantially, and in many ways is a different -- and much better! -- paper than what we originally submitted. Of the three papers currently under review, two of them have new experiments based on reviewer comments, and the other one has an entirely new introduction and general discussion (the reviewers convinced me to re-think what I thought the paper was about). So the reviews would help you figure out which aspects of the paper we (the authors) thought of on our own and which are based on reviewer comments, but even then that's not quite right, since I usually get comments from a number of colleagues before I make the first submission. There are of course reviews from the second round, but that's often just from one or two of the original reviewers, and mostly focuses on whether we addressed their original concerns or not.

So that's my experience, but perhaps my experience is unusual. I've posted a poll (look top right). Let me know what your experience is. Since this may vary by field, feel free to include comments to this post, saying what field you are in.

---
*To be fair, this author is describing a process that has actually been implemented for a couple Economics journals, so apparently it works to (at least some) people's satisfaction.

Have you seen me before?

I have been using PCA to correct blink artifact in an EEG study that I am presenting at AMLaP in a couple weeks. Generally, I think I've gotten pretty good at detecting blinks. I do see other things that look like artifact but which I don't understand as well. For instance, look at this channel plot:
(You should be able to increase the size of the picture by opening it in a new window.) So this looks a bit like a blink, but it's in the wrong place entirely. This is a 128-electrode EGI cap, with the electrodes listed sequentially (the top one is electrode 1 and the bottom is electrode 124 -- I don't use electrodes 125-128 because they tend not to have good contact with the skin).

The way EGI is laid out, the low-numbered and high-numbered electrodes are in the front, whereas the middle-numbered electrodes are in the back (check this picture), so basically what I'm seeing is being generated in the back of the head. Actually, the back left of the head, according to my PCA:
In this, the top left panel shows the localization of the signal. The top right panel shows which trials the signal occurred in. The power spectrum (bottom panel) is also quite odd. I'm going ahead and removing this component, because it's clearly artifact (the amplitude looks way too large to be true EEG), and it affects so many trials that I can't just exclude them without excluding the participant. But I'd really like to know what this is. Because maybe I *should* be excluding the participant.

So...has anyone seen something like this before?

For those wondering...

Using PCA, I was able to get rid of this artifact fairly cleanly. Here is an image before removal, with the 124 electrodes stacked on one another:
You can see that strange artifact -- which looks like a blink but isn't quite as smooth as your typical blink -- very easily in these four trials.

Here are the same four trials after I subtracted that component, plus another component that probably is blink-related (there were two classic-looking blinks in my data; the component above picked up both the odd artifact *and* those two blinks; the other component picked up only the two classic blinks):
You can see that the odd artifact is gone from all four trials, but otherwise things look very similar.
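In case anyone wants the gist of how this kind of component subtraction works, here is a minimal sketch using scikit-learn's PCA on a fake time-by-electrodes array. To be clear, this is not my actual pipeline; the data, the array size, and the index of the "bad" component are all stand-ins for illustration.

```python
# Minimal sketch: remove one PCA component from an EEG segment.
# The data and the choice of bad component are made up.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
eeg = rng.standard_normal((1000, 124))     # time points x electrodes (fake data)

pca = PCA()                                # full decomposition
scores = pca.fit_transform(eeg)            # component time courses
bad_component = 0                          # pretend component 0 is the artifact
scores[:, bad_component] = 0               # zero out its time course
cleaned = pca.inverse_transform(scores)    # project back to electrode space
```

The reconstruction keeps everything except the zeroed component, which is why the rest of the signal looks nearly unchanged after subtraction.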

Around the Internet - 7/30/2012


Citations

There have been a bunch of posts lately on citations and the Impact Factor. I started with these two posts by DrugMonkey. These posts have links to others in the chain, which you can follow. Here's a slightly older post (from late July) on reasons to self-cite.

Next topic

So I didn't actually see anything else interesting this week. Possibly because I've been trying to streamline a bootstrapping analysis (which I may blog about when I finally get it done). Early in the process, I tried to estimate how long it would take for the script to run and realized it was about 1 week for each analysis, of which I have several to do. So I started hurriedly looking for ways to speed it up...
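For anyone facing a similar week-long script, one generic trick (I'm not claiming it's what ultimately rescued my analysis) is to vectorize the resampling so that all bootstrap samples are drawn at once rather than one at a time in a loop. A toy sketch with made-up data:

```python
# Toy comparison of a looped vs. vectorized bootstrap of the mean.
# The data are simulated; this just illustrates the speed-up idea.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(loc=0.5, scale=2.0, size=200)   # fake sample
n_boot = 10_000

# Looped version: one resample at a time
boot_loop = np.empty(n_boot)
for i in range(n_boot):
    boot_loop[i] = rng.choice(data, size=data.size, replace=True).mean()

# Vectorized version: draw all resample indices at once, average along rows
idx = rng.integers(0, data.size, size=(n_boot, data.size))
boot_vec = data[idx].mean(axis=1)

print(np.percentile(boot_vec, [2.5, 97.5]))       # 95% bootstrap CI
```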

New source of post-doctoral funding

NSF has just announced what appears to be a new post-doctoral fellowship. The document linked to lists two different tracks: Broadening Participation and Interdisciplinary Research in Behavioral and Social Sciences. It is the second one that seems to be new. Here's the heart of the description:
Track 2. Interdisciplinary Research in Behavioral and Social Sciences (SPRF-IBSS): The SPRF-IBSS track aims to support interdisciplinary training where at least one of the disciplinary components is an SBE science ... The proposal must be motivated by a compelling research question (within the fields of social, behavioral and economic sciences) that requires an interdisciplinary approach for successful investigation. As a result, applicants should demonstrate the need for new or additional skills and expertise beyond his or her core doctoral experience to achieve advances in the proposed research. To acquire the requisite skills and competencies (which may or may not be within SBE sciences), a mentor in the designated field must be selected so that the postdoctoral research fellow and his or her mentor will complement, not reinforce, each other's expertise. 

What I get from this is that the fellowship will be particularly useful for someone with training in one field who wants to get cross-trained in another. Thinking close to home, this might be a psycholinguist who wants training in linguistics or computer science. This makes me think of the legendary IGERT program at UPenn, which trained a string of linguists to use psychological research methods, many of whom are now among my favorite researchers. Which is to say that this cross-training can be very productive.


Another bunch of retractions

It appears that a series of papers, written by a German business professor, are being retracted. This particular scandal doesn't seem to involve data fabrication, though. Instead, he is accused of double-publishing (publishing the same work in multiple journals) and also of making errors in his analyses (this lengthy article -- already linked to above -- discusses the issues in detail).

It's possible that I was not paying attention before, but there seem to be more publication scandals lately than I remember. When working on my paper about replication, I actually had to look pretty hard to find examples of retracted papers in psychology. That wouldn't be so difficult at the moment, after Hauser, Smeesters, and Sanna.

If there is an increase, it's hopefully due not to an increase in fraud but an increase in vigilance, given the attention the issue has been getting lately.

Making up your data

Having finished reading the Simonsohn paper on detecting fraud, I have come to two conclusions:

1. Making up high-quality data is really hard. Part of the problem with making up data is that you have to introduce some randomness into it. If your study involves asking people how much they are willing to pay for a black t-shirt, you can't just write down that they all were willing to pay the average (say $12). You have to write down some variation ($12, $14, $7, $9, etc.).

The problem is that humans are notoriously bad at generating random number sequences. Simonsohn discusses this in terms of Tversky and Kahneman's famous, tongue-in-cheek paper "Belief in the law of small numbers." People think that random sequences should look roughly "average", even when the sample is small: flip a coin 4 times and you expect 2 heads and 2 tails, when in fact getting 4 heads isn't all that improbable (it happens 1 time in 16).

So your best bet, if you are making up data, is to use a computer program to generate it from your favorite distribution (the normal distribution would be a good choice in most cases). The problem is that real data can have funny idiosyncrasies. One of the problems with the string of numbers I suggested above ($12, $14, $7, $9, etc.) is that humans like round numbers. So when people say what they are willing to pay for a t-shirt, what you should see is a lot of $10s and $20s, and maybe some $5s and $15s. The numbers in my list are relatively unlikely.
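Just to make the round-number point concrete, here is a toy simulation of my own (it is not one of the checks from the paper): compare responses that cluster on round dollar amounts, the way real willingness-to-pay answers tend to, with values drawn straight from a normal distribution, and see what fraction land on a multiple of $5.

```python
# Toy illustration: human-like survey responses cluster on round numbers;
# values sampled straight from a normal distribution do not.
import numpy as np

rng = np.random.default_rng(42)
n = 200

# "Human-like" responses: mostly round dollar amounts
real_ish = rng.choice([5, 10, 12, 15, 20], size=n,
                      p=[0.15, 0.35, 0.10, 0.25, 0.15])

# "Fabricated" responses: normal draws rounded to the nearest dollar
faked = np.round(rng.normal(loc=12, scale=4, size=n))

def share_multiples_of_5(x):
    """Proportion of values that land on a multiple of $5."""
    return np.mean(np.asarray(x) % 5 == 0)

print(share_multiples_of_5(real_ish))   # high, around 0.9
print(share_multiples_of_5(faked))      # low, around 0.2
```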

The paper goes on to describe other problems as well. What I get from this is that making up data in a way that is undetectable is a lot of work, and you might as well actually run the study. So even leaving aside other reasons you might want to not commit fraud (ethics, desire for / belief in importance of knowledge, etc.), it seems sheer laziness alone should steer you the other direction.

2. The Dark Knight Rises is awesome. Seriously. Technically there was nothing about that in the paper, but I was thinking about the movie while reading the paper. Since I saw the show this morning, it's been hard to think of anything else. The most negative thing I can say about it is that it wasn't better than the last one, which is grading on a pretty steep curve.

Detecting fraud

Uri Simonsohn has now posted a working paper describing how he detected those two recent cases of data fraud. Should my other writing projects progress fast enough, I'll write about it soon. I'll also post links to any interesting discussions I come across.