Field of Science

Overheard: Scientific Prejudice

A senior colleague recently attended an Autism conference. Language is frequently impaired in Autism, and so many of the neuroscientists there were trying to look at the effects of their animal models of Autism on language.

Yes, you read that correctly: animal models of language. In many cases, rats.

This colleague and I both believe in some amount of phylogenetic continuity: some aspects of language are no doubt built on mechanisms that existed in our distant ancestors (and therefore may exist in other modern-day animals). But given that we have, at best, a rudimentary understanding of the mechanisms underlying language in humans -- and certainly little or no agreement in the field at present -- arguing that a particular behavior in a rat is related to some aspect of language is at best wild-eyed conjecture (and I say this with a great deal of respect for the people who have been taking this problem seriously).

Unfortunately, this colleague didn't get very far in discussing these issues with these researchers. One, for instance, said, "I know rat squeaks are related to language because they're auditory!"

Sure, so's sneezing:

The problem with such conversations, as this colleague pointed out, is that neuroscientists often don't take us cognitive types seriously. After all, they work on a "harder" science. (For those who haven't seen it yet, read this by DrugMonkey -- tangential but fun.) A friend of mine, who is a physicist, once told me that he wasn't sure why psychology was even called a "science" since psychologists don't do experiments -- never mind that I was IMing him from my lab at the time (which he knew).

When I applied to graduate school, I applied to one interdisciplinary program that included cognitive people and also physiology folk. During my interview with one professor who worked on monkey physiology, he interrupted me as I was describing the work I had done as an undergraduate. "Nothing of value about language," he told me, "can be learned by studying humans." When I suggested that perhaps there weren't any good animal models of language to work with, he said, "No, that's just a prejudice on the part of you cognitive people."

Keep in mind that there were several faculty in his department who studied language in humans. This is why such mixed departments aren't always particularly collegial places to work.

I bring this up not to rag on neuroscientists or physicists, but to remind the psychologists in the audience that we have this exact same problem. I don't know how many people have told me that linguistics is mostly bullshit (when I was an undergraduate, one professor of psychology told me: "Don't study linguistics, Josh. It will ruin you as a scientist.") and that philosophy has nothing to offer. When you talk to them in detail, though, you quickly realize that most of them have no idea what linguists or philosophers do, what their methods are, or why those fields have settled on those methods. And that's the height of arrogance: linguists and philosophers include, in their numbers, some of the smartest people on the planet. It only stands to reason that they know something of value.

I'm not defending all the methods used by linguists. They could be improved. (So could methods used by physicists, too.) But linguists do do experiments, and they do work with empirical data. And they're damn smart.

Just sayin'.

Photos: mcfarlando, Jessica Florence.

Masked review?

I just submitted a paper to the Journal of Experimental Psychology: General. Like many journals, this journal allows masked review -- that is, at least in theory, the reviewers won't know who you are.

On the whole, I'm not sure how useful blind review is. If you're pushing an unpopular theoretical position, I expect reviewers will jump on you no matter what. If you're simply personally so unpopular that no one will sign off on your work, you have problems that masked review won't solve.

But the real reason I chose not to use blind review was laziness -- it's a pain to go through the manuscript and remove anything that might give away who you are (assuming this is even possible -- for instance, if you cite your own unpublished work a lot, that's going to give away the game and there's not much you can do about that, except not cite that work).

But I'm curious how other people feel about this. Do you usually request masked review? Those who have reviewed papers, do you treat masked papers differently from signed ones?

photo: Ben Fredericson (xjrlokix)

Thank you, Amazon!

As regular readers know, I've been brushing up my CV in anticipation of some application deadlines. This mostly means trying to move papers from the in prep column to the in submission column. (I'd love to get some new stuff into the in press column, but with the glacial pace of review, that's unlikely to happen in the time frame I'm working with).

This means, unfortunately, I'm working during a beautiful Saturday morning. This would be more depressing if it weren't for the wonder that is Amazon Mechanical Turk. I ran two experiments yesterday (48 subjects each), have another one running right now, and will shortly put up a fourth. The pleasure of getting new data -- of finding things out -- is why I'm in this field. It's almost as fun as walking along the river on a beautiful Saturday morning.


How I feel about Amazon Mechanical Turk -- the psycholinguist's new best friend.

photo: Daniel*1977

Spam Filter

Blogger has helpfully added some advanced spam detection for comments. One interesting feature is that I still get an email saying a comment has been left even if the comment is flagged as spam and isn't posted -- so the notification doesn't tell me when a comment is stuck in the spam queue. This makes it a little harder for me to moderate than you might wish.

So if your post has been flagged as spam, either be patient and wait until I discover it, or send me an email directly and I'll un-flag it.


I wasn't able to get Edward's suggestion to work, but I did sit down and paste all my posts back to 4/27/2010 into Wordle. In this more representative sample, it seems "people" actually beats out "word" and "language," though the combination of "word" and "words" would probably win if Wordle could handle inflectional morphology.

Negative Evidence: Still Missing after all these Years

My pen-pal Melodye has posted a thought-provoking piece at Child's Play on negative evidence. As she rightly points out, issues of negative evidence have played a crucial role in the development of theories of language acquisition. But she doesn't think that's a good thing. Rather, it's "ridiculous, [sic] and belies a complete lack of understanding of basic human learning mechanisms."

The argument over negative evidence, as presented by Melodye, is ridiculous, but that seems to stem from (a) conflating two different types of negative evidence, and (b) misunderstanding what the argument was about.
Fig. 1. Melodye notes that rats can learn from negative evidence, so why can't humans? We'll see why.

Here's Melodye's characterization of the negative evidence argument:
[T]he argument is that because children early on make grammatical ‘mistakes’ in their speech (e.g., saying ‘mouses’ instead of ‘mice’ or ‘go-ed’ instead of ‘went’), and because they do not receive much in the way of corrective feedback from their parents (apparently no parent ever says “No, Johnny, for the last time it’s MICE”), it must therefore be impossible to explain how children ever learn to correct these errors. How — ask the psychologists — could little Johnny ever possibly ‘unlearn’ these mistakes? This supposed puzzle is taken by many in developmental psychology to be one of a suite of arguments that have effectively disproved the idea that language can be learned without an innate grammar.
What's the alternative? Children are predicting what word is going to come up in a sentence.
[I]f the child is expecting ‘mouses’ or ‘gooses,’ her expectations will be violated every time she hears ‘mice’ and ‘geese’ instead.  And clearly that will happen a lot.  Over time, this will so weaken her expectation of ‘mouses’ and ‘gooses,’ that she will stop producing these kinds of words in context.
I can't speak for every Nativist, or for everyone who has studied over-regularization, but since Melodye cites Pinker extensively and specifically, and since I've worked on over-regularization within the Pinkerian tradition, I think I can reasonably speak for at least a variant of Pinkerianism. And I think Melodye actually agrees with us almost 100%.

My understanding of Pinker's Words and Rules account -- and recall that I published work on this theory with one of Pinker's students, so I think my understanding is well-founded -- is that children originally over-regularize the plural of mouse as mouses, but eventually learn that mice is the plural of mouse by hearing mice a lot. That is, our account is almost identical to Melodye's except it doesn't include predictive processing. I actually agree that if children are predicting mouses and hear mice, that should make it easier to correct their mistaken over-regularization. But the essential story is the same.
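For concreteness, here is a minimal delta-rule sketch of that error-driven story: a learner whose initial expectation of "mouses" is driven down every time it hears "mice" instead. The learning rate, starting strength, and cue names are all invented for illustration -- this is the general idea, not anyone's published model.

```python
# A toy error-driven learner (delta-rule / Rescorla-Wagner flavor).
# All numbers and cue/outcome names below are made up for illustration.

def update(v, cue, observed, outcomes, lr=0.1):
    """Nudge each cue->outcome strength toward 1 if observed, else toward 0."""
    for o in outcomes:
        old = v.get((cue, o), 0.0)
        target = 1.0 if o == observed else 0.0
        v[(cue, o)] = old + lr * (target - old)

# Start with a (made-up) over-regularized expectation of "mouses".
v = {("PLURAL(mouse)", "mouses"): 0.8}

# The child keeps hearing "mice" and never hears "mouses".
for _ in range(50):
    update(v, "PLURAL(mouse)", "mice", ["mice", "mouses"])

print(v[("PLURAL(mouse)", "mouses")])  # close to 0
print(v[("PLURAL(mouse)", "mice")])    # close to 1
```

After fifty exposures the expectation of "mouses" has decayed to nearly zero and "mice" has climbed to nearly one -- which is the unlearning both accounts predict.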

Where I've usually seen Nativists bring up this particular negative evidence argument (and remember there's another) is in the context of Behaviorism, according to which rats (and humans) learn by being explicitly rewarded for doing the right thing and explicitly punished for doing the wrong thing. The fact that children learning language are almost never corrected (as Melodye notes) is evidence against that very particular type of Empiricist theory.

That is, we don't (and to my knowledge, never have) argued that children can only learn the word mice through Universal Grammar. Again, it's possible (likely?) that someone has made that argument. But not us.[1]

Negative Evidence #2

There is a deeper problem with negative evidence that does implicate, if not Universal Grammar, at least generative grammars. That is, as Pinker notes in the article cited by Melodye, children generalize some things and not others. Compare:

(1) John sent the package to Mary.
(2) John sent Mary the package.
(3) John sent the package to the border.
(4) *John sent the border the package.

That * means that (4) is ungrammatical, or at least most people find it ungrammatical. Now, on a word-prediction theory that tracks only surface statistics (the forms of words, not their meaning or syntactic structure), you'd probably have to argue that whenever children have heard discussions of packages being sent to Mary, they've heard either (1) or (2), but in discussions of sending packages to borders, they've only ever heard (3) and never (4). The persistent absence of (4) is surprising, and thus they've learned that (4) is no good.

The simplest version of this theory won't work, though, since children (and you) have presumably never heard any of the sentences below (where Gazeidenfrump and Bleizendorf are people's names, the dax is an object, and a dacha is a kind of house used in Russia):

(5) Gazeidenfrump sent the dax to Bleizendorf.
(6) Gazeidenfrump sent Bleizendorf the dax.
(7) Gazeidenfrump sent the dax to the dacha.
(8) *Gazeidenfrump sent the dacha the dax. 

Since we've heard (and expected) sentence (8) exactly as many times as we've heard or expected (5-7) -- namely, never -- failures of prediction can't explain why we know (8) is bad but (5-7) aren't. (BTW If you don't like my examples, there are many, many more in the literature; these are the best I can think of off the top of my head.)

So we can't be tracking just the words themselves, but something more abstract. Pinker has an extended discussion of this problem in his 1989 book, in which he argues that the constraint is semantic: we know that you can use the double-object construction (e.g., 2, 4, 6 or 8) only if the recipient of the object can actually possess the object (that is, the dax becomes Bleizendorf's, but it doesn't become the dacha's, since dachas -- and borders -- can't own things). I'm working off of memory now, but I think -- but won't swear -- that Pinker's solution also involves some aspects of the syntactic/semantic structures above being innate.
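To make that constraint concrete, here's a toy paraphrase in code. The category lists and the function are mine, invented purely for illustration -- Pinker's actual proposal is far richer than a lookup table:

```python
# A toy paraphrase of the possession constraint: the double-object
# frame ("sent X the package") is licensed only when the recipient X
# is the kind of thing that can possess. These category lists are
# made up for illustration; acquiring the distinction is the real problem.
POSSIBLE_POSSESSORS = {"Mary", "Bleizendorf", "Gazeidenfrump"}
NON_POSSESSORS = {"the border", "the dacha"}

def double_object_ok(recipient):
    """Can 'sent <recipient> the package' be grammatical?"""
    return recipient in POSSIBLE_POSSESSORS

print(double_object_ok("Mary"))        # True:  "sent Mary the package"
print(double_object_ok("the border"))  # False: *"sent the border the package"
```

The hard question, of course, is how children acquire the possessor/non-possessor distinction in the first place -- which is exactly where innate semantic structure would enter the account.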

Pinker's account is not perfect and may end up being wrong in some places, but the fact remains that negative evidence (implicit or not) can't alone explain where children (and adults) do or do not generalize.


[1] Melodye quotes Pinker saying "The implications of the lack of negative evidence for children's overgeneralization are central to any discussion of learning, nativist or empiricist." That is the quote that she says is "quite frankly, ridiculous." Here is the full quote. I'll let you decide whether it's ridiculous:
This nature–nurture dichotomy is also behind MacWhinney’s mistaken claim that the absence of negative evidence in language acquisition can be tied to Chomsky, nativism, or poverty-of-the-stimulus arguments. Chomsky (1965, p. 32) assumed that the child’s input ‘consist[s] of signals classified as sentences and nonsentences’ – in other words, negative evidence. He also invokes indirect negative evidence (Chomsky, 1981). And he has never appealed to Gold’s theorems to support his claims about the innateness of language. In fact it was a staunch ANTI-nativist, Martin Braine (1971), who first noticed the lack of negative evidence in language acquisition, and another empiricist, Melissa Bowerman (1983, 1988), who repeatedly emphasized it. The implications of the lack of negative evidence for children’s overgeneralization are central to any discussion of learning, nativist or empiricist.

Pinker, S. (2004). Clarifying the logical problem of language acquisition. Journal of Child Language, 31(4), 949-953. DOI: 10.1017/S0305000904006439

photo: Big Fat Rat

Intelligent Nihilism

The latest issue of Cognitive Science, which is rapidly becoming one of my favorite journals, carries an interesting and informative debate on the nature of language, thought, cognition and learning between John Hummel, at the University of Illinois at Urbana-Champaign, and Michael Ramscar, at Stanford University. This exchange of papers highlights what I think is the current empirical standstill between two very different world-views.

Hummel takes up the cause of "traditional" models on which thought and language are deeply symbolic and involve algebraic rules. Ramscar defends more "recent" alternative models built on associative learning -- essentially, an update on the story that was traditional before the symbolic models.

Limitations of Relational Systems

The key to Hummel's argument, I think, is his focus on explicitly relational systems:
John can love Mary, or be taller than Mary, or be the father of Mary, or all of the above. The vocabulary of relations in a symbol system is open-ended ... and relations can take other relations as arguments (e.g., Mary knows John loves her). More importantly, not only can John love Mary, but Sally can love Mary, too, and in both cases it is the very same "love" relation ... The Mary that John loves can be the very same Mary that is loved by Sally. This capacity for dynamic recombination is at the heart of a symbolic representation and is not enjoyed by nonsymbolic representations.
That is, language has many predicates (e.g., verbs) that seem to allow arbitrary arguments. So talking about the meaning of love is really talking about the meaning of X loves Y: X has a particular type of emotional attachment to Y. You're allowed to fill in "X" and "Y" more or less how you want, which is what makes them symbols.

Hummel argues that language is even more symbolic than that: not only do we need symbols to refer to arguments (John, Mary, Sally), but we also need symbols to refer to predicates as well. We can talk about love, which is itself a relation between two arguments. Similarly, we can talk about friendship, which is an abstract relation. This is a little slippery if you're new to the study of logic, but doing this requires a second-order logic, which is strictly more expressive than first-order logic.

Where Hummel wants to go with this is that associationist theories, like Ramscar's, can't represent second-order logical systems (and probably aren't even up to the task of the types of first-order systems we might want). Intuitively, this is because associationist theories represent similarities between objects (or at least how often both occur together), and it's not clear how they would represent dissimilarities, much less represent the concept of dissimilarity:
John can be taller than Mary, a beer bottle taller than a beer can, and an apartment building is taller than a house. But in what sense, other than being taller than something, is John like a beer bottle or an apartment building? Making matters worse, Mary is taller than the beer bottle and the house is taller than John. Precisely because of their promiscuity, relational concepts defy learning in terms of simple associative co-occurrences.
It's not clear in these quotes, but there's a lot of math to back this stuff up: second-order logic systems are extremely powerful and can do lots of useful stuff. Less powerful computational systems simply can't do as much.

The Response

Ramscar's response is not so much to deny the mathematical truths Hummel is proposing. Yes, associationist models can't capture all that symbolic systems can do, but language is not a symbolic system:
We think that mapping natural language expressions onto the promiscuous relations Hummel describes is harder than he does. Far harder: We think you cannot do it.
Ramscar identifies a couple old problems: one is polysemy, the fact that words have multiple meanings  (John can both love Mary and love a good argument, but probably not in the same way). Fair enough -- nobody has a fully working explanation of polysemy.

The other problem is the way in which the symbols themselves are defined. You might define DOG in terms of ANIMAL, PET, FOUR-LEGGED, etc. Then those symbols also have to be defined in terms of other symbols (e.g., FOUR-LEGGED has to be defined in terms of FOUR and LEG). Ramscar calls this the turtles-all-the-way-down argument.

This is fair in the sense that nobody has fully worked out a symbolic system that explains all of language and thought. It's unfair in that he doesn't have all the details of his theory worked out, either, and his model is every bit as turtles-all-the-way-down. Specifically, concepts are defined in terms of co-occurrences of features (a dog is a unique pattern of co-occurring tails, canine teeth, etc.). Either those features are themselves symbols, or they are always patterns of co-occurring features (tail = co-occurrence of fur, flexibility, cylindrical shape, etc.), which are themselves patterns of other feature co-occurrences, and so on. (It's also unfair in that he's criticizing a very old symbolic theory; there are newer, possibly better ones around, too.)

Implicit in his argument is the following: anything that symbolic systems can do but associationist systems can't is something humans can't do either. He doesn't address this directly, but presumably this means that we don't represent abstract concepts such as taller than or friendship, or, if we do, it's via a method very different from formal logic (what that would be is left unspecified).

It's A Matter of Style

Here's what I think is going on: symbolic computational systems are extremely powerful and can do lots of fancy things (like second-order logics). If human brains instantiate symbolic systems, that would explain very nicely lots of the fancy things we can do. However, we don't really have any sense of how neurons could instantiate symbols, or even if it's possible. So if you believe in symbolic computation, you're basically betting that neurons can do more than it seems.

Associationist systems face the opposite problem: we know a lot about associative learning in neurons, so this seems like an architecture that could be instantiated in the brain. The problem is that associative learning is an extremely underpowered learning system. So if you like associationist systems, you're betting that humans can't actually do many of the things (some of) us think humans can do.

Over at Child's Play, Dye claimed that the argument in favor of Universal Grammar was a form of Intelligent Design: we don't know how that could be learned/evolve, so it must be innate/created. I'll return the favor by labeling Ramscar's argument Intelligent Nihilism: we don't know how the brain could give rise to a particular type of behavior, so humans must not be capable of it.

The point I want to make is we don't have the data to choose between these options. You do have to work within a framework if you want to do research, though, and so you pick the framework that strikes you as most plausible. Personally, I like symbolic systems.

Hummel, J. E. (2010). Symbolic versus associative learning. Cognitive Science, 34, 958-965.

Ramscar, M. (2010). Computing machinery and understanding. Cognitive Science, 34, 966-971.

photos: Anirudh Koul (jumping), wwarby (turtles), kaptain kobold (Darwin)

Wait -- Jonah Lehrer Wants Reading to be Harder?

Recently Jonah Lehrer, now at Wired, wrote an ode to books, titled The Future of Reading. Many people are sad to see the slow replacement of physical books by e-readers -- though probably not the people who have lugged 50 pounds of books in a backpack across Siberia, but that's a different story. The take-home message appears 2/3 of the way down:
So here’s my wish for e-readers. I’d love them to include a feature that allows us to undo their ease, to make the act of reading just a little bit more difficult. Perhaps we need to alter the fonts, or reduce the contrast, or invert the monochrome color scheme. Our eyes will need to struggle, and we’ll certainly read slower, but that’s the point: Only then will we process the text a little less unconsciously, with less reliance on the ventral pathway. We won’t just scan the words – we will contemplate their meaning.
As someone whose to-read list grows several times faster than I actually do any reading, I've never wished to read more slowly. But Lehrer is a science writer, and (he thinks) there's more to this argument than just aesthetics. As far as I can tell, though, it's based on a profound misunderstanding of the science. Since he manages to get through the entire post without ever citing a specific experiment, it's hard to tell for sure, but here's what I've managed to piece together. 

Reading Research

Here's Lehrer:
Let me explain. Stanislas Dehaene, a neuroscientist at the College de France in Paris, has helped illuminate the neural anatomy of reading. It turns out that the literate brain contains two distinct pathways for making sense of words, which are activated in different contexts. One pathway is known as the ventral route, and it’s direct and efficient, accounting for the vast majority of our reading. The process goes like this: We see a group of letters, convert those letters into a word, and then directly grasp the word’s semantic meaning. According to Dehaene, this ventral pathway is turned on by “routinized, familiar passages” of prose, and relies on a bit of cortex known as visual word form area (VWFA).

So far, so good. Dehaene is a brilliant researcher who has had an enormous effect on several areas of cognition (I'm more familiar with his work on number). I'm a bit out-of-date on reading research (and remember Lehrer doesn't actually cite anything to back up his argument), but this looks like an updated version of the old distinction between whole-word reading and real-time composition. That is, it goes without saying that you must "sound out" novel words that you've never encountered before, such as gafrumpenznout. However, it seems that as you become more familiar with a particular word (maybe Gafrumpenznout is your last name), you can recognize the word quickly without sounding it out.

Here's the abstract from a relevant 2008 Dehaene group paper:
Fast, parallel word recognition, in expert readers, relies on sectors of the left ventral occipito-temporal pathway collectively known as the visual word form area. This expertise is thought to arise from perceptual learning mechanisms that extract informative features from the input strings. The perceptual expertise hypothesis leads to two predictions: (1) parallel word recognition, based on the ventral visual system, should be limited to words displayed in a familiar format (foveal horizontal words with normally spaced letters); (2) words displayed in formats outside this field of expertise should be read serially, under supervision of dorsal parietal attention systems. We presented adult readers with words that were progressively degraded in three different ways (word rotation, letter spacing, and displacement to the visual periphery).
When the words are degraded in these various ways, participants had a harder time reading and recruited different parts of the brain. A (slightly) more general public-friendly version of this story appears in this earlier paper. This appears to be the paper that Lehrer is referring to, since he says that Dehaene, in experiments, activates the dorsal pathways "in a variety of ways, such as rotating the letters or filling the prose with errant punctuation."

And the Vision Science Behind It

This work makes a lot of sense, given what we know about vision. Visual objects -- such as letters -- "crowd" each other. In other words, when there are several that are close together, it's hard to see any of them. This effect is worse in peripheral vision. Therefore, to see all the letters in a long-ish word, you may need to fixate on multiple parts of the word.

However, orthography is heavily redundant. One good demonstration of this is rmvng ll th vwls frm sntnc. You can still read with some of the letters missing (and of course some languages, like Hebrew, never print vowels). Moreover, sentence context can help you guess what a particular word is. So if you're reading a familiar word in a familiar context, you may not need to see all the letters well in order to identify it. The less certain you are of what the word is, the more carefully you'll have to look at it.
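A toy script makes the redundancy point easy to try for yourself (this isn't from any reading model; it just strips vowels):

```python
# Strip the vowels from a sentence; the result usually stays readable,
# an informal demonstration of how redundant English orthography is.
def devowel(text):
    words = ("".join(c for c in w if c.lower() not in "aeiou")
             for w in text.split())
    return " ".join(w for w in words if w)  # drop words that were all vowels

print(devowel("removing all the vowels from a sentence"))
# rmvng ll th vwls frm sntnc
```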

The Error

So far, this research appears to be about visual identification of familiar objects. Lehrer makes a big leap, though:

When you are reading a straightforward sentence, or a paragraph full of tropes and cliches, you’re almost certainly relying on this ventral neural highway. As a result, the act of reading seems effortless and easy. We don’t have to think about the words on the page ...  Dehaene’s research demonstrates that even fluent adults are still forced to occasionally make sense of texts. We’re suddenly conscious of the words on the page; the automatic act has lost its automaticity.
This suggests that the act of reading observes a gradient of awareness. Familiar sentences printed in Helvetica and rendered on lucid e-ink screens are read quickly and effortlessly. Meanwhile, unusual sentences with complex clauses and smudged ink tend to require more conscious effort, which leads to more activation in the dorsal pathway. All the extra work – the slight cognitive frisson of having to decipher the words – wakes us up.
It's based on this that he argues that e-readers should make it harder to read, because then we'd pay more attention to what we're reading. The problem is that he seems to have confused the effort expended in recognizing the visual form of a word -- the focus of Dehaene's work -- with effort expended in interpreting the meaning of the sentence. Moreover, he seems to think that the harder it is to understand something, the more we'll understand it -- which seems backwards to me. Now it is true that the more deeply we process something the better we remember it, but it's not clear that making something hard to see necessarily means we process it more deeply. In any case, we'd want some evidence that this is so, which Lehrer doesn't cite.

Which brings me back to citation. Dehaene did just publish a book on reading, which I haven't read because it's (a) long, and (b) not available on the Internet. Maybe Dehaene makes the claim that Lehrer is attributing to him in that book. Maybe there's even evidence to back that claim up. As far as I can tell, that work wasn't done by Dehaene (as Lehrer implies) since I can't find it on Dehaene's website. Though maybe it's there under a non-obvious title (Dehaene publishes a lot!). This would be solved if Lehrer would cite his sources.


I like Lehrer's writing, and I've enjoyed the few interactions I've had with him. I think occasional (frequent?) confusion is a necessary hazard of being a science writer. I have only a very small number of topics I feel I understand well enough to write about them competently. Lehrer, by profession, must write about a very wide range of topics, and it's not humanly possible to understand many of them very well.

Dehaene, S., Cohen, L., Sigman, M., & Vinckier, F. (2005). The neural code for written words: a proposal. Trends in Cognitive Sciences, 9(7), 335-341. DOI: 10.1016/j.tics.2005.05.004

Cohen, L., Dehaene, S., Vinckier, F., Jobert, A., & Montavont, A. (2008). Reading normal and degraded words: contribution of the dorsal and ventral visual pathways. NeuroImage, 40(1), 353-366. PMID: 18182174

Photos: margolove, kms !

Games and Words

I diligently tag posts on this blog, not because I actually think anybody clicks in the cloud to find specific types of posts, but because it's interesting to see, over time, what I usually write about.

There's another way of doing this. Wordle will allow you to input the feed for a blog, and it will extract the most common words from the most recent posts.

I'm gratified to see that the most common word in this blog is "data," followed by "studies," "participants" and "blog". The high frequency of "blog" and the URL for this site are a byproduct of my ATOM feed, which lists the URL of the blog after every post.

Unfortunately, Wordle's restriction to the most recent posts means that some words are over-weighted. For instance, my recent post about shilling for products mentioned the word "product" enough times to make that word prominent in this word cloud.
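The counting itself is simple enough to sketch. This is just my guess at roughly what a tool like Wordle computes -- the stopword list and sample posts are placeholders -- and note that, like Wordle, it treats "word" and "words" as unrelated tokens:

```python
from collections import Counter
import re

# Sketch of Wordle-style word counting over a set of posts. The
# stopword list and the sample posts are placeholders, and no
# inflectional morphology is handled: "word" != "words" here.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "that", "is"}

def top_words(posts, n=5):
    counts = Counter()
    for post in posts:
        for w in re.findall(r"[a-z']+", post.lower()):
            if w not in STOPWORDS:
                counts[w] += 1
    return counts.most_common(n)

posts = ["The data beat the studies.",
         "Participants gave data, and more data."]
print(top_words(posts, 2)[0])  # ('data', 3)
```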

Thank you, Oprah

Oprah's magazine linked to my collaborator's web-based lab. I'm a little miffed at the lack of link love, but I still got something out of it -- we now have over 20,000 participants in the experiment we've been running on her site. So thank you, Oprah.

Busy analyzing...

Sorry, Sharing My Data is Illegal

I recently got back from collecting data in Russia. This particular study brought into focus for me the issues involved in making experimental data public. In this study, I videotape people as they listen to stories, look at pictures, and answer questions about the stories. The videotape is key, since what I'm actually analyzing is the participants' eye-gaze direction during different parts of the stories (this can be used to partially determine what the participants were thinking at different points in time).

Sharing raw data would mean sharing the videos...which I can't do. These videos are confidential, and there's no easy way of making them anonymous, since they are close-up videos of people's faces. I could ask participants to sign a waiver allowing me to put up their videos on the Internet, but I suspect most of my participants would just refuse to participate. Many were concerned enough about the video as it was.

Now, I could share the semi-processed data -- that is, not the videos themselves but the information gleaned from them. I already discussed some of the problems with that, namely that getting the data into a format that's easy for someone else to analyze is extremely time-consuming.

This isn't an issue with just one study -- more than half the studies I run are eye-tracking studies. Many of the rest are EEG studies, which can have several gigabytes of data each, so it's simply impractical to share the raw data (plus, when dealing with brain data, anonymity is an even bigger concern). I do some kid studies where I simply write down participants' responses, but if your goal were to check that I'm recording my data correctly, that wouldn't help -- what you'd want are tapes of the experiments, and good luck convincing the ethics board to allow me to post videos of young children participating in experiments on the Internet.

[Those are my laboratory studies. Data from my Web-based studies is actually relatively easy to share -- though you'd have to be proficient in ActionScript to understand it.]

Certainly, there are many behavioral researchers who wouldn't have this problem. But there are many who would. Mandating that everyone make their data publicly available would mean that many kinds of experiments simply couldn't be done anymore.


I recently received the following invitation:

Hi my name is [redacted] and I’m a blog spotter. I basically scour popular blogs in an effort to find great writers. I loved your post on Science, Grime and Republicans, nice job!
I’d like to get straight to the point.
Our client wants people like you to sponsor their products and will pay you to do so. They’ve launched an educational product on September 7th that teaches others how to make money on the internet by using Facebook and Social Media.
We want to pay you for recommending that product to your loyal blog readers and we will pay you up to $200 for each person that you refer. If you make just one sale a day you’re looking at making around $6000 per month.
All you need to do is create a few blog posts that recommend this product. You may also use one of our nice banners and place it on your blog.

Rumor would suggest that a fair number of bloggers do strike such bargains (no idea about the proportion). So just in case any of my loyal readers are wondering, I will never take money to recommend a product. If I recommend a product, it's because I like it.

Help Games with Words get a job!

As job application season comes around, I'm trying to move some work over from the "in prep" and "under revision" columns to the "submitted" column (which is why I'm working on a Sunday). There is one old project that's just waiting for more data before resubmission. I've already put up calls here for readers to participate, so you've probably participated. But if anyone is willing to pass on this call for participation to their friends, it would be much appreciated. I personally think this is the most entertaining study I've run online, but for whatever reason it's never attracted the same amount of traffic as the others, so progress has been slow.

You can find the experiment (The Video Test) here.

Science, Grime and Republicans

Every time I go to Russia, the first thing I notice is the air. I would say it's like sucking on a car's exhaust pipe, but -- and this is key to my story -- the air in American exhaust pipes is actually relatively fresh. You have to imagine black soot spewing forth from a grimy, corroded pipe. Pucker up. [That's the first thing I notice, unless I'm in St Petersburg -- in many parts of Petersburg the smell of urine overwhelms the industrial pollution. And I say this as someone who loves Petersburg.]

So whenever I read that regulations are strangling business, I think of Russia. The trash everywhere. My friends, living in a second-floor apartment, complaining about how the grime that comes in through the window (they can't afford air conditioning) turns everything in the apartment grey. Gulping down breaths of sandpaper. The hell-hole that oil extraction has made of Sakhalin. Seriously, I don't know why more post-apocalyptic movies aren't shot in Sakhalin. Neither words nor pictures can describe the remnants of clear-cut, burnt-over forest -- looking at it, you can't tell how long it's been like that, since such forests (I'm told) will almost certainly never grow back. It's something everybody should see once.

At least Russia has a great economy, thanks to deregulation. Or not. New Russians, of course, live quite well, but most people I know (college-educated middle class) are, by American standards, dirt poor. And even New Russians have to breathe that shitty, shitty air.


Listening to people complain that environmental regulation is too costly and largely without value, you'd be forgiven for thinking such places didn't exist. You might believe that places without environmental regulations are healthy, wealthy and wise, rather than, for the most part, impoverished and with lousy air and water.

This is the problem with the modern conservative movement in the US, and why I'm writing this post in a science blog. Some time ago, conservatives had a number of ideas that seemed plausible. It turns out, many of them were completely wrong. The brightest of the bunch abandoned these thoroughly-discredited ideas and moved on to new ones. Others, forced to choose between reality and their priors, chose the priors.

The most famous articulation of this position comes from an anonymous Bush aide, quoted by Ron Suskind:
The aide said that guys like me were "in what we call the reality-based community," which he defined as people who "believe that solutions emerge from your judicious study of discernible reality." ... "That's not the way the world really works anymore," he continued. "We're an empire now, and when we act, we create our own reality. And while you're studying that reality—judiciously, as you will—we'll act again, creating other new realities, which you can study too, and that's how things will sort out. We're history's actors…and you, all of you, will be left to just study what we do."
Even More Reality

It doesn't stop there. Discretionary government spending, one hears, is the cause of our deficits, despite the fact that the deficit is larger than all discretionary government spending. Tax breaks for the rich stimulate the economy, whereas infrastructure improvements are useless. Paul Krugman's blog is one long chronicle of absurd economic fantasy coming from the Right.

Gay marriage harms traditional marriage -- despite the fact that places where gay marriage and civil unions exist (e.g., New England) tend to have lower divorce rates and lower out-of-wedlock birth rates.

European-style medicine is to be avoided at all costs, despite the fact that the European medical system costs less and delivers better results than the American system.

Global warming. Evolution. And so on.

A Strong Opposition

I actually strongly believe in the value of a vibrant, healthy opposition. In my work, I prefer collaborators with whom I don't agree, on the belief that this tension ultimately leads to better work. Group-think is a real concern. There may be actual reasons to avoid a particular environmental regulation, European-style health care, a larger stimulus bill, etc. -- but to the extent that those reasons are based on empirical claims, the claims should actually be right. You don't get to just invent facts.

So in theory, I could vote for a good Republican. But even if one were running for office now -- and I don't think any are -- they'd still caucus with the self-destructive nutters that make up most of the modern party.

This is not to say Democrats have no empirical blind spots (they seem to be just as likely to believe that nonsense about vaccines and Autism, for instance), but on the whole, Democrats believe in reality. More to the point, most (top) scientists and researchers are Democrats, which has to influence the party (no data here, but I have yet to meet a Republican scientist, so they can't be that common).

So if you believe in reality, if you believe in doing what works rather than what doesn't, if you care at all about the future of our country, and if you are eligible to vote in the US elections this Fall, vote for the Democrat (or Left-leaning independent, etc., if there's one with a viable chance of winning).

Slate's Report on Hauser Borders on Fraud

Love, turned sour, is every bit as fierce. I haven't written about the Hauser saga for a number of reasons. I know and like the guy, and I find nothing but sadness in the whole situation. Nonetheless, I've of course been following the reports, and I wondered why my once-favorite magazine had so long been silent.

Enjoying my fastest Wi-Fi connection in weeks here at the Heathrow Yotel, I finally found Slate's take on the scandal, subtitled What went wrong with Marc Hauser's search for moral foundations. The article has a nice historical overview of Hauser's work, in context, and neatly describes several experiments. The article is cagey, but you could be excused for believing that (a) Hauser has done a lot of moral cognition research with monkeys, and (b) that work was fraudulent. The only problem is that nobody, to my knowledge, has called Hauser's moral cognition research into question -- in fact, most people have gone out of their way to say that that work (done nearly exclusively with humans) replicates very nicely. There was some concern about some work on intention-understanding in monkeys, which is probably a prerequisite for some types of moral cognition, but that's not the work one thinks of when talking about Hauser's Moral Grammar hypothesis.

I can't tell if this was deliberately misleading or just bad reporting, and I'm not sure which is more disturbing.

Slate's science reporting has always been weak (see here, here, here, and especially here), and the entire magazine has been on a steady decline for several years. Sigh. I need a new magazine.

When is the logically impossible possible?

Child's Play has posted the latest in a series of thought-provoking posts on language learning. There's much to recommend the post, and it's one of the better defenses of statistical approaches to language learning around on the Net. It would benefit from some corrections, though, and into the gap I humbly step...

The post sets up a classic dichotomy:
Does language “emerge” full-blown in children, guided by a hierarchy of inbuilt grammatical rules for sentence formation and comprehension? Or is language better described as a learned system of conventions — one that is grounded in statistical regularities that give the appearance of a rule-like architecture, but which belie a far more nuanced and intricate structure?
It's probably obvious from the wording which one they favor. It's also, less obviously, a false dichotomy. There probably was a very strong version of Nativism that at one point looked like their description of Option #1, but very little Nativist theory I've read from the last few decades looks anything like that. Semantic Bootstrapping and Syntactic Bootstrapping are both much more nuanced (and interesting) theories.

Some Cheek!

Here's where the post gets cheeky: 

For over half a century now, many scientists have believed that the second of these possibilities is a non starter. Why? No one’s quite sure — but it might be because Chomsky told them it was impossible.
Wow. You mean nobody really thought it through? That seems to be what Child's Play thinks, but it's a misrepresentation of history. There are a lot of very good reasons to favor Nativist positions (that is, ones with a great deal of built-in structure). As Child's Play discuss -- to their credit -- any language admits an infinite number of grammatical sentences, so no finite list of memorized sentences will suffice (they treat this as a straw-man argument, but I think historically that was once a serious theory). There are a number of other deep learning problems facing Empiricist theories (Pinker has an excellent paper on the subject from around 1980). There are deep regularities across languages -- such as linking rules -- that either are crazy coincidences or reflect innate structure.

The big one, from my standpoint, is that any reasonable theory of language is going to have to have, in the adult state, a great deal of structure. That is, one wants to know why "John threw the ball AT Sally" means something different from "John threw the ball TO Sally." Or why "John gave Mary the book" and "John gave the book to Mary" mean subtly different things (if you don't see that, try substituting "the border" for "Mary"). A great deal of meaning is tied up in structure, and representing structure as statistical co-occurrences doesn't obviously do the job.

Unlike Child's Play, I'm not going to discount any possibility that the opposing theories can get the job done (though I'm pretty sure they can't). I'm simply pointing out that Nativism didn't emerge from a sustained period of collective mental alienation.

Logically Inconsistent

Here we get to the real impetus for this response, which is this extremely odd section towards the end:
We only get to this absurdist conclusion because Miller & Chomsky’s argument mistakes philosophical logic for science (which is, of course, exactly what intelligent design does).  So what’s the difference between philosophical logic and science? Here’s the answer, in Einstein’s words, “No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”
In context, this means something like "Just because our theories have been shown to be logically impossible doesn't mean they are impossible." I've seen similar arguments before, and all I can say each time is:


That is, they clearly understand logic quite differently from me. If something is logically impossible, it is impossible. 2 + 2 = 100 is logically impossible, and no amount of experimenting is going to prove otherwise. The only way a logical proof can be wrong is if (a) your assumptions were wrong, or (b) your reasoning was faulty. For instance, the above math problem is actually correct if the answer is written in base 2. 
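The base-2 point is easy to verify for yourself; here's a minimal Python sketch (just the standard built-ins, nothing from the original post):

```python
# "2 + 2 = 100" is false as base-10 arithmetic, but the equation holds
# if the right-hand side is read as a base-2 (binary) numeral:
# the string "100" interpreted in base 2 denotes four.
assert 2 + 2 == 4
assert int("100", 2) == 4        # "100" read in base 2 is 4
assert bin(2 + 2) == "0b100"     # 4 written in binary is 100
```

In other words, the "impossibility" dissolved only because one of the assumptions (that the numeral was base 10) was wrong, which is exactly the point about premises versus logic.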

In general, one usually runs across this type of argument when there is a logical argument against a researcher's pet theory, and said researcher can't find a flaw with the argument. They simply say, "I'm taking a logic holiday." I'd understand saying, "I'm not sure what the flaw in this argument is, though I'm pretty sure there is one." It wouldn't be convincing (or worth publishing), but I can see that. Simply saying, "I've decided not to believe in logic because I don't like what it's telling me" is quite another thing.