Field of Science


Evolutionary Psychology, Proximate Causation, & Ultimate Causation


Evolutionary psychology has always been somewhat controversial in the media for reasons that generally confuse me (Wikipedia has a nice rundown of the usual complaints). For instance, the good folks at Slate are particularly hostile (here, here and here), which is odd because they are also generally hostile towards Creationism (here, here and here). 

Given the overwhelming evidence that nearly every aspect of the human mind and behavior is at least partly heritable (and so at least partially determined by our genes), the only way to deny the claim that our minds are at least partially a product of evolution is to deny that evolution affects our genes – that is, deny the basic tenets of evolutionary theory. (I suppose you could try to deny the evidence of genetic influence on mind and behavior, but that would require turning a blind eye to such a wealth of data as to make Global Warming Denialism seem like a warm-up activity).

What's the matter with Evolutionary Psychology?

What is there to object to, anyway? Some of the problem seems definitional. Super-Science-Blogger Greg Laden acknowledges that applying evolutionary theory to the study of the human mind is a good idea, but he argues that "evolutionary psychology" refers only to a very specific theory from Cosmides and Tooby, one with which he takes issue. And in general, a lot of the "critiques" I see in the media seem to involve equating the entire field with some specific hypothesis or set of hypotheses, particularly the more exotic ones. 

For instance, some years back Slate ran an article about "Evolutionary Psychology's Anti-Semite", a discussion of Kevin MacDonald, who has an idiosyncratic notion of Judaism as a "group evolution strategy" to maximize intelligence through eugenics (the article goes into some detail). It's a pretty nutty idea: it gets basic historical facts wrong and, more importantly, gets the science wrong. The article nonetheless tries pretty hard to paint him as a mainstream Evolutionary Psychologist. The interviewees aren't much help to that case (they mostly dismiss the work as contradicting basic fundamentals of evolutionary theory), but the article's author pulls up other evidence, like the fact that MacDonald acknowledged some mainstream researchers in one of his books. (For the record, I acknowledge Benicio del Toro as an inspiration, so you know he fully agrees with everything in this blog post. Oh, and Jenna-Louise Coleman, too.)

This spring, New York Times columnist John Tierney asserted that men must be innately more competitive than women since they monopolize the trophies in -- hold onto your vowels -- world Scrabble competitions. To bolster his case, Tierney turned to evolutionary psychology. In the distant past, he argued, a no-holds-barred desire to win would have been an adaptive advantage for many men, allowing them to get more girls, have more kids, and pass on their competitive genes to today's word-memorizing, vowel-hoarding Scrabble champs.
I will agree that this argument involves a bit of a stretch and is awfully hard to falsify (as the article goes on to point out). And sure, some claims made even by serious evolutionary psychologists are hard to falsify with current technology ... but then so is String Theory. And we do have many methods for testing evolutionary theory in general, and roughly the same ones work whether you are studying the mind and behavior or purely physical attributes of organisms. So, again, if you want to deny that claims about evolutionary psychology are testable, then you end up having to make roughly the same claim about evolutionary theory in general. 

Just common sense

It turns out that when you look at the biology, a good waist-hips ratio for a healthy woman is (roughly) .7, whereas the ideal for men is closer to .9. Now imagine we have a species of early hominids (Group A) that is genetically predisposed such that heterosexual men prefer women with a waist-hips ratio of .7 and heterosexual women prefer men with a waist-hips ratio of .9. Now let's say we have another species of early hominids (Group B) where the preferences are reversed, preferring men with ratios of .7 and women with ratios of .9. Since individuals in Group A prefer to mate with healthier partners than individuals in Group B do, which group do you think is going to have more surviving children? 

Now compare to Group C, where there is no innate component to interest in waist-hips ratios; beauty has to be learned. Group C is still at a disadvantage to Group A, since some of the people in it will learn to prefer the wrong proportions and preferentially mate with less healthy individuals. In short, all else equal, you would expect evolution to lead to hominids that prefer to mate with hominids that have close-to-ideal proportions.

(If you don't like waist-hips ratios, consider that humans prefer individuals without deformities and gaping sores and boils, and then play the same game.)
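If you want to see how quickly even a small mating-preference advantage compounds, here is a toy simulation sketch (my own illustration with invented numbers, not anything from the evolutionary psychology literature): all it assumes is that Group A's offspring survive at a slightly higher rate than Group B's because Group A picks healthier mates.

# Toy selection model: Group A's preference for healthier mates is assumed
# to buy its offspring a small survival edge (52% vs. 48%). The numbers are
# invented purely for illustration.

def group_a_share(survival_a=0.52, survival_b=0.48, generations=200, start=1000.0):
    pop_a, pop_b = start, start
    for _ in range(generations):
        # each individual has two offspring on average; survival rates differ
        pop_a *= 2 * survival_a
        pop_b *= 2 * survival_b
        # rescale so the total population stays constant (finite resources)
        total = pop_a + pop_b
        pop_a, pop_b = 2 * start * pop_a / total, 2 * start * pop_b / total
    return pop_a / (pop_a + pop_b)

print(f"Group A's share of the population after 200 generations: {group_a_share():.3f}")

A few percentage points of survival advantage per generation is enough to take Group A from half the population to essentially all of it.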

Here is another example. Suppose that in Group A, individuals find babies cute, which leads them to want to protect and nourish the infants. In Group B, individuals find babies repulsive, and many actually have an irrational fear of babies (that is, they treat babies something like the way we treat spiders, snakes & slugs). Which one do you think has more children that survive to adulthood? Once again, it's better to have a love of cuteness hardwired in than to have it be something you must learn from society, since all it takes is for a society to get a few crazy ideas about what cute looks like ("they look better decapitated!") and then the whole civilization is wiped out. 

(If you think that babies just *are* objectively cute and that there's no psychology involved, consider this: Which do you find cuter, a human baby or a skunk baby? Which do you think a mother skunk finds cuter?)

These are the kinds of issues that mainstream evolutionary psychology trucks in. And the theory does produce new predictions. For instance, you'd expect that in species where a .7 waist-hips ratio is not ideal for females (that is, pretty much any species other than our own), it wouldn't be favored (and it isn't). And the field is generally fairly sensible, which is not to say that all the predictions are right or that evolutionary theory doesn't grow and improve over time (I understand from a recent conversation that there is now some argument about whether an instinct for third-party punishment is required for sustainable altruism, which is something I had thought was a settled matter). 

Who you gonna believe: E. O. Wilson or common sense?

I was planning a post on E. O. Wilson's recent flight of fancy, "Great Scientist ≠ Good at Math", in which he tells potential future scientists that knowing math isn't all that important, but it turns out Jeremy Fox has already said everything I was going to say, only better. It's a long post, though, so here are some key passages:
Wilson’s claim that deep interest in a subject, combined with deep immersion in masses of data, is sufficient, because hey, it worked for Charles Darwin, is utter rubbish. First of all, just because it worked for Darwin (or Wilson) doesn’t mean it will work for you, and just because it worked in the 19th century doesn’t mean it will work in the 21st. If for no other reason than that there are plenty of people out there, in every field, who not only have a deep interest in the subject and an encyclopedic knowledge of the data, but who know a lot of mathematics and statistics.
and

Wilson claims that strong math skills are relevant only in a few disciplines, like physics. Elsewhere, great science is a matter of “conjuring images and processes by intuition”... I’m sure Wilson is describing his own approach here, and it’s worked for him. But I have to say, it’s surprising to find someone as famous for his breadth of knowledge as E. O. Wilson generalizing so unthinkingly from his own example. I wonder what his late collaborator Robert MacArthur would think of the notion that intuition alone is enough. I wonder what Bill Hamilton would think. Or R. A. Fisher. Or J. B. S. Haldane. Or Robert May. Or John Maynard Smith. Or George Price. Or Peter Chesson. Or Dave Tilman. Or lots of other great ecologists and evolutionary biologists I could name off the top of my head. Would Wilson seriously argue that none of those people were great scientists, or that they never made any great discoveries, or that the great discoveries they made arose from intuition unaided by mathematics?
Meanwhile, over at Finding the Next Einstein, Jonathan Wai draws on his own research to argue that mathematics ability is key to success in a wide range of scientific fields (though these data are unfortunately correlational).
Chemistry has its own problems with replication, according to Nature:
Scrounging chemicals and equipment in their spare time, a team of chemistry bloggers is trying to replicate published protocols for making molecules. The researchers want to check how easy it is to repeat the recipes that scientists report in papers ... Among the frustrations [chemists] have experienced with the chemical literature ... are claims that reactions yield products in greater amounts than seem reasonable, and scanty detail about specific conditions in which to run reactions. In some cases, reactions are reported which seem too good to be true - such as a 2009 paper which was corrected within 24 hours by web-savvy chemists live-blogging the experiment.
It's hard to tell from the article how common it is for a reaction simply not to be possible at all as opposed to simply producing less product than reported. Presumably either is problematic, but the causes would be different.

Given the recent excitement about (non-)replication, one has to wonder if this problem is more or less common than in the past. While my gut instinct is that replication was probably less of a problem in the earlier, smaller days of science, it's also quite possible that it's like many forms of violent crime: extremely rare today by historical standards, but we care much more about it.

Fractionating IQ

Near the dawn of the modern study of the mind, the great psychological pioneer Charles Spearman noticed that people who are good at one kind of mental activity tend to be good at most other mental activities. Thus, the notion of g (for "general intelligence") was born: the notion that there is some underlying factor that determines -- all else equal -- how good someone is at any particular intelligent task. This of course fits folk psychology quite well: g is just another word for "smarts".

The whole idea has always been controversial, and many people have argued that there is more than one kind of smarts out there (verbal vs. numeric, logical vs. creative, etc.). Enter a recent paper by Hampshire and colleagues (Hampshire, Highfield, Parkin & Owen, 2012) which tries to bring both neuroimaging and large-scale Web-based testing to bear on the question.

In the neuroimaging component, they asked sixteen participants to carry out twelve difficult cognitive tasks while their brains were scanned and applied principal components analysis (PCA) to the results. PCA is a sophisticated statistical method for grouping things.

A side note on PCA

If you already know what PCA is, skip to the next section. Basically, PCA is a very sophisticated way of sorting things. Imagine you are sorting dogs. The simplest thing you could do is have a list of dog breeds and go through each dog and sort it according to its breed.

What if you didn't already have a dog breed manual? Well, German shepherds are more similar to one another than any given German shepherd is to a poodle. So by looking through the range of dogs you see, you could probably find a reasonable way of sorting them, "rediscovering" the various dog breeds in the process. (In more difficult cases, there are algorithms you could use to help out.)

That works great if you have purebreds. What if you have mutts? This is where PCA comes in. PCA assumes that there are some number of breeds and that each dog you see is a mixture of those breeds. So a given dog may be 25% German Shepherd, 25% border collie, and 50% poodle. PCA tries to "learn" how many breeds there are, the characteristics of those breeds, and the mixture of breeds that makes up each dog -- all at the same time. It's a very powerful technique (though not without its flaws).
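For readers who want to see what this looks like in practice, here is a minimal sketch in Python using scikit-learn, with entirely made-up data (the two hidden "breeds" are baked in so that PCA has something to recover; none of this comes from the paper):

import numpy as np
from sklearn.decomposition import PCA

# Made-up scores for 100 "dogs" measured on 6 traits. The traits are
# generated from just 2 hidden "breed" factors plus noise -- exactly the
# kind of structure PCA is designed to recover.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(100, 2))            # two underlying factors
mixing = rng.normal(size=(2, 6))              # how the factors show up in traits
traits = hidden @ mixing + 0.3 * rng.normal(size=(100, 6))

pca = PCA()
pca.fit(traits)

# The first two components should soak up most of the variance, mirroring
# the two "breeds" that actually generated the data.
print(np.round(pca.explained_variance_ratio_, 2))

# Each row of pca.components_ is one component's "recipe" of traits, and
# pca.transform(traits) gives each dog's mixture of the components.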

Neuroimaging intelligence

Analysis focused only on the "multiple demands" network previously identified as being related to IQ and shown in red in part A of the graph below. PCA discovered two underlying components that accounted for about 90% of the variance in the brain scans across the twelve tasks. One was particularly important for working memory tasks, so the authors called it MDwm (see part B of the graph below), and it involved mostly the IFO, SFS and ventral ACC/preSMA (see part A below for locations). The other was particularly important for various reasoning tasks and involved more of the IFS, IPC and dorsal ACC/preSMA.


Notice that all tasks involved both factors, and some tasks (like the paired associates memory task) involved a roughly equal portion of each.

Sixteen subjects isn't very many

The authors put versions of those same twelve tasks on the Internet. They were able to get data from 44,600 people, which makes it one of the larger Internet studies I've seen. The authors then applied PCA to those data. This time they got three components, two of which were quite similar to the two components found in the neuroimaging study (they correlated at around r=.7, which is a very strong correlation in psychology). The third component seemed to be particularly involved in tasks requiring language. Most likely that did not show up in the neuroimaging study because the neuroimaging study focused on the "multiple demands" network, whereas language primarily involves other parts of the brain.

The factors dissociated in other ways as well. Whereas working memory and reasoning abilities start to decline around the time people reach the legal drinking age in the US (coincidence?), verbal skills remain largely undiminished until around age 50. People who suffer from anxiety had lower-than-average working memory abilities, but average reasoning and verbal abilities. Several other demographic factors similarly had differing effects on working memory, reasoning, and verbal abilities.

Conclusions

The data in this paper are very pretty, and it was a particularly nice demonstration of converging behavioral and neuropsychological methods. I am curious what the impact will be. The authors are clearly arguing against a view on which there is some unitary notion of IQ/g. It occurred to me as I wrote this that while I've read many papers lately discussing the different components of IQ, I haven't read anything recent that endorses the idea of a unitary g. I wonder if there is anyone, and, if so, how they account for this kind of data. If I come across anything, I will post it here.


------
Hampshire, A., Highfield, R., Parkin, B., & Owen, A. (2012). Fractionating human intelligence. Neuron, 76(6), 1225-1237. DOI: 10.1016/j.neuron.2012.06.022

Revision, Revision, Revision

I have finally been going through the papers in the Frontiers Special Topic on publication and peer review in which my paper on replication came out. One of the arguments that appears in many of these papers (like this one)* -- and in many discussions of the review process -- is that when papers are published, they should be published along with the reviews.

My experience with the process -- which I admit is limited -- is that you submit a paper, reviewers raise concerns, and you only get published if you can revise the manuscript so as to address those concerns (which may include new analyses or even new experiments). At that stage, the reviews are a historical document, commenting on a paper that no longer exists. This may be useful to historians of science, but I don't understand how it helps the scientific process (other than, I suppose, that transparency is a good thing).

So these proposals only make sense to me if it is assumed that papers are *not* typically revised in any meaningful way based on review. That is, reviews are more like book reviews: comments on a finished product. Of my own published work, three papers were accepted more-or-less as is (and frankly I think the papers would have benefited from more substantial feedback from the reviewers). So there, the reviews are at least referring to a manuscript very similar to the one that appeared in print (though they did ask me to clarify a few things in the text, which I did).

Other papers went through more substantial revision. One remained pretty similar in content, though we added a whole slew of confirmatory analyses that were requested by reviewers. The most recent paper actually changed substantially, and in many ways is a different -- and much better! -- paper than what we originally submitted. Of the three papers currently under review, two of them have new experiments based on reviewer comments, and the other one has an entirely new introduction and general discussion (the reviewers convinced me to re-think what I thought the paper was about). So the reviews would help you figure out which aspects of the paper we (the authors) thought of on our own and which are based on reviewer comments, but even then that's not quite right, since I usually get comments from a number of colleagues before I make the first submission. There are of course reviews from the second round, but that's often just from one or two of the original reviewers, and mostly focuses on whether we addressed their original concerns or not.

So that's my experience, but perhaps my experience is unusual. I've posted a poll (look top right). Let me know what your experience is. Since this may vary by field, feel free to include comments to this post, saying what field you are in.

---
*To be fair, this author is describing a process that has actually been implemented for a couple Economics journals, so apparently it works to (at least some) people's satisfaction.

What the Best College Teachers Do: A Review of a Vexing Book

What the Best College Teachers Do is not a bad book. It is engaging and reasonably well-written. The topic is both evergreen and timely, and certainly of interest to college teachers at the very least (as well as to people who rate college quality and to people who use those ratings to decide where to go to school). My issue with this book is that it is incapable of answering the question it sets out for itself.

A problem of comparison


The book is based primarily on extensive research by the author, Ken Bain, and his colleagues. The appendix spells out in detail how they identified good college teachers (a combination of student evaluations, examples of student work, department examinations, etc.) and how they collected information about those gifted individuals (interviews, taped class sessions, course materials, etc.). They analyzed these data to determine what these best college teachers did.

Even assuming that (a) their methods successfully identified superior teachers, and (b) they collected the right information about those teachers' practices, this is only half of a study. Without even looking at their data, I can easily rattle off some things all these teachers had in common:

1. They were all human beings.
2. They were all taller than 17 inches.
3. They all spoke English, at least to some degree (the study was conducted in the USA).
4. Most were either male or female.

Commonalities are not limited to attributes of the teachers themselves but extend to what they do in the classroom:

5. Most showed up to at least half of the class periods for a given course.
6. None of them habitually sat, silent and unmoving, at the front of the classroom for the duration of class.
7. They did not assign arbitrary grades to their students (e.g., by rolling dice).
8. Very few spoke entirely in blank verse.

While these statements are almost certainly true of good college teachers, they do not distinguish the good teachers from the bad ones. Since Bain and colleagues did not include a comparison group of bad teachers, we cannot know whether their findings do any better.

Science -- like teaching -- requires training


A good test of teaching ability should pick out all the good teachers. It should also pick out only the good teachers. (A somewhat different cut at the issue is to consider test reliability and test validity.) What the Best College Teachers Do focuses entirely on the first issue. As my reductio ad absurdum above shows, having only half of a good test is not a test that is 50% right; it's a useless test.
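To put toy numbers on why half a test is useless (these figures are invented purely for illustration): a criterion that every good teacher meets, but that every bad teacher also meets, tells you nothing about which group a given teacher belongs to.

# Invented numbers: 50 good teachers, 50 bad teachers.
# Criterion: "shows up to at least half the class periods."
good_meeting_criterion = 50   # 50/50 good teachers pass: sensitivity = 100%
bad_meeting_criterion = 50    # 50/50 bad teachers also pass: specificity = 0%

# Of the teachers who pass the criterion, what fraction are actually good?
passed = good_meeting_criterion + bad_meeting_criterion
fraction_good = good_meeting_criterion / passed
print(f"Fraction of 'passing' teachers who are good: {fraction_good:.2f}")  # 0.50

# That's exactly the base rate you started with -- the criterion adds no
# information, even though it is true of 100% of the good teachers.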


It's unfortunate that Bain and his colleagues failed in this basic and fundamental aspect of scientific inquiry. Although Bain is now the director of the Center for Teaching Excellence at New York University, he was trained as a historian. This comes out in the discussion of the study methods: "Like any good historians who might employ oral history research techniques, we subsequently sought corroborating evidence, usually in the form of something on paper..." (p. 187).

I would hope that any good historian doing comparative work would know to include a comparison group, but designing a scientific study of human behavior is hard. Even psychologists screw it up. And that's the focus of our training, whereas historians are mostly learning things other than experimental design (I assume).

Circular Definitions

Of course, failing to include a control group is not the only way to ruin a study. You can also make it circular.

Chapter 3 focuses on how excellent teachers prepare for their courses:
At the core of most professors' ideas about teaching is a focus on what the teacher does rather than on what the students are supposed to learn. In that standard conception, teaching is something that instructors do to students, usually by delivering truths about the discipline. It is what some writers call a 'transmission model.' ... 
In contrast, the best educators thought of teaching as anything they might do to help and encourage students to learn. Teaching is engaging students, engineering an environment in which they learn.
Here is what the appendix says about how the teachers were chosen for inclusion in the study:
All candidates entered the study on probation until we had sufficient evidence that their approaches fostered remarkable learning. Ultimately, the judgment to include someone in the study was based on careful consideration of his or her learning objectives, success in helping students achieve those objectives, and ability to stimulate students to have highly positive attitudes toward their studies.
It seems that perhaps teachers were included as being "excellent teachers" if they focused on student learning and on motivating students. The researchers then "found" that excellent teachers focus on student learning and on motivating students.

Vagueness and Ambiguity


Or maybe not. I'm still not entirely sure what it means to -- in the first quote -- focus on "what the teacher does" rather than on "what the students are supposed to learn." For instance, Bain poses the following thought problem on page 52:

"How will I help students who have difficulty understanding the questions and using  evidence and reason to answer them."

Is that focusing on what the teacher does or focusing on what the students are supposed to learn? How can we tell? By what metric?

My confusion here may merely mark me as one of those people expecting "a simple list of do's and don'ts" who are "greatly disappointed." Bain adds (p. 15), "The ideas here require careful and sophisticated thinking, deep professional learning, and often fundamental conceptual shifts." That's fine. But if there is no metric I can use to find out whether I'm following these best practices or not, what good does this book do me?

(Also, without knowing what exactly Bain means by these vague statements, there is no way to ensure that his study wasn't circular, as described in the previous section. I gave only one example, but the general problem is clear: Bain defined great teachers by one set of criteria and then analyzed their behavior in order to extract a second set of criteria. If both sets of criteria are loosely and vaguely defined, there's no way even in principle to know whether he isn't just measuring the same thing both times.)

Credible Reviews


So if we don't trust Bain's study, is there anything else in the book worth reading? Maybe. What the Best College Teachers Do is not myopically focused on Bain's own research. He reviews the literature, citing the conclusions from other studies of teaching quality, broadening the scope of the framework outlined in the book. However, this raises its own problem.


In writing a review, the reviewer is supposed to survey the literature, find all the relevant research, determine what the best research is, and then synthesize everything into a coherent whole (or at least, into something as coherent as the current state of the field allows). The reviewer generally does not describe the studies in sufficient detail to allow the reader to evaluate them directly; only a brief overview is provided, with a focus on the conclusions.

If you trust the reviewer, this is fine. That's why reviews from the most respected researchers in the field are typically highly valued, so much so that publishers and editors often solicit reviews from these researchers. Obviously, a review of the latest research on underwater basket weaving by a fifth-grader would not be so highly prized, because (a) we don't believe the fifth-grader did a particularly thorough review, and (b) we don't trust the fifth-grader's ability to sort the wheat from the chaff -- that is, identify which studies are flawed and which are to be believed.

Bain is clearly very smart. He has clearly read a lot. But I do not trust his ability to read scientific literature critically. The only evidence I have of his abilities is in the design of his own study, which is deeply flawed, as described above. If he can't design a study, why should I trust his analysis of other people's studies?

Building a better mousetrap


Criticizing a study is easy, but it's not much of a critique if you can't identify what a better study would look like. Clearly from my discussion above, I would want (a) clear criteria for defining good teaching, (b) clearly-defined measures of teacher behavior, and (c) a group of good teachers and a group of bad teachers for comparison, and probably a group of average teachers as well (otherwise, any differences between good and bad teachers could be driven by bad habits of the bad teachers rather than good habits of the good teachers).

After a set of behaviors that are typical of good teachers -- and which are less frequent or absent in average or bad teachers -- has been identified, one would then identify a new group of good, average, and bad teachers and replicate the results. (The risk otherwise is one of over-fitting the data: the differences you found between good teachers and the rest were just the result of random chance. This actually happens quite a lot more than many people realize.)
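Here is a small simulation sketch of that over-fitting risk (my own illustration, not from the book, using made-up "behavioral measures"): compare two groups of teachers that are in fact drawn from the same population on 20 measures, and a few measures will look "significantly different" anyway -- and those chance differences tend not to show up again in a fresh sample.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

def spurious_hits(n_measures=20, n_per_group=15):
    # Both "good" and "bad" teacher groups are drawn from the same
    # distribution, so any significant difference is pure chance.
    hits = []
    for m in range(n_measures):
        good = rng.normal(size=n_per_group)
        bad = rng.normal(size=n_per_group)
        if stats.ttest_ind(good, bad).pvalue < 0.05:
            hits.append(m)
    return hits

print("Measures that looked 'significant' by chance:", spurious_hits())
print("The 'hits' in a replication with new teachers:", spurious_hits())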

At the end of this process, we should have a set of behaviors that really are particular to the best teachers, assuming that the criteria we used to define good teachers are valid (not an assumption to be taken lightly).

Becoming a good teacher

Whether or not this information would be of any use to those aspiring to be good teachers is unclear. To find out that, we'd actually need to do a controlled study, assigning one set of teachers to emulate this behavior and another set to emulate behavior typical of average teachers. Ideally, we'd find that the first group ended up teaching better. I'm unsure whether that's particularly likely to happen, for a number of reasons.

First, consider Bain's summary of the habits of the best teachers (summarizing, with some direct quotations, from pp. 15-20):

1. Outstanding teachers know their subjects extremely well.
2. Exceptional teachers treat their lectures, discussion sections, problem-based sessions, and other elements of teaching as serious intellectual endeavors as intellectually demanding and important as their research and scholarship.
3. They avoid objectives that are arbitrarily tied to the course and favor those that embody the kind of thinking and acting expected for life.
4. The best teachers try to create an environment in which people learn by confronting intriguing, beautiful, or important problems, authentic tasks that will challenge them to grapple with ideas, rethink their assumptions, and examine their mental models of reality.
5. Highly effective teachers tend to reflect a strong trust in students.
6. They have some systematic program to assess their own efforts and to make appropriate changes.

Much of this list looks like a combination of intelligence and discipline. That is clearly true for #1, and probably true for #2 and #3. To the extent that #4 is hard to do, it probably takes intelligence. And #6 is just a good idea, more likely to occur to smart people and only pulled off by disciplined people. I'm not sure what #5 really means.

If the key to being a good teacher is to be smart and disciplined, this news will be of little help to teachers who are neither (though it may be helpful to people who are trying to select good teachers). In other words, even if we determine what makes a good teacher, that doesn't mean we can make good teachers.

The best teachers

Of course, even if the strategies that good teachers use are ones you can use yourself, that doesn't mean you can use them correctly.

There is an old parable about two young women. One was exceptionally beautiful. She used to sit at her window and gaze out over the field, looking forlorn and sighing with melancholy. Villagers passing by would stop and stare, struck by her heavenly beauty. One such villager was another young woman, who was the opposite of beautiful. Nonetheless, on seeing this example, she went home, sat at her own window, gazed out over the field and sighed. Someone walked by, saw her, and promptly vomited.

Objectification of female beauty and strange fetishization of melancholy aside, the point of this parable is that just because something works for someone else doesn't mean it'll work for you. When I think about the very best teachers I've known, one thing that stands out is how idiosyncratic their methods and abilities have been. One is a high-energy lecturer who runs and jumps during his lectures (yes, math lectures), who is somehow able to turn linear algebra into a discussion class. Another, in contrast, faded into the background. He rarely lectured, preferring to have students work (in groups or individually) on carefully-crafted questions. A third is a gifted lecturer and the master of the anecdote. While others use funny anecdotes merely to keep a lecture lively, when he uses an anecdote, it is because it illustrates the point at hand better than anything else. Over at the law school, there are a number of revered professors famous for their tendency to humiliate students. This humiliation serves a purpose: to show the students how much they have to learn. The students, rather than being alienated, strive to win their professors' approval.

These methods work for each of these teachers, but I can't imagine them swapping styles round-robin. Their teaching styles are outgrowths of their personalities. Many are high-risk strategies which, if they fail, fail disastrously (don't try humiliating your students unless you have the right kind of charisma).

Are there strategies that will work for everyone? Is there a way of determining which strategies will work for you, with your unique set of strengths and weaknesses? I'd love to find out. But it won't be by reading What the Best College Teachers Do.

When is the logically impossible possible?

Child's Play has posted the latest in a series of provocative posts on language learning. There's much to recommend the post, and it's one of the better defenses of statistical approaches to language learning around on the Net. It would benefit from some corrections, though, and into the gap I humbly step...


The post sets up a classic dichotomy:
Does language “emerge” full-blown in children, guided by a hierarchy of inbuilt grammatical rules for sentence formation and comprehension? Or is language better described as a learned system of conventions — one that is grounded in statistical regularities that give the appearance of a rule-like architecture, but which belie a far more nuanced and intricate structure?
It's probably obvious from the wording which one they favor. It's also, less obviously, a false dichotomy. There probably was a very strong version of Nativism that at one point looked like their description of Option #1, but very little Nativist theory I've read from the last few decades looks anything like that. Syntactic Bootstrapping and Semantic Bootstrapping are both much more nuanced (and interesting) theories.


Some Cheek!


Here's where the post gets cheeky: 

For over half a century now, many scientists have believed that the second of these possibilities is a non starter. Why? No one’s quite sure — but it might be because Chomsky told them it was impossible.
Wow? You mean nobody really thought it through? That seems to be what Child's Play thinks, but it's a misrepresentation of history. There are a lot of very good reasons to favor Nativist positions (that is, ones with a great deal of built-in structure). As Child's Play discuss -- to their credit -- any language admits an infinite number of grammatical sentences, so any theory on which language is a finite list of memorized sentences will fail (they treat this as a straw-man argument, but I think historically that was once a serious theory). There are a number of other deep problems of learnability that face Empiricist theories (Pinker has an excellent paper on the subject from around 1980). There are deep regularities across languages -- such as linking rules -- that are either crazy coincidences or reflections of innate structure. 


The big one, from my standpoint, is that any reasonable theory of language is going to have to have, in the adult state, a great deal of structure. That is, one wants to know why "John threw the ball AT Sally" means something different from "John threw the ball TO Sally." Or why "John gave Mary the book" and "John gave the book to Mary" mean subtly different things (if you don't see that, try substituting "the border" for "Mary"). A great deal of meaning is tied up in structure, and representing structure as statistical co-occurrences doesn't obviously do the job. 


Unlike Child's Play, I'm not going to discount any possibility that the opposing theories can get the job done (though I'm pretty sure they can't). I'm simply pointing out that Nativism didn't emerge from a sustained period of collective mental alienation.


Logically Inconsistent


Here we get to the real impetus for this response, which is this extremely odd section towards the end:
We only get to this absurdist conclusion because Miller & Chomsky’s argument mistakes philosophical logic for science (which is, of course, exactly what intelligent design does).  So what’s the difference between philosophical logic and science? Here’s the answer, in Einstein’s words, “No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”
In context, this means something like "Just because our theories have been shown to be logically impossible doesn't mean they are impossible." I've seen similar arguments before, and all I can say each time is:


Huh?


That is, they clearly understand logic quite differently from me. If something is logically impossible, it is impossible. 2 + 2 = 100 is logically impossible, and no amount of experimenting is going to prove otherwise. The only way a logical proof can be wrong is if (a) your assumptions were wrong, or (b) your reasoning was faulty. For instance, the above math problem is actually correct if the answer is written in base 2. 


In general, one usually runs across this type of argument when there is a logical argument against a researcher's pet theory, and said researcher can't find a flaw with the argument. They simply say, "I'm taking a logic holiday." I'd understand saying, "I'm not sure what the flaw in this argument is, though I'm pretty sure there is one." It wouldn't be convincing (or worth publishing), but I can see that. Simply saying, "I've decided not to believe in logic because I don't like what it's telling me" is quite another thing.

Is psychology a science, redux

Is psychology a science? I see this question asked a lot on message boards, and it's time to discuss it again here. I think the typical response by a researcher like myself is an annoyed "of course, you ignoramus." But a more subtle response is deserved, as the answer depends entirely on what you mean by "psychology" and what you mean by "science."

Two Psychologies

First, if by "psychology" you mean seeing clients (like in Good Will Hunting or Silence of the Lambs), then, no, it's probably not a science. But that's a bit like asking whether engineers or doctors are scientists. Scientists create knowledge. Client-visiting psychologists, doctors and engineers use knowledge. Of course, you could legitimately ask whether client-visiting psychologists base their interventions on good science. Many don't, but that's also true about some doctors and, I'd be willing to bet, engineers.

Helpfully, "engineering" and "physics" are given different names, while the research and application ends of psychology confusingly share the same name. (Yes, I'm aware that engineering is not hte application of physics writ broadly -- what's the application of string theory? -- and one can be a chemical engineer, etc. I actually think that makes the analogy to the two psychologies even more apt). It doesn't help that the only psychologists who show up in movies are the Good Will Hunting kind (though if paleoglaciologists get to save the world, I don't see why experimental psychologists don't!), but it does exist.

A friend of mine (a physicist) once claimed psychologists don't do experiments (he said this un-ironically over IM while I was killing time in a psychology research lab). My response now would be to invite him to participate in one of these experiments. Based on this Facebook group, I know I'm not the only one who has heard this.

Methods

There are also those, however, who are aware that psychologists do experiments, but deny that it's a true science. Some of this has to do with the belief that psychologists still use introspection (there are probably some somewhere, but I suspect there are also physicists who use voodoo dolls, along with mathematicians who play the lottery).

The more serious objection has to do with the statistics used in psychology. In the physical sciences, typically a reaction takes place or does not, or a neutrino is detected or is not. There is some uncertainty given the precision of the tools being used, but on the whole the results are fairly straightforward and the precision is pretty good (unless you study turbulence or something similar).

In psychology, however, the phenomena we study are noisy and the tools lack much precision. When studying a neutrino, you don't have to worry about whether it's hungry or sleepy or distracted. You don't have to worry about whether the neutrino you are studying is smarter than average, or maybe too tall for your testing booth, or maybe it's only participating in your experiment to get extra credit in class and isn't the least bit motivated. It does what it does according to fairly simple rules. Humans, on the other hand, are terrible test subjects. Psychology experiments require averaging over many, many observations in order to detect patterns within all that noise.

Science is about predictions. In theory, we'd like to predict what an individual person will do in a particular instance. In practice, we're largely in the business of predicting what the average person will do in an average instance. Obviously we'd like to make more specific predictions (and there are those who can and do), but they're still testable (and tested) predictions. The alternative is to declare much of human and animal behavior outside the realm of science.

Significant differences

There are some who are on board so far but get off the bus when it comes to how statistics are done in psychology. Usually an experiment consists of determining statistically whether a particular result was likely to have occurred by chance alone. Richard Feynman famously thought this was nuts (the thought experiment is that it's unlikely to see a license plate printed CPT 349, but you wouldn't want to conclude much from it).

That's missing the point. The notion of a significant difference is really a measure of replicability. We're usually comparing a measurement across two populations. We may find population A is better than population B on some test. That could be because population A is underlyingly better at such tests. Alternatively, population A was lucky that day. A significant difference is essentially a prediction that if we test population A and population B again, we'll get the same results (better performance for population A). Ultimately, though, the statistical test is just a prediction (one that typically works pretty well) that the results will replicate. Ideally, all experiments would be replicated multiple times, but that's expensive and time-consuming, and -- to the extent that the statistical analysis was done correctly (a big if) -- largely unnecessary.
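Here is a quick simulation sketch of that "significance predicts replication" idea (my own illustration, with an invented effect size and sample size): when population A really is a bit better than population B and a first study finds a significant difference, an independent second study almost always finds A ahead again.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

def study(true_diff=0.5, n=30):
    """One study: population A is truly (slightly) better than population B."""
    a = rng.normal(loc=true_diff, size=n)
    b = rng.normal(loc=0.0, size=n)
    significant = stats.ttest_ind(a, b).pvalue < 0.05
    a_ahead = a.mean() > b.mean()
    return significant, a_ahead

# Among studies that find a significant advantage for A, how often does an
# independent second study find A ahead again?
second_study_agrees = []
for _ in range(2000):
    sig1, a_ahead1 = study()
    if sig1 and a_ahead1:
        _, a_ahead2 = study()
        second_study_agrees.append(a_ahead2)

print(f"Initial significant results: {len(second_study_agrees)}")
print(f"Fraction where the second study points the same way: {np.mean(second_study_agrees):.2f}")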

So what do you think? Are the social sciences sciences? Comments are welcome.

Honestly, Research Blogging, Get over yourself

A few years ago, science blog posts started decorating themselves with a simple green logo. This logo was meant to credential the blog post as being one about peer-reviewed research, and is supplied by Research Blogging. As ResearchBlogging.org explains:
ResearchBlogging.org is a system for identifying the best, most thoughtful blog posts about peer-reviewed research. Since many blogs combine serious posts with more personal or frivolous posts, our site offers a way to find only the most carefully-crafted posts about cutting-edge research, often written by experts in their respective fields.
That's a good goal and one I support. If you read further down, you see that this primarily amounts to the following: if the post is about a peer-reviewed paper, it's admitted to the network. If it's not, it isn't. I guess the assumption is that the latter is not carefully-crafted or about cutting-edge research. And that's where I get off the bus.

Peer Review is Not Magic

One result of the culture wars is that scientists have needed a way of distinguishing real data from fantasy. If you look around the Internet, no doubt half or even more than half of what is written suggests there's no global warming, that vaccines cause autism, etc. Luckily, fanatics rarely publish in peer-reviewed journals, so once we restrict the debate to what is in peer-reviewed journals, pretty much all the evidence suggests global warming, no autism-vaccine link, etc. So pointing to peer-review is a useful rhetorical strategy.

That, at least, is what I assume has motivated all the stink about peer-review in recent years, and ResearchBlogging.org's methods. But it's out of place in the realm of science blogs. It's useful to think about what peer review is.

A reviewer for a paper reads the paper. The reviewer does not (usually) attempt to replicate the experiment. The reviewer does not have access to the data and can't check that the analyses were done correctly. At best, the reviewer evaluates the conclusions the authors draw, and maybe even criticizes the experimental protocol or the statistical analyses used (assuming the reviewers understand statistics, which in my field is certainly not always the case). But the reviewer can't check that the data weren't made up, that the experimental protocol was actually followed, that there were no errors in data analysis, etc.

In other words, the reviewer can do only and exactly what a good science blogger does. So good science blogging is, at its essence, a kind of peer review.

Drawbacks

Now, you might worry about the fact that the blogger could be anyone. There's something to that. Of course, ResearchBlogging.org has the same problem. Just because someone is blogging about a peer-reviewed paper doesn't mean they understand it (or that they aren't lying about it, which happens surprisingly often with the fluoride fanatics).

So while peer review might be a useful way of vetting the paper, it won't help us vet the blog. We still have to do that ourselves (and science bloggers seem to do a good job of vetting).

A weakness

Ultimately, I think it's risky to put all our cards on peer review. It's a good system, but it's possible to circumvent. We know that some set of scientists read the paper and thought it was worth publishing (with the caveats mentioned above). Of course, those scientists could be anybody, too -- it's up to the editor. So there's nothing really stopping autism-vaccine fanatics from establishing their own peer-reviewed journal, with reviewers who are all themselves autism-vaccine fanatics.

To an extent, that already happens. As long as there's a critical mass of scientists who think a particular way, they can establish their own journal, submit largely to that journal and review each other's submissions. Thus, papers that couldn't have gotten published at a more mainstream journal can get a home. I think anyone who has done a literature search recently knows there are a lot of bad papers out there (in my field, anyway, though I imagine the same is true in others).

Peer review is a helpful vetting process, and it does make papers better. But it doesn't determine fact. That is something we still have to find for ourselves.

****
Observant readers will have noticed that I use ResearchBlogging.org myself for its citation system. What can I say? It's useful.

Confusing verbs

The first post on universal grammar generated several good questions. Here's an extended response to one of them:
You said that a 1990's theory was dead wrong because sometimes emotion verbs CAN be prefixed with -un. Then you give examples of adjectives, not verbs, that have been prefixed: unfeared, unliked, unloved. I know these words are also sometimes used as verbs, but in the prefixed versions they are clearly adjectives. 
The theory I'm discussing wanted to distinguish between emotion verbs that have experiencers as subjects (fear, like) and those that have experiencers as objects (frighten, confuse). The claim was that the latter set of verbs are "weird" in an important way, one effect of which is that they can't have past participles.

This brings up the obvious problem that "frighten" and "confuse" appear to have past participles: "frightened" and "confused". The author then argued that these are not actually past participles -- they're adjectives. The crucial test is that you can add "un" to an adjective but not a participle (or so it's claimed). Thus, it was relevant that you can say "unfrightened" and "unconfused", suggesting that these are adjectives, but you can't say "unfeared" or "unliked", suggesting that these are participles, not adjectives.

The problem mentioned in the previous post was that there are also subject-experiencer verbs that have participles which can take the "un" prefix, such as "unloved". There are also object-experiencer verbs that have participles which can't be "un" prefixed, like "un-angered" (at least, it sounds bad to me; try also "ungrudged", "unapplauded", or "unmourned"). So the "un" prefixation test doesn't reliably distinguish between the classes of verbs. This becomes apparent once you look through a large number of both types of verbs (here are complete lists of subject-experiencer and object-experiencer verbs in English).

There is a bigger problem, which is that the theory assumes a lack of homophones. That is, there could be two words pronounced like "frightened" -- one is a past participle and one is an adjective. The one that can be unprefixed is the adjective. So the fact that "unfrightened" exists as a word doesn't rule out the possibility that "frighten" has a past participle.

To be completely fair to the theory, the claim that object-experiencer verbs are "weird" (more specifically, that they require syntactic movement) could still be right (though I don't think it is). The point here was that the specific test proposed ("un" prefixation) turned out not to distinguish the two classes reliably. It actually took some time for people to realize this, and you still see the theory cited. The point is that getting the right analysis of a language is very difficult, and typically many mistakes are made along the way.

Universal meaning

My earlier discussion of Evans and Levinson's critique of universal grammar was vague on details. Today I wanted to look at one specific argument.

Funny words

Evans and Levinson briefly touch on universal semantics (variously called "the language of thought" or "mentalese"). The basic idea is that language is a way of encoding our underlying thoughts. The basic structure of those thoughts is the same from person to person, regardless of what language they speak. Quoting Pinker, "knowing a language, then, is knowing how to translate mentalese into strings of words and vice versa. People without a language would still have mentalese, and babies and many nonhuman animals presumably have simpler dialects."

Evans and Levinson argue that this must be wrong, since other languages have words for things that English has no word for, and similarly English has words that don't appear in other languages. This is evidence against a simplistic theory on which all languages have the same underlying vocabulary and differ only in pronunciation, but that's not the true language of thought hypothesis. Many of the authors cited by Evans and Levinson -- particularly Pinker and Gleitman -- have been very clear about the fact that languages pick and choose in terms of what they happen to encode into individual words.

The Big Problems of Semantics

This oversight was doubly disappointing because the authors didn't discuss the big issues in language meaning. One classic problem, which I've discussed before on this blog, is the gavagai problem. Suppose you are visiting another country where you don't speak a word of the language. Your host takes you on a hike, and as you are walking, a rabbit bounds across the field in front of you. Your host shouts "gavagai!" What should you think gavagai means?

There are literally an infinite number of possibilities, most of which you probably won't consider. Gavagai could mean "white thing moving," or "potential dinner," or "rabbit" on Tuesdays but "North Star" any other day of the week. Most likely, you would guess it means "rabbit" or "running rabbit" or maybe "Look!" This is a problem to solve, though -- given the infinite number of possible meanings, how do people narrow down on the right one?

Just saying, "I'll ask my host to define the word" won't work, since you don't know any words yet. This is the problem children have, since before explicit definition of words can help them learn anything, they must already have learned a good number of words.

One solution to this problem is to assume that humans are built to expect words of certain sorts and not others. We don't have to learn that gavagai doesn't change its meaning based on the day of the week because we assume that it doesn't.

More problems

That's one problem in semantics that is potentially solved by universal grammar, but it's not the only one. Another famous one is the linking problem. Suppose you hear the sentence "the horse pilked the bear". You don't know what pilk means, but you probably think the sentence describes the horse doing something to the bear. If instead you find out it describes a situation in which the bear knocked the horse flat on its back, you'd probably be surprised.

That's for a good reason. In English, transitive verbs describe the subject doing something to the object. That's not just true of English, it's true of almost every language. However, there are some languages where this might not be true. Part of the confusion is that defining "subject" and "object" is not always straightforward from language to language. Also, languages allow things like passivization -- for instance, you can say John broke the window or The window was broken by John. When you run into a possible exception to the subject-is-the-doer rule, you want to make sure you aren't just looking at a passive verb.

Once again, this is an example where we have very good evidence of a generalization across all languages, but there are a few possible exceptions. Whether those exceptions are true exceptions or just misunderstood phenomena is an important open question.

-------
Evans, N., & Levinson, S. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32(5). DOI: 10.1017/S0140525X0999094X


Video games, rotted brains, and book reviews

Jonah Lehrer has an extended discussion of his review of The Shallows, a new book claiming that the Internet is bad for our brains. Lehrer is skeptical, pointing out that worries about new technology are as old as time (Socrates thought books would make people stupid, too). I am skeptical as well, but I'm also skeptical of (parts of) Lehrer's arguments. The crux of the argument is as follows:
I think it's far too soon to be drawing firm conclusions about the negative effects of the web. Furthermore, as I note in the review, the majority of experiments that have looked directly at the effects of the internet, video games and online social networking have actually found significant cognitive benefits.
That, so far as it goes, is reasonable. My objection is to some of the evidence given:
A 2009 study by neuroscientists at the University of California, Los Angeles, found that performing Google searches led to increased activity in the dorsolateral prefrontal cortex, at least when compared with reading a "book-like text." Interestingly, this brain area underlies the precise talents, like selective attention and deliberate analysis, that Carr says have vanished in the age of the Internet. Google, in other words, isn't making us stupid -- it's exercising the very mental muscles that make us smarter.
This cuts several ways. Extra activation of a region in an fMRI experiment is interpreted in different ways by different researchers. It could be evidence of extra specialization ... or evidence that the brain network in question is damaged and so needs to work extra hard. Lehrer is at least partially aware of this problem:
Now these studies are all imperfect and provisional. (For one thing, it's not easy to play with Google while lying still in a brain scanner.)
This is the line I have a particular issue with. If the question is whether extra Internet use makes people stupid, why on Earth would anyone need to use a $600/hr MRI machine to answer that question? We have loads of cheap psychometric tests of cognition. All methodologies have their place, and a behavioral question is most easily answered with behavioral methods. MRI is far more limited.

Lehrer's discussion of the 2009 study above underscores this point: the interpretation of the brain images rests on our understanding of what behaviors the dorsolateral prefrontal cortex has shown up with in other studies. The logic is: A correlates with B correlates with C, thus A correlates with C. This is, as any logician will tell you, an unsound conclusion. When you add that using MRI can cost ten thousand dollars for a single experiment, it's a very expensive middleman!
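To make the "correlation isn't transitive" point concrete, here is a tiny numerical sketch (toy variables of my own, nothing to do with fMRI data): A correlates with B, and B correlates with C, while A and C are completely unrelated.

import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# A and C are generated independently; B is simply their sum.
a = rng.normal(size=n)
c = rng.normal(size=n)
b = a + c

def corr(x, y):
    return np.corrcoef(x, y)[0, 1]

print(f"corr(A, B) = {corr(a, b):.2f}")  # about 0.71
print(f"corr(B, C) = {corr(b, c):.2f}")  # about 0.71
print(f"corr(A, C) = {corr(a, c):.2f}")  # about 0.00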

Which isn't to say that MRI is useless or such studies are a waste of time. MRI is particularly helpful in understanding how the brain gives rise to various types of behavior, and it's sometimes helpful for analyzing behavior that we can't directly see. Neither applies here. If the Internet makes us dumb in a way only detectable with super-modern equipment, I think we can breathe easy and ignore the problem. What we care about is whether people in fact are more easily distracted, have worse memory, etc. That doesn't require any special technology -- even Socrates could run that experiment.



****
Lehrer does discuss a number of good behavioral experiments. Despite my peevishness over the "Google in the scanner" line, the review is more than worth reading.