Field of Science

VerbCorner: A Citizen Science project to find out what verbs mean

Earlier this week, I blogged about our new VerbCorner project. At the end, I promised that there would be more info forthcoming about why we are doing this project, about its aims and expected outcomes, why it's necessary, etc. Here's the first installment in that series.

Computers and language

I just dictated the following note to Siri:
Many of our best computer systems treat words as essentially meaningless symbols that need to be moved around.
Here's what she wrote:
Many of our best computer system street words is essentially meaningless symbols that need to be moved around.
I rest my case.

The problem of meaning.

I don't know for sure how Siri works, but her mistake is emblematic of how much language software works. Computer systems treat and Computer system street sound approximately the same, but that's not something most humans would notice because the first interpretation makes sense and the second one doesn't. 

Decades of research show that human language comprehension is heavily guided by plausibility: when there are two possible interpretations of what you just heard, go for the one that makes sense. This happens in speech recognition, as in the example above, and it plays a key role in understanding ambiguous words. If you want to throw Google Translate for a loop, give it the following:
John was already in his swimsuit as we reached the watering hole. "I hope the tire swing is still there," John said as he headed to the bank.
Although the most plausible interpretation of bank here is side of a river, Google Translate will translate it into the word for "financial institution" in whatever language you are translating into, because that's the most common meaning of the English word bank.
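To make the idea of "go for the one that makes sense" concrete, here is a minimal sketch of choosing a word sense by how well it fits the surrounding context. It is loosely in the spirit of the simplified Lesk approach to word-sense disambiguation. Everything in it is an assumption made for illustration: the cue-word lists are hand-written and hypothetical, the function name pick_sense is made up, and this is not a description of how Google Translate or Siri actually works.

```python
import re

# Hypothetical, hand-written cue words for two senses of "bank".
# A real system would need far richer knowledge than this.
SENSE_CUES = {
    "river bank": {"river", "water", "watering", "swim", "swimsuit",
                   "swing", "shore", "fishing"},
    "financial institution": {"money", "loan", "deposit", "account",
                              "teller", "cash", "check"},
}


def pick_sense(context, sense_cues, default):
    """Return the sense whose cue words overlap most with the context.

    Falls back to `default` (standing in for the most common meaning)
    when the context offers no cues at all.
    """
    words = set(re.findall(r"[a-z]+", context.lower()))
    overlaps = {sense: len(words & cues) for sense, cues in sense_cues.items()}
    best = max(overlaps, key=overlaps.get)
    return best if overlaps[best] > 0 else default


passage = ('John was already in his swimsuit as we reached the watering hole. '
           '"I hope the tire swing is still there," John said as he headed '
           'to the bank.')

print(pick_sense(passage, SENSE_CUES, default="financial institution"))
print(pick_sense("He headed to the bank.", SENSE_CUES,
                 default="financial institution"))
```

With the swimsuit passage, the context words point to the river sense. With no helpful context, the sketch falls back on the most common meaning, which is the behavior described above for Google Translate.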

So what's the problem?

I assume that this limitation is not lost on the people at Google or at Apple. And, in fact, there are computer systems that try to incorporate meaning. The problem there is not so much the computer science as the linguistic science.** Dictionaries notwithstanding, scientists really do not know very much about what words mean, and it is hard to program the computer to know what the word means when you actually do not know.

(Dictionaries are useful, but as an exercise, pick a* definition from a dictionary and come up with a counterexample. It is not hard.)

One of the limitations is scope. Language is huge. There are a lot of words. So scientists will work on the meanings of a small number of words. This is helpful, but a computer that only knows a few words is pretty limited. We want to know the meanings of all words.

Solving the problem

We've launched a new section of the website, VerbCorner. There, you can answer questions about what verbs mean. Rather than try to work out the meaning of a word all at once, we have broken up the problem into a series of different questions, each of which tries to pinpoint a specific component of meaning. Of course, there are many nuances to meaning, but research has shown that certain aspects are more important than others, and we will be focusing on those.

Over the coming weeks, I will be writing a lot more about this project: its goals, the science behind it, and the impact we expect it to have. In the meantime, please check it out.

----
*Dragon Dictate originally transcribed this as "pickled", which I did not catch on proofreading. More evidence that we need computer programs that understand what words mean.
**Dragon Dictate made spaghetti out of this sentence, too.

Citizen Science at GamesWithWords.org: The VerbCorner Project

What do verbs mean? We'd like to know. For that reason, we just launched VerbCorner, a massive, crowd-sourced investigation into the meanings of verbs. 

Why do we need this project? Why not just look up what verbs mean in a dictionary? While dictionaries are enormously useful (I think I own something like 15), they are far from perfect. For one thing, it's usually very easy to find counterexamples even for what seem like straightforward definitions. Take the following:
Bachelor: An unmarried man.
So is the Pope a bachelor? Is Neil Patrick Harris? How about a married man from a country in which men are allowed multiple wives?

At VerbCorner, rather than trying to work out the whole definition at once, we have broken meaning into many different components. At the site, you will find several different tasks. In each task, you will try to determine whether a particular verb has a particular component of meaning. 

If you are interested in what words mean and would like to help with this project, sign up for an account at http://gameswithwords.org/VerbCorner/. Participation can be anonymous, but we are happy to recognize significant contributions from anyone who wishes it.

Over the coming weeks, I will be writing a lot more about this project: its goals, the science behind it, and the impact we expect it to have. In the meantime, please check it out.

A Critical Period for Learning Language?

If you bring adults and children into the lab and try teaching them a new language, adults will learn much more of the language much more rapidly than the children. This is odd, because probably one of the most famous facts about learning languages -- something known by just about everyone whether you are a scientist who studies language or not -- is that adults have a lot less success at learning language than children. So whatever it is that children do better, it's something that operates on a timescale too slow to see in the lab. 

This makes studying the differences between adult and child language learners tricky, and a lot less is known than we'd like. Even the shape of the change in language learning ability is not well-known: is the drop-off in language learning ability gradual, or is there a sudden plummet at a particular age? Many researchers favor the latter possibility, but it has been hard to demonstrate simply because of the problem of collecting data. Perhaps the most comprehensive study comes from Kenji Hakuta, Ellen Bialystok and Edward Wiley, who used U.S. Census data from 2,016,317 Spanish-speaking immigrants and 324,444 Chinese-speaking* immigrants to study English proficiency as a function of when the person began learning the language.

Their graph shows a very gradual decline in English proficiency as a function of when the person moved to the U.S.



Unfortunately, the measure of English proficiency wasn't very sophisticated. The Census simply asks people to say how well they speak English: "not at all", "not well", "well", "very well", and "speak only English". This is better than nothing, and the authors show that it correlates with a more sophisticated test of English proficiency, but it's possible that the reason the lines in the graphs look so smooth is that this five-point scale is simply too coarse to show anything more. The measure also collapses over vocabulary, grammar, accent, etc., and we know that these behave differently (your ability to learn a native-like accent goes first).

A New Test

This was something we had in mind when devising The Vocab Quiz. If we get enough non-native speakers of English, we could track English proficiency as a function of age ... at least as measured by vocabulary (we also have a grammar test in the works, but that's more difficult to put together and so may take us a while yet). I don't think we'll get two million participants, but even just a few thousand would be enough. If English is your second (or third or fourth, etc.) language, please participate. In addition to helping us with our research and helping advance the science of language in general, you will also be able to see how your vocabulary compares with the typical native English speaker who participates in the experiment.

--------
Hakuta, K., Bialystok, E., & Wiley, E. (2003). Critical Evidence: A Test of the Critical-Period Hypothesis for Second-Language Acquisition. Psychological Science, 14 (1), 31-38. DOI: 10.1111/1467-9280.01415



*Yes, I know: Chinese is a family of languages, not a single language. But the paper does not report a by-language breakdown for this group.

Living in an Imperfect World: Psycholinguistics Edition

You, sir, have tasted two whole worms. You have hissed all my mystery lectures and been caught fighting a liar in the quad. You will leave Oxford by the next town drain. -- Reverend Spooner.

There is an old tension in psycholinguistic (or linguistic) theory, which boils down to two ways of looking at language comprehension. When somebody says something to you, what do you do with that linguistic input? Is your goal to decode the sentence and figure out what the sentence means, or do you try to figure out what message the speaker intended to convey? The tension comes in because presumably we do a bit of both.

Suppose a young child says, "Look! A doggy!" while pointing to a cat. Most people will agree that technically, the child's sentence is about a dog. But most of us can still work out that probably the child meant to talk about the cat; she used the word doggy either due to lack of vocabulary, confusion about the distinction between dogs and cats, or a simple speech error. Similarly, if your friend says at 7pm, "Let's go have lunch," technically your friend is suggesting having the midday meal, but probably you charitably assume he is just very hungry and so made a mistake in saying "lunch" instead of "dinner".

For a variety of reasons, linguistics and psycholinguistics have focused mostly on decoding sentences rather than intended meanings. This is important work about an important problem, but -- as we saw above -- it's only half the story. PNAS just published a paper by Gibson, Bergen, and Piantadosi that addresses the second half. Gibson and Bergen are at M.I.T., and Piantadosi recently graduated from M.I.T. Like much of the work coming out of Eastern Cambridge lately, the paper takes a Bayesian perspective on the problem: the probability that the speaker intended to convey a particular message m, given that they said sentence s, is proportional to the prior probability that the speaker might want to convey m times the probability that they would say sentence s when intending to convey m.

This ends up accounting for the phenomenon brought up in Paragraph #2: If the literal meaning of the speaker's sentence isn't very likely to be what they intended to say ("Let's go have lunch", spoken at 7pm), but there is some other sentence that contains roughly the same words but has a more plausible meaning ("Let's go have dinner"), then you should infer that the intended message is the latter one and that the speaker made an error.

So far, this is not much more than a restatement of our intuitive theory in Paragraph #2. But Gibson, Bergen and Piantadosi point out that a few non-trivial predictions come out of this. One is that you should assume that deletions (dropping a word) are more likely than insertions (adding a word). The reason is that there are only so many words that can be dropped from a particular sentence, so even if the probability of accidentally dropping a word is low, the probability of accidentally dropping a particular word isn't all that much lower. So if the intended sentence was "The ball was kicked by the girl", and the speaker accidentally dropped two words, the probability that the speaker happened to drop "was" and "by", resulting in the grammatical but unlikely sentence "The ball kicked the girl", is not so bad. However, suppose the intended sentence was "The girl kicked the ball". What are the chances the speaker accidentally adds "was" and "by", resulting in the grammatical but unlikely sentence "The girl was kicked by the ball"? Pretty much zilch, since English contains hundreds of thousands of words: There is pretty much no chance that those particular words would be inserted in those particular locations.
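For readers who like to see the arithmetic, here is a toy sketch of the reasoning above: P(intended | heard) is proportional to P(intended) times P(heard | intended). This is not the model from the paper. The noise model, the function names, the error rates, the vocabulary size and the priors are all assumptions picked for illustration, and the likelihood ignores some small factors (such as the chance that nothing was inserted at each gap).

```python
def is_subsequence(short, long):
    """True if the words in `short` appear in `long` in the same order."""
    remaining = iter(long)
    return all(word in remaining for word in short)


def likelihood(intended, heard, p_delete=0.01, p_insert=0.01,
               vocab_size=100_000):
    """Toy P(heard | intended), allowing only deletions or only insertions.

    A deletion costs p_delete: the dropped word has to be one of the few
    words in the sentence. An insertion costs p_insert / vocab_size: the
    added word could have been any word in the language.
    """
    i_words, h_words = intended.lower().split(), heard.lower().split()
    if is_subsequence(h_words, i_words):      # words were dropped (or none)
        n_deleted = len(i_words) - len(h_words)
        return p_delete ** n_deleted * (1 - p_delete) ** len(h_words)
    if is_subsequence(i_words, h_words):      # words were added
        n_inserted = len(h_words) - len(i_words)
        return ((p_insert / vocab_size) ** n_inserted
                * (1 - p_delete) ** len(i_words))
    return 0.0                                # not covered by this toy model


def posterior(heard, priors, **noise):
    """Normalize prior * likelihood over the candidate intended sentences."""
    scores = {m: p * likelihood(m, heard, **noise) for m, p in priors.items()}
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no candidate can produce the heard sentence")
    return {m: s / total for m, s in scores.items()}


# Hypothetical priors: the plausible event is 99 times likelier a priori.
after_deletion = posterior(
    "The ball kicked the girl",
    {"The ball kicked the girl": 0.01,
     "The ball was kicked by the girl": 0.99})
after_insertion = posterior(
    "The girl was kicked by the ball",
    {"The girl was kicked by the ball": 0.01,
     "The girl kicked the ball": 0.99})

print(after_deletion["The ball was kicked by the girl"])
print(after_insertion["The girl kicked the ball"])
```

Under these made-up numbers, the "two words were dropped" reading keeps a small but real share of the probability, while the "two words were inserted" reading is many orders of magnitude less likely. That asymmetry is the point of the prediction; the specific values mean nothing.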

The authors present some data to back up these and some other predictions. For instance, if listeners are given reason to suspect that the speaker makes lots of speech errors, they are then even more likely to "correct" an unlikely sentence to a similar sentence with a more likely meaning.

There's plenty more work to be done. There are plenty of speech errors out there besides insertions and deletions, such as substitutions and the various phonological errors that made Rev. Spooner famous (see quote above). Work on phonological errors shows that speakers are more likely to make errors that result in real words (train->drain) than non-words (train->frain). Likely, the same is true of other types of errors. Building a full theory that incorporates all the complexity of speech processes is a ways off yet. But the work just published is an important proof of concept.

---------
Gibson, E., Bergen, L., & Piantadosi, S. (2013). Rational integration of noisy evidence and prior semantic expectations in sentence interpretation. Proceedings of the National Academy of Sciences. DOI: 10.1073/pnas.1216438110

Do You Speak Korean?


Learning new languages is hard for many reasons. One of those reasons is that the meaning of an individual word can have a lot of nuances, and the degree to which those nuances match up with the nuances of similar words in your first language can make learning the new language easier; the degree to which the nuances diverge can make learning the new language harder.

In a new experiment, we are looking at English-speakers learning Korean and Korean-speakers learning English. In particular, we are studying a specific set of words that previous research has suggested give foreign language learners a great deal of difficulty.

We are hoping that we will be able to track how knowledge of these words develops as you move from being a novice to a fluent speaker. For this, we will need to find lots of people who are learning Korean, as well as Korean-speakers who are learning English. If you are one, please participate.

The experiment is called "Trials of the Heart". You can find it here.

We do also need monolingual English speakers (people whose first and essentially only language is English) for comparison, so if that's you, you are welcome to participate, too!


Evolutionary Psychology, Proximate Causation, & Ultimate Causation


Evolutionary psychology has always been somewhat controversial in the media for reasons that generally confuse me (Wikipedia has a nice rundown of the usual complaints). For instance, the good folks at Slate are particularly hostile (here, here and here), which is odd because they are also generally hostile towards Creationism (here, here and here). 

Given the overwhelming evidence that nearly every aspect of the human mind and behavior is at least partly heritable (and so at least partially determined by our genes), the only way to deny the claim that our minds are at least partially a product of evolution is to deny that evolution affects our genes – that is, deny the basic tenets of evolutionary theory. (I suppose you could try to deny the evidence of genetic influence on mind and behavior, but that would require turning a blind eye to such a wealth of data as to make Global Warming Denialism seem like a warm-up activity).

What's the matter with Evolutionary Psychology?

What is there to object to, anyway? Some of the problem seems definitional. Super-Science-Blogger Greg Laden acknowledges that applying evolutionary theory to the study of the human mind is a good idea, but argues that "evolutionary psychology" refers only to a very specific theory from Cosmides and Tooby, one with which he takes issue. And in general, a lot of the "critiques" I see in the media seem to involve equating the entire field with some specific hypothesis or set of hypotheses, particularly the more exotic ones.

For instance, some years back Slate ran an article about "Evolutionary Psychology's Anti-Semite", a discussion of Kevin MacDonald, who has an idiosyncratic notion of Judaism as a "group evolution strategy" to maximize, through eugenics, intelligence (the article goes into some detail). It's a pretty nutty idea, gets basic historical facts wrong, and more importantly gets the science wrong. The article tries pretty hard to paint him as a mainstream Evolutionary Psychologist nonetheless. Interviewees aren't that helpful (they mostly dismiss the work as contradicting basic fundamentals of evolutionary theory), but the article author pulls up other evidence, like the fact that MacDonald acknowledged some mainstream researchers in one of his books. (For the record, I acknowledge Benicio del Toro as an inspiration, so you know he fully agrees with everything in this blog post. Oh, and Jenna-Louise Coleman, too.)

This spring, New York Times columnist John Tierney asserted that men must be innately more competitive than women since they monopolize the trophies in -- hold onto your vowels -- world Scrabble competitions. To bolster his case, Tierney turned to evolutionary psychology. In the distant past, he argued, a no-holds-barred desire to win would have been an adaptive advantage for many men, allowing them to get more girls, have more kids, and pass on their competitive genes to today's word-memorizing, vowel-hoarding Scrabble champs.
I will agree that this argument involves a bit of a stretch and is awfully hard to falsify (as the article goes on to point out). And sure, some claims made even by serious evolutionary psychologists are hard to falsify with current technology ... but then so is String Theory. And we do have many methods for testing evolutionary theory in general, and roughly the same ones work whether you are studying the mind and behavior or purely physical attributes of organisms. So, again, if you want to deny that claims about evolutionary psychology are testable, then you end up having to make roughly the same claim about evolutionary theory in general. 

Just common sense

It turns out that when you look at the biology, a good waist-hips ratio for a healthy woman is (roughly) .7, whereas the ideal for men is closer to .9. Now imagine we have a species of early hominids (Group A) that is genetically predisposed such that heterosexual men prefer women with a waist-hips ratio of .7 and heterosexual women prefer men with a waist-hips ratio of .9. Now let's say we have another species of early hominids (Group B) where the preferences are reversed, preferring men with ratios of .7 and women with ratios of .9. Since individuals of Group A prefer to mate with healthier partners than Group B does, which one do you think is going to have more surviving children?

Now compare to Group C, where there is no innate component to interest in waist-hips ratios; beauty has to be learned. Group C is still at a disadvantage to Group A, since some of the people in it will learn to prefer the wrong proportions and preferentially mate with less healthy individuals. In short, all else equal, you would expect evolution to lead to hominids that prefer to mate with hominids that have close-to-ideal proportions.

(If you don't like waist-hips ratios, consider that humans prefer individuals without deformities and gaping sores and boils, and then play the same game.)

Here is another example. Suppose that in Group A, individuals find babies cute, which leads them to want to protect and nourish the infants. In Group B, individuals find babies repulsive, and many actually have an irrational fear of babies (that is, treating babies something like how we treat spiders, snakes & slugs). Which one do you think has more children that survive to adulthood? Once again, it's better to have a love of cuteness hardwired in rather than something you have to learn from society, since all it takes is for a society to get a few crazy ideas about what cute looks like ("they look better decapitated!") and then the whole civilization is wiped out. 

(If you think that babies just *are* objectively cute and that there's no psychology involved, consider this: Which do you find cuter, a human baby or a skunk baby? Which do you think a mother skunk finds cuter?)

These are the kinds of issues that mainstream evolutionary psychology trucks in. And the theory does produce new predictions. For instance, you'd expect that in species where a .7 waist-hips ratio is not ideal for females (that is, pretty much any species other than our own), it wouldn't be favored (and it isn't). And the field is generally fairly sensible, which is not to say that all the predictions are right or that evolutionary theory doesn't grow and improve over time (I understand from a recent conversation that there is now some argument about whether an instinct for third-party punishment is required for sustainable altruism, which is something I had thought was a settled matter). 

Findings: The Role of World Knowledge in Pronoun Interpretation

A few months ago, I posted the results of That Kind of Person. This was the final experiment in a paper on pronoun interpretation, a paper which is now in press. You can find a PDF of the accepted version here.

How it Began

Isaac Asimov famously observed that "the most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" That quote describes this project fairly well. The project grew out of a norming study. Norming studies aren't really even real experiments -- they are mini experiments used to choose stimuli.

I was designing an ERP ("brain wave") study of pronoun processing. A group in Europe had published a paper using ERPs to look at a well-known phenomenon in pronoun interpretation, one which has been discussed a lot on this blog, in which pronoun interpretation clearly depends on context:

(1) Sally frightens Mary because she...
(2) Sally likes Mary because she...

Most people think that "she" refers to Sally in (1) but Mary in (2). This seems to be a function of the verbs in (1-2), since that's all that's different between the sentences, and in fact other verbs also affect pronoun interpretation. We wanted to follow up some of the previous ERP work, and we were just choosing sentences. You get nice big ERP effects (that is, big changes in the brain waves) when something is surprising, so people often compare sentences with unexpected words to those with expected words, which is what this previous group had done:

(3) Sally frightens Bill because she...
(4) Bill frightens Sally because she...

You should get the sense that the pronoun "she" is a bit more surprising in (4) than in (3). Comparing these sentences to (1-2) should make it clear why this is.

The Twist

A number of authors argued that what is going on is that these sentences (1-4) introduce an explanation ("because..."). As you are reading or listening to the sentence, you think through typical causes of the event in question (frightening, liking, etc.) and so come up with a guess as to who is going to be mentioned in the explanation. More good explanations of an instance of frightening involve the frightener than the frightenee, and more good explanations of an instance of liking involve the like-ee than the liker.

The authors supported the argument by pointing to studies showing that what you know about the participants in the event matters. In general, you might think that in any given event involving a king and a butler, kings are more likely to be responsible for the event simply because kings have more power. So in the following sentence, you might interpret the pronoun as referring to the king even though it goes against the "typical" pattern for frighten (preferring explanations that involve the frightener).

(5) The butler frightened the king because...

What got people particularly excited about this is that it all has to happen very fast. Studies have shown that you can interpret the pronoun in such sentences in a fraction of a second. If you can do this based on a complex inference about who is likely to do what, that's very impressive and puts strong constraints on our theory of language.

The Problem

I was in the process of designing an ERP experiment to follow up a previous one in Dutch that I wanted to replicate in English. I had created a number of sentences, and we were running a simple experiment in which people rate how "natural" the sentences sound. We were doing this just to make sure none of our sentences were weird, since that -- as already mentioned -- can have big effects on the brain waves, which could swamp any effects of the pronoun. Again, we expected people to rate (4) as less natural than (3); what we wanted to make sure was that people didn't rate both (3) and (4) as pretty odd. We tested a couple hundred such sentences, from which we would pick the best for the study.

I was worried, though, because a number of previous studies had suggested that gender itself might matter. This follows from the claim that who the event participants are matters (e.g., kings vs. butlers). Specifically, a few studies had reported that in a story about a man and a woman, people expect the man to be talked about more than the woman, analogous to expecting references to the king rather than the butler in (5). Was this a confound?

I ran the study anyway, because we would be able to see in the data just how bad the problem was. To my surprise, there was no effect of gender at all. I started looking at the literature more carefully and noticed that several people had similarly failed to find such effects. One paper had found an effect, but it seemed to be present in only a small handful of sentences out of the large number they had tested. I looked into studies that had investigated sentences like (5) and discovered ... that they didn't exist! Rather, the studies researchers had been citing weren't about pronoun interpretation at all but something else. To be fair, some researchers had suggested that there might be a relationship between this other phenomenon and pronoun interpretation, but it had never been shown. I followed up with some experiments seeing whether the king/butler manipulation would affect pronoun interpretation, and it didn't. (For good measure, I also showed that there is little if any relationship between that other phenomenon and pronouns.)

A Different Problem

So it looked like the data upon which much recent work on pronouns is built was either un-replicable or apocryphal. However, the associated theory had become so entrenched that this was a difficult dataset to publish. I ultimately had to run around a dozen separate experiments in order to convince reviewers that these effects really don't exist (or mostly don't exist -- there does seem to be a tiny percentage of sentences, around 5%, where you can get reliable if very small effects of gender). (A typical paper has 1-4 experiments, so a dozen is a lot. Just in order to keep the paper from growing to an unmanageable length, I combined various experiments together and reported each one as a separate condition of a larger experiment.)

Most of these experiments were run on Amazon Mechanical Turk, but the final one was run at GamesWithWords.org and was announced on this blog (read the results of that specific experiment here). The paper is now in press at Language & Cognitive Processes. You can read the final submitted version here.

Conclusion

So what does all this mean? In many ways, it's a correction to the literature. A lot of theoretical work was built around findings that turned out to be wrong or nonexistent. This is true in particular of the idea that pronoun interpretation involves a lot of very rapid inferences based on your general knowledge about the world. That's not quite the same thing as having a new theory, but we've been exploring some possibilities that no doubt will be talked about more here in the future.
----

Hartshorne, J. K. (2014). What is implicit causality? Language and Cognitive Processes.

Everlasting Love

I just got back data from a survey in which we asked people to estimate how long different emotions are likely to last. We'll use this information to design a future experiment looking at how people expect emotions to be encoded in language. In the meantime, what struck me is that of all the emotions we asked about, the one that people expected to last the longest was "being head-over-heels in love". Which is awesome.






(Image courtesy of Faizal Sharif)

New Experiment: The Vocab Quiz

Curious how good your vocabulary is? I just posted a new experiment that will tell you. There are 32 questions. At the end, you'll see your score and how it compares with others who have done the experiment. This should be a fairly hard test. I piloted it on around 40 people, and only a few managed to get all the questions right. Then I made it harder. You can find the experiment here.

What is the purpose of the experiment?

We are interested in why some people have better vocabularies than others. So before you take the test, you'll answer some questions about your background, such as your age, level of education, and birth order. The predictions for age and level of education are probably fairly obvious. The predictions for birth order are less clear. Some researchers would predict that eldest children will have better vocabularies (they spent more time with their parents and so got a jump start). Others would predict that the youngest would have better vocabularies (they had extra teachers in the home!). Still other researchers would argue that birth order (being the oldest or youngest, etc.) should have no effect on vocabulary, because they argue that pretty much nothing is affected by birth order.

We are particularly interested in people for whom English is a second language. What factors lead some people to easily acquire a second language and others not?

Take the Vocab Quiz.


New Experiment: The Language & Memory Test

There is a close relationship between language and memory, since of course whenever you use words and grammar, you have to access your memory for those words and that grammar. If you couldn't remember anything, you couldn't learn language to begin with.

The relationship between language and memory is not well understood, partly because they tend to be studied by different people, though there are a few labs squarely interested in the relationship between language and memory, such as the Brain and Language Lab at Georgetown University.

This week, I posted a new experiment, "The Language & Memory Test", which explores the relationship between memory and language. The experiment consists of two components. One is a memory test. At the end, you will see your score and how it compares with other people who took the test. This test is surprisingly hard for how simple it seems.

In the other part, you will try to learn to use some new words. We'll be studying the relationship between different aspects of your memory performance and how you learn these new words. As always, there will be a bit more explanation at the end of the experiment. When the experiment is done and the results are known, there will be a full description of them and what we learned here at the blog and at GamesWithWords.org.

Try the Language & Memory test here.

New Experiment: Collecting Fancy Art

Over the last few years, we've run a lot of experiments online at GamesWithWords.org, resulting so far in four publications, with a number of others currently under review at various journals. Most of these experiments have focused on how people process and interpret language. I just posted a new experiment (Collecting Fancy Art) that is more squarely focused on learning language. Language learning experiments are somewhat tricky to do online, since they tend to take longer than the 5-10 minute format of online experiments, but they are important.

One of the most salient truths about language is that language has to be learned. This is clearly pretty hard, or other animals would be able to do it and we'd already have computers that were pretty good at language. But just how the learning process happens is a bit of a mystery, partly because language is a complex, interconnected system. When you learn one word, it affects how you use other words.

In this experiment, you will simultaneously learn the meanings of three different words. We're interested in seeing how your understanding of these words develops. As always, you'll learn more about the experiment at the end. And check back here in the future: After the experiment is completed, the results will be posted here.

The experiment is called "Collecting Fancy Art". You can find it here.

Lab Notebook: Social Networking

The problem with websites is they quickly become obsolete. A few years ago, I updated the website to make it easier to share pages, adding buttons for Facebook, Twitter, Digg, and Reddit. A little while ago, I noticed that the Digg button wasn't working anymore. Then the Twitter button disappeared. 

I just updated the website, switching from native buttons for social networking systems to ShareThis. ShareThis has the advantage of incorporating every social networking system you've heard of and a bunch you haven't heard of (I've put Google+, Facebook, Twitter, Tumblr, and email up front, but by clicking on the ShareThis button, users can choose from dozens of networks). 

Fieldofscience (the network this blog is a part of) has been using ShareThis for a couple years. However, it went through several periods where it wasn't working. Periodically, it would have memory failures, and posts that had once had dozens of likes suddenly went to zero. But lately it seems much more stable, so I'm trying it out.

The disadvantage is that every page says that it hasn't been liked by anybody, which isn't great advertising for the website. (*UPDATE* We've got a few shares now on some of the pages.) I hope this changes quickly.

The $64,000 question is, of course, whether this update changes the overall amount of traffic to the website. It's been averaging around 2,000 visitors/month for a couple years now. That's very respectable for a research website. However, many of the experiments now running (like the Mind Reading Quotient and Finding Explanations) require large numbers of participants, and they would really benefit from an uptick in traffic.

Who you gonna believe: E. O. Wilson or common sense?

I was planning a post on E. O. Wilson's recent flight of fancy, "Great Scientist ≠ Good at Math", in which he tells potential future scientists that knowing math isn't all that important, but it turns out Jeremy Fox has already said everything I was going to say, only better. It's a long post, though, so here are some key passages:
Wilson’s claim that deep interest in a subject, combined with deep immersion in masses of data, is sufficient, because hey, it worked for Charles Darwin, is utter rubbish. First of all, just because it worked for Darwin (or Wilson) doesn’t mean it will work for you, and just because it worked in the 19th century doesn’t mean it will work in the 21st. If for no other reason than that there are plenty of people out there, in every field, who not only have a deep interest in the subject and an encyclopedic knowledge of the data, but who know a lot of mathematics and statistics.
and

Wilson claims that strong math skills are relevant only a few disciplines, like physics. Elsewhere, great science is a matter of “conjuring images and processes by intuition”... I’m sure Wilson is describing his own approach here, and it’s worked for him. But I have to say, it’s surprising to find someone as famous for his breadth of knowledge as E. O. Wilson generalizing so unthinkingly from his own example. I wonder what his late collaborator Robert MacArthur would think of the notion that intuition alone is enough. I wonder what Bill Hamilton would think. Or R. A. Fisher. Or J. B. S. Haldane. Or Robert May. Or John Maynard Smith. Or George Price. Or Peter Chesson. Or Dave Tilman. Or lots of other great ecologists and evolutionary biologists I could name off the top of my head. Would Wilson seriously argue that none of those people were great scientists, or that they never made any great discoveries, or that the great discoveries they made arose from intuition unaided by mathematics?
Meanwhile, over at Finding the Next Einstein, Jonathan Wai draws on his own research to argue that mathematics ability is key to success in a wide range of scientific fields (though these data are unfortunately correlational).

International Journal of Lousy Research

Jeffrey Beall's blacklist of "predatory open-access journals" -- discussed in yesterday's New York Times -- provides evidence for my long-standing suspicion of any journal named "International Journal of ..." There probably are some good journals named "International Journal of...", but I don't know of any off-hand. And there seem to be an awful lot of bad ones, probably for good reason: An internationally-recognized journal doesn't have to say so. So almost by definition a journal that has to call itself "International Journal of" is probably not a well-known journal.

In general, nearly every journal on the list has some location in its name, such as South Asian Journal of Mathematics, which doubles down by referring to itself on its home page as an "international journal". Again, there are, of course, good journals with region-specific names. But there don't seem to be many. I'm less sure of the reason for this one.

[Future Post: Explaining why universities that market themselves as "The Harvard of" some region are frequently not even the most prestigious school in that region.]

Laying to rest an old myth about Chinese

I just got back from my second research trip to Taiwan in three years (with another planned soon!) and fourth trip overall. As always, I had a great time and ate as much beef noodle soup as I could manage.


As always, I spent a couple months beforehand brushing up my reading and writing. This isn't something I have to do before trips to Spain or Russia. A few hours spent learning Spanish or Russian orthography, and you are set for life. As soon as I blink, I forget how to read and write Chinese. This is because, as is well known, rather than a couple dozen phonetic symbols, Chinese employs thousands of easily confused characters, which start to blur together if you don't use them for a while.

This isn't just a problem for foreigners. Students in Taiwan (and China or Japan, I assume) continue investing significant amounts of time into learning to read and write additional characters well through secondary school. This raises the question of why Chinese speakers don't just adopt a phonetic writing system.

Problems with a Chinese phonetic writing system

The argument one often hears is that Chinese has so many homophones (words that sound alike) that if you wrote them all the same way, there would be so much ambiguity that it would be impossible to read. The character system solves this by having different characters for different words, even ones that sound alike.

In the last century, when switching to a phonetic system was proposed, a scholar illustrated this problem with the following poem, which reads something like this:
Shi shi shi shi shi shi, shi shi, shi shi shi shi. Shi shi shi shi shi shi shi shi shi, shi shi shi shi shi, shi shi, shi shi shi shi shi. Shi shi shi shi shi, shi shi shi, shi shi shi shi shi shi. Shi shi shi shi shi shi, shi shi shi. Shi shi shi, shi shi shi shi shi shi. Shi shi shi, shi shi shi shi shi shi shi shi. Shi shi shi shi shi shi shi shi shi shi shi shi shi. Shi shi shi shi.
As written, this is incomprehensible. Only when it is written in characters does the meaning become clear:
A poet named Shi lived in a stone house and liked to eat lion flesh and he vowed to eat ten of them. He used to go to the market in search of lions and one day chanced to see ten of them there. Shi killed the lions with arrows and picked up their bodies carrying them back to his stone house. His house was dripping with water so he requested that his servants proceed to dry it. Then he began to try to eat the bodies of the ten lions. It was only then he realized that these were in fact ten lions made of stone. Try to explain the riddle.

Problems with this argument

This argument sounds compelling until you realize that what is being claimed is that you can't understand a Chinese sentence based on its sound alone. This means that not only is it impossible to understand phonetically-written Chinese, it is impossible to understand spoken Chinese (which, like phonetically-written Chinese, doesn't have any characters to help disambiguate similar-sounding words). Since a billion people speak Mandarin Chinese every day, there must be a problem with this argument!

There are a few. First of all, I wrote the poem phonetically ignoring the five Chinese tones. Like many languages, Chinese uses intonation phonetically -- an 'i' with a rising tone is different from an 'i' with a falling tone. Writing a tonal language without tones is like writing English without vowels -- much harder to read. Similarly, the phonetic writing above does not have any breaks between words, making it much harder to read (imaginewritingEnglishwithoutspacesbetweenwords). True, written Chinese doesn't mark word boundaries, but then it has all the extra information encoded in the characters to help with any ambiguity.
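To make the point about tones concrete, here is a rough sketch in Python. It takes a handful of tone-marked pinyin syllables for words that appear in the poem and strips the tone marks, which is what I did when I wrote the poem out above. The function name `strip_tones` and the short glosses are my own, chosen for illustration; this is not a complete pinyin tool.

```python
import unicodedata

# Combining marks used for the four pinyin tones:
# macron (1st), acute (2nd), caron (3rd), grave (4th).
TONE_MARKS = {"\u0304", "\u0301", "\u030c", "\u0300"}

def strip_tones(pinyin: str) -> str:
    """Remove pinyin tone marks while leaving other diacritics (such as the dots on ü) alone."""
    decomposed = unicodedata.normalize("NFD", pinyin)
    kept = "".join(ch for ch in decomposed if ch not in TONE_MARKS)
    return unicodedata.normalize("NFC", kept)

# Illustrative, simplified glosses for syllables that occur in the poem.
words = {
    "shī": "lion / poem",
    "shí": "stone / ten / eat",
    "shǐ": "begin",
    "shì": "market / vow / room",
}

for syllable, gloss in words.items():
    print(f"{syllable} -> {strip_tones(syllable)}  ({gloss})")

# With the tone marks, there are four distinct syllables; without them, only one.
print(len(set(words)), "distinct syllables with tones")
print(len({strip_tones(s) for s in words}), "distinct syllable without tones")
```

Even with tones restored, several words still share each syllable, so tones are only part of the story. But throwing them away turns four different syllables into one, which makes the poem look far more ambiguous than it would be in properly written pinyin.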

Second, this poem uses very archaic Chinese (different vocabulary and different grammar than modern Mandarin). It's not clear how many people would understand the poem spoken aloud. Wikipedia gives a nice translation of the poem into modern Mandarin, which involves many different sounds, not just 'shi'.

The most important problem is that there actually is a perfectly good phonetic system for writing Chinese. Actually, there are several, but the most common is pinyin. People can and do write entire texts in pinyin.

Why care? 

Why go to the effort of debunking this myth? This often comes up in arguments over whether the Chinese should adopt a new writing system, but that's not really my concern. Very often, there is a tendency to believe that different cultures and languages are much more different from one another than they are. One hears about strange aspects of other languages without even pausing to think about the fact that one's own language has many of those same features. The writing systems of English and Chinese are actually alike in many ways (both are partially phonetic and partially semantic -- a topic for a different post). I can only speak for myself, but the more I learn about a given language, usually the less foreign it seems. Which is a fact worth thinking about.