Field of Science

Nature Magazine endorses Obama (but not because of science policy)

Nature Magazine's latest issue, just published online, endorses Obama. Interestingly, this is not because of "any specific pledge to fund some particular agency or initiative at a certain level." Instead, the editorial emphasizes the contrast in the ways the two candidates reach decisions:

On a range of topics, science included, Obama has surrounded himself with a wider and more able cadre of advisers than McCain. This is not a panacea. Some of the policies Obama supports -- continued subsidies for corn ethanol, for example -- seem misguided. The advice of experts is all the more valuable when it is diverse: 'groupthink' is a problem in any job. Obama seems to understands [sic] this. He tends to seek a range of opinions and analyses to ensure that his opinion, when reached, has been well considered and exposed to alternatives. He also exhibits pragmatism -- for example in his proposals for health-care reform -- that suggests a keen sense for the tests reality can bring to bear on policy.

Some will find strengths in McCain that they value more highly than the commitment to reasoned assessments that appeals in Obama. But all the signs are that the former seeks a narrower range of advice. Equally worrying is that he fails to educate himself on crucial matters; the attitude he has taken to economic policy over many years is at issue here. Either as a result of poor advice, or of advice inadequately considered, he frequently makes decisions that seem capricious or erratic.

The power of because

To ask for a dime just outside a telephone booth is less than to ask for a dime for no apparent reason in the middle of the street.
-Penelope Brown & Stephen Levinson, Politeness

The opening quote seems to be true, which raises the question of why. An economist might say a gift of 10 cents is a gift of 10 cents: you are short 10 cents no matter what the requester's reason. So why should the reason matter?

The power of because?
Empirically, in a well-known experiment, Ellen Langer and colleagues showed that 95% of people standing in line to use a copy machine were willing to let another cut in line as long as the cutter offered a reason, even if that reason was inane (e.g. "because I have to make copies.")

The explanation given by Langer and colleagues was that people are primed to defer to somebody who provides a reason. Thus, the word "because" can, essentially in and of itself, manipulate others. On this account, we give money not only to people who need it to make a phone call, but to anybody who offers a reason at all.

I haven't been able to find the original research paper -- it may have been reported in a book rather than in a published article -- so I don't know for sure exactly what conditions were used. However, none of the media reports I have read (such as this one) mention perhaps the most important control: a condition in which the cutter gives no excuse and does not use the word "because."

What are other possible explanations?
One possibility is that people are simply reluctant to say 'no,' especially if the request is made in earnest.

There are a couple reasons this could be true. People might be pushovers. They might also simply have been taught to be very polite.

What strikes me as more likely is that most people avoid unnecessary confrontation. Confrontation is always risky; it can escalate into a situation where somebody gets hurt. Certainly, violent confrontations have been started over less than conflicting desires to use the same copier.

Speculation

None of these speculations, however, explain the opening quote. Perhaps there is an answer out there, and if anybody has come across it, please comment away.

A vote for McCain is a vote against science

Readers of this blog know that I have been skeptical of John McCain's support for science. Although he has said he supports increasing science funding, he appears to consider recent science funding budgets that have not kept pace with inflation to be "increases." He has also since called for a discretionary spending freeze.

In recent years vocally anti-science elements have hijacked the science policies of the Republican party -- a party that actually has a strong history of supporting science -- so the question has been where McCain stands, or at least which votes he cares about most. The jury is still out on McCain, but Palin just publicly blasted basic science research as wasteful government spending.

The project that she singled out, incidentally, appears to be research that could eventually lead to new treatments for autism. Ironically, Palin brought up this "wasteful" research as a program that could be cut in order to fully fund the Individuals with Disabilities Education Act.

Become a Phrase Detective: A new, massive Internet-based language project

A typical speech or text does not consist of a random set of unrelated sentences. Generally, the author (or speaker) starts talking about one thing and continues talking about it for a while. While this tends to be true, there is typically nothing in the text that guarantees it:

This is my brother John. He is very tall. He graduated from high school last year.
We usually assume this is a story about a single person: the speaker's brother, named John, who is tall and graduated from high school last year. But it could very well have been about three different people. Although humans are very good at telling which part of a story relates to which other part, it turns out to be very difficult to explain how we know. We just do.

This is a challenge both to psychologists like myself, as well as to people who try to design computer programs that can analyze text (whether for the purposes of machine translation, text summarization, or any other application).

The materials for research

A group at the University of Essex put together an entertaining new Web game called Phrase Detectives to help develop new materials for cutting-edge research into this basic problem of language. Their project is similar to my ongoing Dax Study, except that theirs is not so much an experiment as a method for developing the stimuli.

Phrase Detectives is set up as a competition between users, and the result is an entertaining game that you can participate in more or less as you choose. Other than its origins, it looks a great deal like many other Web games. The game speaks for itself, and I recommend that you check it out.

What's the point?

Their Wiki provides some useful details as to the purpose of this project, but as it is geared more towards researchers than the general public, it could probably use some translation of its own. Here's my attempt:
The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora...
Basically, the goal of Computational Linguistics (and the related field, Natural Language Processing) is to come up with computer algorithms that can "parse" text -- break it up into its component parts and explain how those parts relate to one another. This is like a very sophisticated version of the sentence diagramming you probably did in middle school.

Developing and testing new algorithms requires a lot of practice materials ("corpora"). Most importantly, you need to know what the correct parse (sentence diagram) is for each of your practice sentences. In other words, you need "annotated corpora."
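To make "annotated corpora" concrete, here is a minimal sketch of how the "brother John" passage above might be marked up for anaphora. The format and names here are invented for illustration; real annotation schemes (including the one behind Phrase Detectives) are much richer.

```python
# A toy "annotated corpus" entry: each mention records the phrase and
# which entity chain it belongs to. Mentions sharing a chain id are
# annotated as referring to the same individual.
text = ("This is my brother John. He is very tall. "
        "He graduated from high school last year.")

mentions = [
    {"span": "my brother John", "chain": 1},
    {"span": "He",              "chain": 1},  # second sentence's "He"
    {"span": "He",              "chain": 1},  # third sentence's "He"
]

def chains(mentions):
    """Group mention spans by the entity chain they belong to."""
    grouped = {}
    for m in mentions:
        grouped.setdefault(m["chain"], []).append(m["span"])
    return grouped

print(chains(mentions))
# All three mentions fall into one chain: the reading where John, the
# tall person, and the recent graduate are the same individual.
```

An algorithm being tested would be asked to recover these chains from the raw text alone, and its output would be scored against the hand annotation.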

...but creating such corpora by hand annotation is very expensive and time consuming; in practice, it is unfeasible to think of annotating more than one million words.
One million words may seem like a lot, but it isn't really. One of the complaints about one of the most famous word frequency corpora (the venerable Francis & Kucera) is that many important words never even appear in it. If you take a random set of 1,000,000 words, very common words like a, and, and the take up a fair chunk of that set.
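To see why a million words is smaller than it sounds, it helps to count tokens in a bit of running text. The snippet below uses a made-up toy sentence, but the pattern it shows (a handful of function words accounting for a large share of all tokens) holds for any stretch of English text.

```python
from collections import Counter

# Toy illustration: function words dominate token counts, so a
# fixed-size corpus contains far fewer distinct content words than
# its raw word count suggests.
sample = (
    "the cat sat on the mat and the dog sat by the door "
    "and the cat and the dog watched the rain"
).split()

counts = Counter(sample)
total = len(sample)

for word, n in counts.most_common(3):
    print(f"{word}: {n}/{total} tokens")

# Share of tokens taken up by a few function words in this sample.
function_share = sum(counts[w] for w in ("the", "and", "on", "by")) / total
print(f"function words: {function_share:.0%} of tokens")
```

Even in this tiny sample, "the" alone accounts for roughly a third of the tokens, which is why rare but important words can fail to appear at all in a million-word corpus.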

Also, consider that when a child learns a language, that child hears or reads many, many millions of words. If it takes so many for a human who is genetically programmed to learn language, how long should it take a computer algorithm? (Computers are more advanced than humans in many areas, but in the basic areas of human competency -- vision, language, etc. -- they are still shockingly primitive.)

However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to collaborate in resource creation. AnaWiki is a recently started project that will develop tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).
This is, of course, what makes the Web so exciting. It took a number of years for it to become clear that the Web was not just a method of doing the same things we always did but faster and more cheaply, but actually a platform for doing things that had never even been considered before. It has had a deep impact in many areas of life -- cognitive science research being just one.

Human behavior on display in the subway

Riding Boston's T through Cambridge yesterday, I was reminded of why I love this town. You can learn a lot about a city riding its public transportation (and if the city doesn't have public transportation, then you have learned something, too).

In Russia, for instance, people stare coldly off into space. The blank look can appear hostile to those not accustomed to it, but it's really more representative of how Russians carry themselves in public than representative of what Russians are like more generally (some of the warmest people I know are Russian. They just don't display it on the train). To the extent that people do anything while on the train, they mostly do crossword puzzles (at least in St. Petersburg, where I've spent most of my time).

In Taiwan, reading is rampant. You can see this outside of the subway as well, since there are bookstores everywhere, and they are very popular. This made me feel more at home (I almost always read on the train) than in business-minded Hong Kong, where reading was much less common. Hong Kong is one of my favorite cities, but it's decidedly short on bookstores.

This brings me back to my T ride through Cambridge yesterday. The person sitting next to me was reading what was clearly a language textbook, but I couldn't recognize the writing system. It looked vaguely Asian, but I know enough of Japanese, Chinese and Korean to know it wasn't one of those. Eventually, he closed the book and I saw it was an Akkadian textbook. Akkadian, incidentally, hasn't been spoken in about two thousand years.

That is Cambridge -- and Boston more generally. Many of the people on the train are grading papers, reading scientific articles or studying a language. It's very much a town of academics. (A large percentage of the metro riders also wear Red Sox gear. The two populations are not mutually exclusive.)

Singapore's Science Complex


Among developing countries that are investing heavily in science, Singapore (is Singapore still a developing country?) stands out. A recent article in Nature profiles a massive new public/private science complex called "Fusionopolis." This is a physical-sciences counterpart to the existing "Biopolis."

Although Singapore's overall science spending is still small compared with that of larger countries ($880 million/year), it is impressive on a per-capita basis. Currently, the country spends 2.6% of its gross domestic product on science, and it plans to increase that to 3% by the end of the decade, which would put it ahead of Britain and the United States.

What struck me in the article was that Singapore is very explicit about its goal, and that goal isn't knowledge for knowledge's sake. According to Chuan Poh Lim, chairman of A*STAR, Singapore's central agency for encouraging science and technology, Singapore recognizes it can't compete with China or India as a low-cost manufacturer. "In terms of 'cheaper and faster', we will lose out. We need a framework for innovation."

The ultimate goal is to build an economy with a stronger base in intellectual property, generating both new products and also patent royalties.

Iranian politician moonlights as scientific plagiarist

It appears that one of the plagiarists caught by Harold Garner's Deja Vu web database is Massoumeh Ebtekar: former spokeswoman for the militant students who held 52 Americans hostage in the US Embassy in Tehran during the Carter administration, former vice-president under Mohammad Khatami, and current member of the Tehran City Council. She is the "author" of a paper, 85% of which was stitched together from five papers by other researchers.

Nature, my source for this news, reports that she has blamed this on the student who helped her with the manuscript. This would seem to indicate that the student wrote most or all of the paper, despite not being listed as an author...which is a different kind of plagiarism, if one more widely accepted in academia.

As ice melts, oceanography freezes

Nature reports that the US academic oceanographic fleet is scaling back operations due to a combination of budget freezes and rising fuel costs. This means that at least one of its 23 ships will sit out 2009, and two others will take extended holidays.

Even so, more cuts will probably be necessary.

This is of course on top of the budgetary crisis at one of the USA's premier physics facilities, Fermilab.

A rash of scientific plagiarism?

Nature reports that Harold Garner of the University of Texas Southwestern Medical Center in Dallas has been scouring the medical literature using an automated text-matching software package to catch plagiarized articles.

A surprising number have been found: 181 papers have been classified as duplicates, sharing 85% of their text, on average, with a previous paper. One quarter of these are nearly 100% identical to a previous publication.

While it is troubling that anybody would be so brazen, the fact that they have gotten away with it so far says something: there are a lot of journals. And a lot of papers. For a plagiarist to be successful, it must be the case that neither the editor nor any of the referees has read the original article -- this despite the fact that referees are typically chosen because they are experts in the field the article addresses.

That, I think, is the big news: that it is possible to plagiarize so blatantly.

Incidentally, the Nature news brief suggests that the confirmed plagiarism is usually carried out in obscure journals. This means that the plagiarists are gaining relatively little for their effort, and the original authors are losing little.

That said

Garner's project has apparently identified 75,000 abstracts that seem highly similar. It's hard to tell what that means, so we'll have to wait for the full report.

An abstract is about 200 words long. PsychInfo currently lists 10,098 abstracts that contain the phrase "working memory." One would assume that, even if all of them describe independent work, many are highly similar just by chance. So I hope to find out more about how "highly similar" is being operationalized in this project.
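For illustration, here is one simple way "highly similar" could be operationalized: the fraction of matching text between two abstracts, computed with Python's standard difflib. This is not the metric Garner's project uses (it relies on its own text-matching software), and the two abstracts below are invented; it is just a sketch of the general idea.

```python
from difflib import SequenceMatcher

# Two hypothetical abstracts that share most of their wording.
a = ("Working memory capacity predicts performance on a wide range "
     "of cognitive tasks in both children and adults.")
b = ("Working memory capacity predicts performance on a wide range "
     "of reasoning tasks in adults.")

# ratio() returns a similarity score between 0.0 (nothing shared)
# and 1.0 (identical), based on matching character runs.
similarity = SequenceMatcher(None, a, b).ratio()
print(f"similarity: {similarity:.2f}")

# A screening pipeline might flag pairs above some threshold for human
# review; shared phrasing alone does not establish plagiarism.
flagged = similarity > 0.8
```

The crucial design question, of course, is where to set the threshold: too low, and you drown in false alarms from abstracts that merely share a topic's stock phrases; too high, and lightly reworded copies slip through.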

While I suspect that plagiarism is not a huge problem, I still think it is fantastic that people are attacking it with these modern tools. I think we will be seeing a lot more of this type of work. (Actually, come to think of it, a professor I had in 2002 used an automated plagiarism-catching program to screen student homework, so this approach has been around for a while.)

Learning verbs is hard

I am getting ready to write up some of my recent research on verb learning -- a project for which the Dax Study and the Word Sense experiments are both follow-ups. This means a spate of reading. As I come across interesting papers, I'll be sharing them here.

A PsychInfo search turned up a very intriguing paper by Kerstin Meints, Kim Plunkett and Paul Harris published earlier this year in Language and Cognitive Processes on verb learning.

What is tricky about verbs?

Verbs are very difficult words -- both for linguists to describe and for babies to learn. In this post, I'll focus on the second part.

First, unlike nouns, verbs refer to something you can't see. You can point to a ball, but it is much harder to point to thinking. Even action verbs (break, jump) are typically used when the action has already been completed and is no longer visible.

To make matters worse, Lera Boroditsky and Dedre Gentner have noted that verbs are more variable across languages than are nouns, which suggests either that innate constraints play less of a role in verb acquisition or that there simply are fewer such constraints.

The tricky aspect of verbs that Meints, Plunkett and Harris focus on, though, is the way in which verbs generalize. For instance, to use the verb eat correctly, you have to use it to describe the actions of many different eaters (horse, cow, Paul, Sally, George) as well as many different objects which are eaten (sandwich, apple, old boot).

Nouns, of course, have to be generalized. Not all apples look the same. But then, neither do all acts of eating (politely, messily, with a fork).

Using a verb to its fullest

Meints, Plunkett & Harris noted that for any given verb, there are stereotypical direct objects (John ate the cookie) and unusual direct objects (John ate the bush).

One might imagine that children start off expecting verbs to apply only to stereotypical events, because that's what they actually hear their parents talk about (Don't eat the cookie!). Only later do they learn that the verb extends much more broadly to events which they have never witnessed or discussed (Don't eat the bush!).

This seems like a very plausible learning story, very similar to Tomasello's Verb Islands hypothesis, though I'm not sure if it's one he explicitly endorses (hopefully, I'll be reading more Tomasello shortly).

Alternatively, children might start by assuming a verb can apply across the board. That is, children treat verbs categorically, more in line with the algebraic theories that most linguists seem to endorse but which a number of psychologists find implausible.

The data

The researchers used a tried-and-true method to test the language abilities of young kids: present the kids with two videos side by side (for instance, of John eating a cookie and of Alfred sweeping a floor) and say "look at the eating." If the child knows what "eating" means, they should look at John and not Alfred.

The key manipulation was that the eating event was either stereotypical (John eating a cookie) or unusual (John eating a bush). Note, of course, that a number of verbs were tested, not just eating.

Fifteen-month-olds failed at the task; they didn't appear to know any of the verbs. Eighteen-month-olds, however, looked at the correct video regardless of the typicality of the event: they understood the verbs to apply across the board. Twenty-four-month-olds, just slightly older, looked at the correct video for the typical event but not for the atypical one. By 3 years old, though, the kids were back to looking at the correct event regardless of typicality (though they reportedly giggled at the atypical events).

What does that mean?

The difficulty with cognitive science is not so much in creating experiments, but in interpreting them. This one is difficult to interpret, though potentially very important, which is why I called it "intriguing."

The results minimally mean that by the time kids can perform this task -- look at an event described by a verb when that verb is mentioned -- they are not immediately sensitive to the typicality of the event. Later, they become sensitive.

At least two issues constrain interpretation. First, we don't really know that the 18-month-old infants were not sensitive to typicality, only that they didn't show it in this experiment. Second, we don't know whether the 24-month-olds thought the event of John eating a bush could not be described by the verb eat (which would be wild!) or if they simply found the video such an implausible instance of the verb eat that they paid equal attention to the other video, just in case a more typical example of eating showed up there.

In conclusion

So which theory of language learning does this result support? I honestly am not sure. If it had shown a growing, expanding interpretation of verb meaning, I might have said it supported something like the Verb Island hypothesis. If children started out with an expansive understanding of the verb and stuck with it, I might say it endorsed a more classic linguistic point of view.

The actual results are some combination of the two, and very hard to understand. (I don't want to try characterizing the authors' interpretation, because I'm still not sure I completely understand it. I recommend reading the paper instead.)


-------
Kerstin Meints, Kim Plunkett, & Paul Harris (2008). Eating apples and houseplants: Typicality constraints on thematic roles in early verb learning. Language and Cognitive Processes, 23(3), 434-463. DOI: 10.1080/01690960701726232

Text messages for elephants


It has been widely noted that even in areas too remote or poor to have regular telephone service, cell phones and text messaging are ubiquitous. Now, even elephants send text messages.

Modern conservation
As reported by the New York Times, a protected elephant has been fitted with a collar that sends a text message whenever the elephant nears local farms. This was done after several elephants on a local reservation had to be shot to protect the area farmers. Now, when the elephant wanders from its range, rangers arrive to scare it back.

The article is worth reading in its entirety. It's great that a smart method has been found to help wild animals and human civilization coexist.

What struck me, though, was the note that elephants learn from one another, and deterring one elephant from raiding farms can help stop other elephants as well. If anybody knows more about this, I'd be very interested in hearing what is known about elephant social learning.

Psychic word learning. No, seriously.

When I started to read chapter books, I frequently ran across words I didn't know. Being too lazy to look them up in a dictionary, I just made up definitions for the words and continued along.

Children learn tens of thousands of words, and since they don't generally carry around a pocket dictionary to look up every new word, they are frequently forced to do what I did: make a good guess and run with it. (Contrary to popular belief, adults rarely define words for children, and children don't necessarily ask for definitions, either.)

While we tend to focus on children's acquisition of language, the truth is that even adults must deal with new vocabulary. For one thing, the English language contains many more words than any one person can know. Estimates vary considerably, but you typically see figures of about 500,000 words in English and only 60,000 in a typical adult's vocabulary. So the chances of coming across a new word are pretty good, especially if you read the New Yorker.

Similarly, new words pop into the language all the time, such as to Bork somebody or to fax a document. (Those are the classic and now somewhat rusty examples. A more modern one is to Swift-Boat a candidate.)

While we can sometimes look up definitions, we often trust our instincts to define these new words for us. How exactly this happens is still not completely understood. Some new uses of old words are probably adopted as a type of metaphor (not that we know how metaphor works, either). Others may be related to derivational morphology (the method of creating a new word from an old word by adding an affix; e.g., happy -> happiness, employ -> employee).
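As a toy illustration of derivational morphology, here is a sketch of one such rule in code. The function name and the rules are my own inventions, and real morphology is far messier (employ -> employee, for instance, shifts meaning to the person acted on, which no simple spelling rule captures):

```python
# Derivational morphology as a rule: form a state noun from an
# adjective by adding -ness, with a spelling adjustment for final -y
# (happy -> happi- -> happiness).
def derive_noun(adjective):
    """Form a state noun from an adjective with the -ness suffix."""
    if adjective.endswith("y"):
        adjective = adjective[:-1] + "i"   # happy -> happi-
    return adjective + "ness"

print(derive_noun("happy"))   # happiness
print(derive_noun("kind"))    # kindness
```

Part of what makes vocabulary learning tractable may be that speakers can run rules like this in reverse: on meeting "happiness" for the first time, a reader who knows "happy" and the -ness pattern can guess the meaning without a dictionary.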

I recently posted a new experiment, the results of which will hopefully help us better understand this process. If you have 5 minutes, please participate (click here). As always, when this study is done, I will post the results here and on the main website.

Science's Call to Arms

In case anyone was wondering, I am far from alone in my call for a new science policy in the coming administration. It is the topic of the editorial in the latest issue of Science Magazine, America's premier scientific journal:

For the past 7 years, the United States has had a presidential administration where science has had little place at the table. We have had a president opposed to embryonic stem cell research and in favor of teaching intelligent design. We have had an administration that at times has suppressed, rewritten, ignored, or abused scientific research. At a time when scientific opportunity has never been greater, we have had five straight years of inadequate increases for U.S. research agencies, which for some like the National Institutes of Health (NIH) means decreases after inflation.

All of this has been devastating for the scientific community; has undermined the future of our economy, which depends on innovation; and has slowed progress toward better health and greater longevity for people around the world.
Dr. Porter, the editorialist, goes on to ask

So if you are a U.S. scientist, what should you do now?

He offers a number of ideas, most of which are probably not practical for a graduate student like myself ("volunteer to advise ... candidates on science matters and issues.").

The one that is most practical, and which anybody can do, is to promote ScienceDebate2008.com. He acknowledges that the program's goal -- a presidential debate dedicated to science -- will not be accomplished in 2008, but the hope is to signal to the media and to politicians that people care about science and science policy.

And who knows? Maybe there will be a science debate in 2012.

Word Sense: A new experiment from the Cognition & Language Lab

The last several months have been spent analyzing data and writing papers. Now that one paper has been published, two are well into the review process, and another is mostly written, it is time at the Cognition and Language lab to start actively collecting data again.

I just posted our first major new experiment since last winter. It is called "Word Sense," and it takes about 5 minutes to complete. It can be taken by anybody of any age and of any language background.

As always, you can view a description of the study at the end. You also will get a summary of your own results.

I'll be writing more about this research project in future posts.

Does literacy still matter?

In an intriguing recent article in Science Magazine (subscription required), Douglas Oard of the University of Maryland asks what the cultural consequences of better speech recognition software will be.

He notes that the reason literacy is so important is the "ephemeral nature of speech." Even after audio recording became cheap, print was still necessary because it is easier to store and easier to skim and search.

However, new technology is rapidly shifting the balance, as hardware space becomes cheap and computerized searching of audio material becomes effective. Perhaps the costs of learning to read will soon cease to be justified.

Really?

Oard recognizes there will be resistance to the idea (note that he doesn't actually endorse eliminating reading and writing), but he cautions that we should think with our heads, not our biases:

Our parents complained that our generation relied on calculators rather than learning arithmetic... In Plato's Phaedrus, the Pharaoh Thamus says of writing, "If men learn this, it will implant forgetfulness in their souls: They will cease to exercise memory because they rely on that which is written." ... Our generation will unlock the full potential of the spoken word, but it may fall to our children, and to their children, to learn how best to use that gift.


---------
D. W. Oard (2008). Social Science: Unlocking the Potential of the Spoken Word. Science, 321(5897), 1787-1788. DOI: 10.1126/science.1157353