Games with Words: verbs


Results (Round 1): Crowdsourcing the Structure of Meaning & Thought

Posted by GamesWithWords on Tuesday, December 17, 2013

Language is a device for moving a thought from one person's head into another's. This means to have any real understanding of language, we also need to understand thought. This is what makes work on language exciting. It is also what makes it hard.

With the help of over 1,500 Citizen Scientists working through our VerbCorner project, we have been making rapid progress.

Grammar, Meaning, & Thought

You can say Albert hit the vase and Albert hit at the vase. You can say Albert broke the vase but you can't say Albert broke at the vase. You can say Albert sent a book to the boarder [a person staying at a guest house] or Albert sent a book to the border [the line between two countries], but while you can say Albert sent the boarder a book, you can't say Albert sent the border a book. And while you say Albert frightened Beatrice -- where Beatrice, the person experiencing the emotion, is the object of the verb -- you must say Beatrice feared Albert -- where Beatrice, the person experiencing the emotion, is now the subject.

How do you know which verb gets used which way? One possibility is that it is random, and this is just one of those things you must learn about your language, just like you have to learn that the animal in the picture on the left is called a "dog" and not a "perro", "xiaogou," or "sobaka." This might explain why it's hard to learn language -- so hard that non-human animals and machines can't do it. In fact, it results in a learning problem so difficult that many researchers believe it would be impossible, even for humans (see especially work on Baker's Paradox).

Many researchers have suspected that there are patterns in terms of which verbs can get used in which ways, explaining the structure of language and how language learning is possible, as well as shedding light on the structure of thought itself. For instance, the difference (it is argued) between Albert hit the vase and Albert hit at the vase is that the latter sentence means that Albert hit the vase ineffectively. You can't say Albert broke at the vase because you can't ineffectively break something: It is either broken or not. The reason you can't say Albert sent the border a book is that this construction means that the border owns the book, which a border can't do -- borders aren't people and can't own anything -- but a boarder can. The difference between Albert frightened Beatrice and Beatrice feared Albert is that the former describes an event that happened in a particular time and place (compare Albert frightened Beatrice yesterday in the kitchen with Beatrice feared Albert yesterday in the kitchen).

When researchers look at the aspects of meaning that matter for grammar across different languages, many of the same aspects pop up over and over again. Does the verb describe something changing (break vs. hit)? Does it describe something only people can do (own, know, believe vs. exist, break, roll)? Does it describe an event or a state (frighten vs. fear)? This is too suspicious of a pattern to be accidental. Researchers like Steven Pinker have argued that language cares about these aspects of meaning because these are basic distinctions our brain makes when we think and reason about the world (see Stuff of Thought). Thus, the structure of language gives us insight into the structure of thought.

The Question

The theory is very compelling and is exciting if true, but there are good reasons to be skeptical. The biggest one is that there simply isn't that much evidence one way or another. Although a few grammatical constructions have been studied in detail (in recent years, this work has been spearheaded by Ben Ambridge of the University of Liverpool), the vast majority have not been systematically studied, even in English. Although evidence so far suggests that which verbs go in which grammatical constructions is driven primarily or entirely by meaning, skeptics have argued that is because researchers so far have focused on exactly those parts of language that are systematic, and that if we looked at the whole picture, we would see that things are not so neat and tidy.

The problem is that no single researcher -- nor even an entire laboratory -- can possibly investigate the whole picture. Checking every verb in every grammatical construction (e.g., noun verb noun vs. noun verb at noun, etc.) for every aspect of meaning would take one person the rest of her life.

CrowdSourcing the Answer

Last May, VerbCorner was launched to solve this problem. For the first round of the project, we posted questions about 641 verbs and six different aspects of meaning. By October 18th, 1,513 volunteers had provided 117,584 judgments, which works out to 3-4 people per sentence per aspect of meaning. That was enough data to start analyzing.
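For readers who want to check the "3-4 people" figure, here is a back-of-the-envelope version of the arithmetic. It rests on two assumptions not stated in this section: that there were roughly 5,000 sentences (the approximate count given in the analysis details of this post) and that every sentence was shown in all six tasks.

```python
# Figures quoted in the post.
judgments = 117_584
volunteers = 1_513
aspects = 6

# Assumption: roughly 5,000 sentences, each shown in all six tasks.
sentences = 5_000

# Average number of judgments per sentence per aspect of meaning.
per_sentence_per_aspect = judgments / (sentences * aspects)

# Average number of judgments contributed by each volunteer.
per_volunteer = judgments / volunteers

print(round(per_sentence_per_aspect, 2))
print(round(per_volunteer, 1))
```

Under those assumptions the first number comes out just under four, which fits the "3-4 people per sentence per aspect of meaning" figure.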

As predicted, there is a great deal of systematicity in the relationship between meaning and grammar (for details on the analysis, see the next section). These results suggest that the relationship between grammar and meaning may indeed be very systematic, helping to explain how language is learnable at all. They also give us some confidence in the broad project of using language as a window into how the brain thinks and reasons about the world. This is important, because the mind is not easy to study, and if we can leverage what we know about language, we will have learned a great deal. As we test more verbs and more aspects of meaning -- I recently added an additional aspect of meaning and several hundred new verbs -- that window will become clearer and clearer.

Unless, of course, it turns out that not all of language is so systematic. While our data so far represent a significant proportion of all research to date, it's only a tiny fraction of English. That is what makes research on language so hard: there is so much of it, and it is incredibly complex. But with the support of our volunteer Citizen Scientists, I am confident that we will be able to finish the project and launch a new phase of the study of language.

That brings up one additional aspect of the results: It shows that this project is possible. Citizen Science is rare in the study of the mind, and many of my colleagues doubted that amateurs could provide reliable results. In fact, by the standard measures of reliability, the information our volunteers contributed is very reliable.

Of course, checking for a systematic relationship between grammar and meaning is only the first step. We'd also like to understand which verbs and grammatical constructions have which aspects of meaning and why, and to leverage this knowledge into understanding more about the nature of thought. Right now, we still don't have enough data to draw exciting new conclusions (for exciting old conclusions, see Pinker's Stuff of Thought). I expect I'll have more to say about that after we complete the next phase of data collection.

Details of the Analysis

Here is how we did the analyses. If meaning determines which grammatical constructions a given verb can appear in, then you would expect that all the verbs that appear in the same set of frames should be the same in terms of the core aspects of meaning discussed above. So if one of those verbs describes, for instance, physical contact, then all of them should.

Helpfully, the VerbNet project -- which was built on earlier work by Beth Levin -- has already classified over 6,000 English verbs according to which grammatical constructions they can appear in. The 641 verbs posted in the first round of the VerbCorner project consisted of all the verbs from 11 of these classes.

So is it the case that in a given class, all the verbs describe physical contact or all of them do not? One additional complication is that, as I described above, the grammatical construction itself can change the meaning. So what I did was count what percentage of verbs from the same class have the same value for a given aspect of meaning for each grammatical construction, and then I averaged over those constructions.
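The counting described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the actual analysis: the data layout, the function names, and the toy verbs and values are all invented, and I am assuming that "same value" means "agrees with the most common value in that class for that construction".

```python
from collections import Counter, defaultdict


def construction_consistency(values):
    """Share of verbs that carry the most common value in one construction."""
    counts = Counter(values)
    return counts.most_common(1)[0][1] / len(values)


def class_consistency(judgments):
    """Average per-construction consistency for each verb class.

    judgments: iterable of (verb_class, construction, verb, value) tuples,
    one per verb per construction (a hypothetical data layout).
    """
    # Collect the values of all verbs in each (class, construction) cell.
    by_cell = defaultdict(list)
    for verb_class, construction, _verb, value in judgments:
        by_cell[(verb_class, construction)].append(value)

    # Score each cell, then group the scores by verb class.
    per_class = defaultdict(list)
    for (verb_class, _construction), values in by_cell.items():
        per_class[verb_class].append(construction_consistency(values))

    # Average over constructions within each class.
    return {c: sum(scores) / len(scores) for c, scores in per_class.items()}


# Toy data, invented for illustration only (value = "describes contact").
toy = [
    ("class A", "NP V NP", "hit", True),
    ("class A", "NP V NP", "kick", True),
    ("class A", "NP V NP", "tap", True),
    ("class A", "NP V NP", "slap", False),
    ("class A", "NP V at NP", "hit", True),
    ("class A", "NP V at NP", "kick", True),
]

print(class_consistency(toy))
```

In the toy data, three of four verbs agree in the first construction and both verbs agree in the second, so the class scores the average of 75% and 100%.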

The "Explode on Contact" task in VerbCorner asked people to determine whether a given sentence (e.g., Albert hugged Beatrice) described contact between different people or things. Were the results consistent for a given verb class and a given grammatical construction? Several volunteers checked each sentence. If there was disagreement among the volunteers, I used whatever answer the majority had chosen.
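The majority rule is simple enough to write down. Here is a minimal sketch, with a hypothetical function name; returning None on an exact tie is my assumption for the illustration, since the description above does not say how ties were handled.

```python
from collections import Counter


def majority_answer(answers):
    """Return the most frequent answer, or None on an exact tie."""
    ranked = Counter(answers).most_common()
    # Tie handling is an assumption; the post does not say what was done.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None
    return ranked[0][0]


# Three hypothetical volunteers judging one sentence.
print(majority_answer(["contact", "contact", "no contact"]))
```

With three or four volunteers per sentence, a single mistaken answer gets outvoted, which is the point of having several people check each sentence.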

This graph shows the degree of consistency by verb class (the classes are numbered according to their VerbNet number), with 100% being maximum consistency. You can see that all eleven classes are very close to 100%. Obviously, exactly 100% would be more impressive, but that's extremely rare to see when working with human judgments, simply because people make mistakes. We addressed this in part by having several people check each sentence, but there are so many sentences (around 5,000), that simply by bad luck sometimes several people will all make a mistake on the same sentence. So this graph looks as close to 100% as one could reasonably expect. As we get more data, it should get clearer.

Results were similar for other tasks. Another one looked at whether the sentence described someone applying force (pushing, shoving, etc.) to something or someone else:

Maybe everything just looks very consistent? We actually had a check for that. One of the tasks measures whether the sentence describes something that is good, bad, or neither. There is no evidence that this aspect of meaning matters for grammar (again, the hypothesis is not that every aspect of meaning matters -- only certain ones that are particularly important for structuring thought are expected to matter). And, indeed, we see much less consistency:

Notice that there is still some consistency, however. This seems to be mostly because most sentences describe something that is neither good nor bad, so there is a fair amount of essentially accidental consistency within each verb class. Nonetheless, this is far less consistency than what we saw for the other five aspects of meaning studied.
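To see why a lopsided distribution of answers produces accidental consistency, consider how often two sentences picked at random would share the same answer. The sketch below computes that pairwise chance agreement. It is a related but simpler measure than the percentage plotted in this post, and the proportions are invented for illustration, not taken from our data.

```python
def chance_agreement(shares):
    """Probability that two sentences drawn at random share the same answer."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(p * p for p in shares.values())


# Invented proportions: most sentences are neither good nor bad.
skewed = {"neither": 0.8, "good": 0.1, "bad": 0.1}

# Invented comparison case: the three answers are equally common.
balanced = {"neither": 1 / 3, "good": 1 / 3, "bad": 1 / 3}

print(chance_agreement(skewed))
print(chance_agreement(balanced))
```

The more one answer dominates, the more agreement you get for free, even if verb class has nothing to do with it.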

VerbCorner: A Citizen Science project to find out what verbs mean

Posted by GamesWithWords on Wednesday, May 22, 2013

Earlier this week, I blogged about our new VerbCorner project. At the end, I promised that there would be more info forthcoming about why we are doing this project, about its aims and expected outcomes, why it's necessary, etc. Here's the first installment in that series.

Computers and language

I just dictated the following note to Siri

Many of our best computer systems treat words as essentially meaningless symbols that need to be moved around.

Here's what she wrote

Many of our best computer system street words is essentially meaningless symbols that need to be moved around.

I rest my case.

The problem of meaning.

I don't know for sure how Siri works, but her mistake is emblematic of how much language software works. Computer systems treat and Computer system street sound approximately the same, but that's not something most humans would notice because the first interpretation makes sense and the second one doesn't.

Decades of research show that human language comprehension is heavily guided by plausibility: when there are two possible interpretations of what you just heard, go for the one that makes sense. This happens in speech recognition, as in the example above, and it plays a key role in understanding ambiguous words. If you want to throw Google Translate for a loop, give it the following:

John was already in his swimsuit as we reached the watering hole. "I hope the tire swing is still there," John said as he headed to the bank.

Although the most plausible interpretation of bank here is side of a river, Google Translate will translate it into the word for "financial institution" in whatever language you are translating into, because that's the most common meaning of the English word bank.

So what's the problem?

I assume that this limitation is not lost on the people at Google or at Apple. And, in fact, there are computer systems that try to incorporate meaning. The problem there is not so much the computer science as the linguistic science.** Dictionaries notwithstanding, scientists really do not know very much about what words mean, and it is hard to program the computer to know what the word means when you actually do not know.

(Dictionaries are useful, but as an exercise, pick* a definition from a dictionary and come up with a counterexample. It is not hard.)

One of the limitations is scope. Language is huge. There are a lot of words. So scientists will work on the meanings of a small number of words. This is helpful, but a computer that only knows a few words is pretty limited. We want to know the meanings of all words.

Solving the problem

We've launched a new section of the website, VerbCorner. There, you can answer questions about what verbs mean. Rather than try to work out the meaning of a word all at once, we have broken up the problem into a series of different questions, each of which tries to pinpoint a specific component of meaning. Of course, there are many nuances to meaning, but research has shown that certain aspects are more important than others, and we will be focusing on those.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

----
*Dragon Dictate originally transcribed this as "pickled", which I did not catch on proofreading. More evidence that we need computer programs that understand what words mean.
**Dragon Dictate made spaghetti out of this sentence, too.

Citizen Science at GamesWithWords.org: The VerbCorner Project

Posted by GamesWithWords on Tuesday, May 21, 2013

What do verbs mean? We'd like to know. For that reason, we just launched VerbCorner, a massive, crowd-sourced investigation into the meanings of verbs.

Why do we need this project? Why not just look up what verbs mean in a dictionary? While dictionaries are enormously useful (I think I own something like 15), they are far from perfect. For one thing, it's usually very easy to find counter-examples even for what seem like straight-forward definitions. Take the following:

Bachelor: An unmarried man.

So is the Pope a bachelor? Is Neil Patrick Harris? How about a married man from a country in which men are allowed multiple wives?

At VerbCorner, rather than trying to work out the whole definition at once, we have broken meaning into many different components. At the site, you will find several different tasks. In each task, you will try to determine whether a particular verb has a particular component of meaning.

If you are interested in what words mean and would like to help with this project, sign up for an account at http://gameswithwords.org/VerbCorner/. Participation can be anonymous, but we are happy to recognize significant contributions from anyone who wishes it.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

Findings: Linguistic Universals in Pronoun Resolution - Episode II

Posted by GamesWithWords on Monday, November 19, 2012

A new paper, based on data collected through GamesWithWords.org, is now in press (click here for the accepted draft). Below is an overview of the paper.

Many of the experiments at GamesWithWords.org have to do with pronouns. I find pronouns interesting because, unlike many other words, the meaning of a pronoun is almost entirely dependent on context. So while "Jane Austen" refers to Jane Austen no matter who says it or when, "I" refers to a different person, depending mostly on who says it (but not entirely: an actor playing a part uses "I" to refer not to himself but to the character he's playing). Things get even hairier when we start looking at other pronouns like "he" and "she". This means that pronouns are a good laboratory animal for investigating how people use context to help interpret language.

Mice make lousy laboratory animals for studying the role of context in language.

Pronouns are better.

I have spent a lot of time looking at one particular contextual effect, originally discovered by Garvey and Caramazza in the mid-70s:

(1) Sally frightens Mary because she...
(2) Sally loves Mary because she...

Although the pronoun is ambiguous, most people guess that she refers to Sally in (1) but Mary in (2). That is, the verb used (frightens, loves) seems to affect pronoun resolution. Replace "frightens" and "loves" with other verbs, and what happens to the pronoun depends on the verb: some verbs lead to subject resolutions like frightens, some to object resolutions like loves, and some leave people unsure (that is, they think that either interpretation of the pronoun is equally reasonable).

The question is why. One possibility is that this is some idiosyncratic fact about the verb. Just as you learn that the past tense of walk is walked but the past tense of run is ran, you learn that some verbs lead you to resolve pronouns to the verb's subject and some to the verb's object (and some verbs have no preference). This was what was tentatively suggested in the original Garvey and Caramazza paper.

Does the meaning of the verb matter?

One of the predictions of this account is that there's nothing necessary about the fact that frightens leads to subject resolutions whereas loves leads to object resolutions, just as there is no deep reason that run's past tense is ran. English could have been different.

Many researchers have suspected that the pronoun effects we see are not accidental; the pronoun effects arise from some fundamental aspect of the meanings of frightens and loves. Even Garvey & Caramazza suspected this, but all the hypotheses they considered they were able to rule out. Recently, using data from GamesWithWords.org, we presented some evidence that this is right. Interestingly, while researchers studying pronouns were busy trying to come up with some theory of verb meaning that would explain the pronoun effects, many semanticists were independently busy trying to explain verb meaning for entirely different reasons. Usually, they are interested in explaining things like verb alternations. So, for instance, they might notice that verbs for which the subject experiences an emotion about the object:

(3) Mary likes/loves/hates/fears John.

can take "that" complements:

(4) Mary likes/loves/hates/fears that John climbs mountains.

However, verbs for which the object experiences an emotion caused by the subject do not:

(5) Mary pleases/delights/angers/frightens John.
(6) *Mary pleases/delights/angers/frightens that John climbs mountains.

[The asterisk means that the sentence is ill-formed in English.]

Linguists working on these problems have put together lists of verbs, all of which have similar meanings and which can be used in the same way. (VerbNet is the most comprehensive of these.) Notice that in this particular work, "please" and "frighten" end up in the same group as each other, while "like" and "fear" are in a different one: Even though "frighten" and "fear" are similar in terms of the emotion they describe, they have a very different structure in terms of who -- the subject or the object -- feels the emotion.

We took one such list of verb classes and showed that it explained the pronoun effect quite well: Verbs that were in the same meaning class had the same pronoun effect. This suggests that meaning is what is driving the pronoun effect.
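The logic of that check can be sketched as follows: group the responses by verb class and compare how often people resolved the pronoun to the subject. This is a toy illustration only. The function name, the data layout, and the handful of responses are invented; the real analysis in the paper covers far more verbs and is more involved.

```python
from collections import defaultdict


def subject_bias_by_class(responses, verb_to_class):
    """Proportion of 'subject' resolutions for each verb class."""
    totals = defaultdict(lambda: [0, 0])  # class -> [subject choices, all choices]
    for verb, choice in responses:
        cell = totals[verb_to_class[verb]]
        cell[0] += int(choice == "subject")
        cell[1] += 1
    return {c: subj / n for c, (subj, n) in totals.items()}


# Class labels follow the two emotion-verb groups discussed in this post.
verb_to_class = {
    "frighten": "experiencer-object",
    "please": "experiencer-object",
    "love": "experiencer-subject",
    "fear": "experiencer-subject",
}

# Invented responses: which noun each participant thought "she" referred to.
responses = [
    ("frighten", "subject"), ("frighten", "subject"),
    ("please", "subject"), ("please", "object"),
    ("love", "object"), ("love", "object"),
    ("fear", "object"), ("fear", "subject"),
]

print(subject_bias_by_class(responses, verb_to_class))
```

If meaning drives the effect, verbs within a class should cluster around the same proportion, and the two classes should differ from each other.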

Or does it?

If the pronoun effect is driven by the meaning of a verb, then it shouldn't matter what language that verb is in. If you have two verbs in two languages with the same meaning, they should both show the same pronoun effect.

We aren't the first people to have thought of this. As early as 1983, Brown and Fish compared English and Mandarin. The most comprehensive study so far is probably Goikoetxea, Pascual and Acha's mammoth study of Spanish verbs. The problem was identifying cross-linguistic synonyms. Does the Spanish word asustar mean frighten, scare, or terrify?

Is this orangutan scared, frightened or terrified? Does it matter?

Once we showed that frighten, scare and terrify all have the same pronoun effect in English, the problem disappeared. It no longer mattered what the exact translation of asustar or any other word was: Given that entire classes of verbs in English have the same pronoun effect, all we needed to do was find verbs in other languages that fit into the same class.

We focused on transitive verbs of emotion. These are the two classes already introduced: those where the subject experiences the emotion (like/love/hate/fear) and those where the object does (please/delight/anger/frighten) (note that there are quite a few of both types of verbs). We collected new data in Japanese, Mandarin and Russian (the Japanese and Russian studies were run at GamesWithWords.org and/or its predecessor, CogLangLab.org) and re-analyzed published data from English, Dutch, Italian, Spanish, and Finnish.

Results for English verbs (above). "Experiencer-Subject" verbs are the ones like "fear" and "Experiencer-Object" are the ones like "frighten". You can see that people were consistently more likely to think that the pronoun in sentences like (1-2) referred to the subject of Experiencer-Object verbs than Experiencer-Subject verbs.

The results are the same for Mandarin (above). There aren't as many dots because we didn't test as many of the verbs in Mandarin, but the pattern is striking.

The Dutch results (above). The pattern is again the same. Again, Dutch has more of these verbs, but the study we re-analyzed had only tested a few of them.

You can read the paper and see the rest of the graphs here. In the future, we would like to test more different kinds of verbs and more languages, but the results so far are striking, and suggest that the pronoun effect is caused by what verbs mean, not some idiosyncratic grammatical feature of the language. There is still a lot to be worked out, though. For instance, we're now pretty sure that some component of meaning is relevant to the pronoun effect, but which component and why?

------------
Hartshorne, J., and Snedeker, J. (2012). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 1-35. DOI: 10.1080/01690965.2012.689305

Goikoetxea, E., Pascual, G., and Acha, J. (2008). Normative study of the implicit causality of 100 interpersonal verbs in Spanish. Behavior Research Methods, 40 (3), 760-772. DOI: 10.3758/BRM.40.3.760

Garvey, C., and Caramazza, A. (1974). Implicit causality in verbs. Linguistic Inquiry, 5 (3), 459-464.

Brown, R., and Fish, D. (1983). Are there universal schemas of psychological causality? Archives de Psychologie, 51, 145-153.

Boston University Conference on Language Development: Day 2

Posted by GamesWithWords on Tuesday, November 06, 2012

This year marks my 7th straight BUCLD, the major yearly language acquisition conference. See previous posts for my notes on Day 1 and Day 3.

Verbing nouns

Many if not all English nouns can be turned into verbs. The verb's meaning is related to the noun, but not always in the same way. Consider "John milked the cow" and "John watered the garden". In the first case, John extracts a liquid from the cow; in the second, he adds liquid to the garden.

Maybe this is just something we have to learn in each case, but people seem to have strong intuitions about new verbs. Let's say that there is a substance called "dax" that comes from the dax tree. If I were to dax a tree, am I taking dax out of the tree or adding dax to the tree? Most people think the first definition is right. Now let's say there is something called "blick" which is a seasoning that people often add to soup. If I blick some soup, most people think I'm adding blick to the soup, not taking blick out of the soup. (There are other types of noun-derived verbs as well, but they are a topic for another time.)

These examples suggest a hypothesis: if a noun refers to a substance that usually comes from a specific source, then the derived verb probably refers to the action of extracting that substance. If the noun refers to something that doesn't come from any particular source but is often added to things, then the derived verb refers to that process of adding the substance to something.

Mahesh Srinivasan of UCSD presented joint work with David Barner in which they tested this hypothesis. Probably the most informative of the experiments was one with made-up nouns, much like my "dax" and "blick" examples above. Interestingly, while children were pretty sure that "to blick" meant "put blick on something" (the experiment involved several such nouns, and the children had strong intuitions about all of them), they were much less sure what "to dax" (and similar verbs) meant. Other experiments also showed that young children have more difficulty understanding existing substance-extraction noun-derived verbs (to milk/dust/weed/etc.) than substance-adding noun-derived verbs (to water/paint/butter). And interestingly, English has many more of the latter type of verb than the former.

So, as usual, answering one question leads to another. While they found strong support for their hypothesis about why certain noun-derived verbs have the meanings they do, they also found that children find the one kind of verb easier to learn than the other, which demands an explanation. They explored a few hypotheses. One has to do with the "goal" bias described in previous work by Laura Lakusta and colleagues: generally, when infants watch a video in which an object goes from one location to another, they pay more attention to and remember better the location the object ended up at than the location it came from. Whatever the answer, learning biases -- particularly in young children -- are interesting because they provide clues as to the structure of the mind.

Verb biases in structure priming

One of the talks most-mentioned among the folks I talked to at BUCLD was one on structural priming by Michelle Peter (with Ryan Blything, Caroline Rowland, and Franklin Chang, all of the University of Liverpool). The idea behind structural priming is that using a particular syntactic structure once tends to lead to using it more again in the future (priming). The structure under consideration here was the so-called dative alternation:

(1) Mary gave a book to John.
(2) Mary gave John a book.

Although the two sentences mean the same thing (maybe -- that's a long post in itself), notice the difference in word order between (1) and (2). The former is called the "prepositional object" structure, and the second is called the "double object" structure. Some time ago, it was discovered that if people use a given verb (e.g., give) in the prepositional object form once, they are more likely to use that verb in the same form again next time they have to use that verb (and vice versa for the double object form). More recently, it was discovered that using one verb (e.g., give) in the prepositional object form made it more likely to use another verb (e.g., send) in that same form (and again vice versa for the double object form). This suggests that the syntactic form itself is represented in some way that is (at least partially) independent of the verb in question, which is consistent with theories involving relatively abstract grammar.

Or maybe not. This has been highly controversial over the last number of years, with groups of researchers (including the Rowland group) showing evidence of what they call a "lexical boost" -- priming is stronger from the same verb to the same verb, which they take as evidence that grammar is at least partly word-specific. Interestingly, they have now found that children do *not* show the same lexical boost (which, if I remember correctly, has been found by other researchers from the "abstract grammar" camp before, but not by those in the "lexically-specific grammar" camp).

This seems consistent with a theory of grammar on which children start out with relatively general grammatical structures, but as you get older you tend to memorize particularly frequent constructions -- thus, as far as processing goes, grammar becomes increasingly lexically-specific as you get older (though the abstract structures are still around in order to allow for productivity). This is the opposite of the speakers' favored theory, one on which grammar becomes more abstract as you get older. They did find some aspects of their data that they thought reflected lexically-specific processing in children; it's complex so I won't discuss it here (I didn't have time to get it all down in my notes and don't want to make a mistake).

There was also a talk by Kyae-Sung Park (collaborator: Bonnie D. Schwartz, both of the University of Hawai'i) on the Korean version of the dative alternation, finding that the more common form is learned earlier by second-language learners of Korean. I was interested in finding out more about the structure of Korean, but I don't know the second-language acquisition research well enough to integrate their main findings into the larger literature.

Other studies

There were many other good talks. The ones I saw included a study by Wang & Mintz, arguing that previous studies that looked at the overlap in the contexts in which different determiners occur in child speech -- which had been used to suggest that young children don't have an abstract grammatical category "determiner" -- were confounded by the small size of the corpora used. If you use a similarly small corpus of adult speech, you'd come to the same conclusion. [The analyses were much cooler and more detailed than this quick overview can get across.]

------------

Lakusta, L., Wagner, L., O'Hearn, K., and Landau, B. (2007). Conceptual Foundations of Spatial Language: Evidence for a Goal Bias in Infants. Language Learning and Development, 3 (3), 179-197. DOI: 10.1080/15475440701360168

Findings: What do verbs have to do with pronouns?

Posted by GamesWithWords on Monday, October 15, 2012

A new paper, based on data collected through GamesWithWords.org, is now in press (click here for a pre-print). Below is an overview of this paper.

Unlike a proper name (Jane Austen), a pronoun (she) can refer to a different person just about every time it is uttered. While we occasionally get bogged down in conversation trying to interpret a pronoun (Wait! Who are you talking about?), for the most part we sail through sentences with pronouns, not even noticing the ambiguity.

We have been running a number of studies on pronoun understanding (for some previous posts, see here and here). One line of work looks at a peculiar contextual effect, originally discovered by Garvey and Caramazza in the mid-70s:

(1) Sally frightens Mary because she...
(2) Sally loves Mary because she...

Although the pronoun is ambiguous, most people guess that she refers to Sally in (1) but Mary in (2). That is, the verb used (frightens, loves) seems to affect pronoun resolution.

Causal Verbs

From the beginning, most if not all researchers agreed that this must have something to do with how verbs encode causality: "Sally frightens Mary" suggests that Sally is the cause, which is why you then think that "because she…" refers to Sally, and vice versa for "Sally loves Mary".

The problem was finding a predictive theory: which verbs encode causality which way? A number of theories have been proposed. The first, from Harvard psychologists Roger Brown and Deborah Fish (1983) was that for emotion verbs (frightens, loves), the cause is the person who *isn't* experiencing the emotion -- Sally in (1) and Mary in (2) -- and the subject for all other verbs. This turned out not to be correct. For instance:

(3) Sally blames Mary because she...

Here, most people think "she" is Mary, even though this is not an emotion verb and so the "cause" was supposed to be -- on Brown and Fish's theory -- the subject (Sally).
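To make the logic concrete, here is a minimal sketch of Brown and Fish's rule as a lookup, checked against the three example sentences above. The table names, the function names, and the "subject"/"object" labels are my own shorthand for illustration; they are not taken from the paper.

```python
# Pronoun biases reported above for sentences (1)-(3):
# "subject" means most readers resolve "she" to Sally, "object" to Mary.
OBSERVED_BIAS = {"frightens": "subject", "loves": "object", "blames": "object"}

# For the emotion verbs, which argument experiences the emotion.
# Verbs missing from this table are treated as non-emotion verbs.
EXPERIENCER = {"frightens": "object", "loves": "subject"}


def brown_fish_prediction(verb):
    """Predict which argument "because she..." should refer to."""
    experiencer = EXPERIENCER.get(verb)
    if experiencer is None:
        # Non-emotion verb: the cause is supposed to be the subject.
        return "subject"
    # Emotion verb: the cause is whoever is NOT experiencing the emotion.
    return "object" if experiencer == "subject" else "subject"


def mispredicted_verbs():
    """Return the verbs whose observed bias differs from the prediction."""
    return [
        verb
        for verb, bias in OBSERVED_BIAS.items()
        if brown_fish_prediction(verb) != bias
    ]


print(mispredicted_verbs())
```

On these three verbs, the rule gets frightens and loves right and misses blames, which is exactly the problem with sentence (3).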

A number of other proposals have been made, but the data in the literature doesn't clearly support any one of them (though Rudolph and Forsterling's 1997 theory has been the most popular). In part, the problem was that we had data on a small number of verbs, and as mathematicians like to tell us, you can draw an infinite number of lines through a single point (and create many different theories to describe a small amount of data).

Most previous studies had looked at only a few dozen verbs. With the help of visitors to GamesWithWords.org, we collected data on over 1000 verbs. (We weren't the only ones to notice the problem -- after we began our study, Goikoetxea and colleagues published data from 100 verbs in Spanish and Ferstl and colleagues published data from 305 in English). We found that in fact none of the existing theories worked very well.

However, when we took an independently developed theory of verb meaning from linguistics, it actually predicted the results very well. All of the existing theories tried to divide up verbs into a few classes. Within each class, all the verbs were supposed to either have causes as their subjects (causing people to interpret the pronoun as referring to the subject in sentences like 1-3) or have causes as their objects. Unfortunately, this was rarely the case, as shown in Table 2 of the paper.

A new theory

This was, of course, disappointing. We wanted to understand pronoun interpretation better, but now we understood worse! Luckily, the work did not end there. We turned to a well-developed theory from linguistics about what verbs mean (the work I have described above was developed by psychologists largely independently from linguistics).

The basic idea behind this theory is that the core meaning of verbs is built out of a few basic parts, such as movement, possession, the application of force, and – importantly for us – causality. In practice, nobody goes through the dictionary and determines, for every verb, which of these core components it has. This turns out to be prohibitively difficult to do (but stay tuned; a major new project at GamesWithWords.org will be focused on just this). But it turns out that when you classify verbs according to the kinds of sentences they can appear in, this seems to give you the same thing: groups of verbs that share these core components of meaning (such as causality).

The prediction, then, is that if we look at verbs in the same class according to this theory, all the verbs in that class should encode causality in the same way and thus should affect pronouns in the same way. And that is exactly what we found. This not only furthers our understanding of the phenomenon we were studying, but also confirms both the idea that verb meaning plays a central role in the phenomenon and the theory from linguistics.

Why so much work on pronouns?

Pronouns are interesting in their own right, but I am primarily interested in them as a case study in ambiguity. Language is incredibly ambiguous, and most of the time we don't even notice it. For instance, it could be that the "she" in (1) refers to Jennifer -- someone not even mentioned in the sentence! -- but you probably did not even consider that possibility. Because we as humans find the problem so easy, it is very hard for us as scientists to have good intuitions about what is going on. This has become particularly salient as we try to explain to computers what language means (that is, program them to process language).

The nice thing about pronouns is that they are a kind of ambiguity that is very easy to study, and many good methods have been worked out for assessing their processing. More than many areas of research on ambiguity -- and, I think, more than many areas of psychology that don't involve vision -- I feel that a well worked-out theory of pronoun processing is increasingly within our reach. And that is very exciting.

------

Hartshorne, J., and Snedeker, J. (2012). Verb argument structure predicts implicit causality: The advantages of finer-grained semantics. Language and Cognitive Processes, 1-35. DOI: 10.1080/01690965.2012.689305

Brown, R., and Fish, D. (1983). The psychological causality implicit in language. Cognition, 14 (3), 237-273. DOI: 10.1016/0010-0277(83)90006-9

Goikoetxea, E., Pascual, G., and Acha, J. (2008). Normative study of the implicit causality of 100 interpersonal verbs in Spanish. Behavior Research Methods, 40 (3), 760-772. DOI: 10.3758/BRM.40.3.760

Ferstl, E., Garnham, A., and Manouilidou, C. (2010). Implicit causality bias in English: a corpus of 300 verbs. Behavior Research Methods, 43 (1), 124-135. DOI: 10.3758/s13428-010-0023-2

Rudolph, U., and Forsterling, F. (1997). The psychological causality implicit in verbs: A review. Psychological Bulletin, 121 (2), 192-218. DOI: 10.1037//0033-2909.121.2.192

Pilot data

Posted by GamesWithWords on Sunday, March 04, 2012

I am back from a long semi-silence. I have been trying to finish up a number of projects, which gives me less time to write. Speaking of…

One of the focuses of my work is figuring out how children learn the meaning of verbs. This is made more complicated by the fact that we don't actually have completely solid and uncontroversial definitions of verbs. If we don't know what verbs mean, how can we tell when a child has successfully learned them?

I am working on a large-scale project to get better definitions of verbs. We are developing many different tasks, each of which gets at one specific aspect of meaning that is thought to be important for at least some verbs. The traditional method would be to have skilled linguists go through verbs one at a time and consult their own intuitions, and in fact a lot of very good work has been done this way (e.g., Jackendoff's Semantic Structures, among many others). However, there are certain advantages to having this work done by a larger number of people who are naïve to linguistic theory, not the least of which is that there are a very large number of verbs, and one person can't get through them all at any reasonable speed. The one disadvantage of working with naïve participants is that they do not understand linguistic terminology, so you have to find some other way to explain the task.

I have been developing some such tasks, and I could really use some pilot data to see how well they are working. If you have a little time to spare, I would really appreciate the help. There are 3 in particular I am currently working on:

Like Water off a Duck's Back

Person or Thing of the Year

Simon Says Freeze

There is a comments box at the end where you can leave any feedback and mention anything you noticed or which you found confusing. I do need data on all three, so please don't everyone just do the first one.

Fair warning: These tasks take a bit longer than the ones on my website. My guess is that they will take 20-30 minutes each, but that is a wild guess. If somebody does one and wants to leave a comment about how long it took, that would be helpful for me and also for others who might want to do it.

Many thanks.

Learning What Not to Say

Posted by GamesWithWords on Monday, January 17, 2011

A troubling fact about language is that words can be used in more than one way. For instance, I can throw a ball, I can throw a party, and I can throw a party that is also a ball.

These cats are having a ball.

The Causative Alternation

Sometimes the relationship between different uses of a word is completely arbitrary. If there's any relationship between the different meanings of ball, most people don't know it. But sometimes there are straightforward, predictable relationships. For instance, consider:

John broke the vase.
The vase broke.

Mary rolled the ball.
The ball rolled.

This is the famous causative alternation. Some verbs can be used with only a subject (The vase broke. The ball rolled) or with a subject and an object (John broke the vase. Mary rolled the ball). The relationship is highly systematic. When there is both a subject and an object, the subject has done something that changed the object. When there is only a subject, it is the subject that undergoes the change. Not all verbs work this way:

Sally ate some soup.
Some soup ate.

Notice that Some soup ate doesn't mean that some soup was eaten, but rather has to mean, nonsensically, that it was the soup doing the eating. Some verbs simply have no meaning at all without an object:

Bill threw the ball.
*The ball threw.

In this case, The ball threw doesn't appear to mean anything, nonsensical or otherwise (signified by the *). Try:

*John laughed Bill.
Bill laughed.

Here, laughed can only appear with a subject and no object.
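The systematic part of the causative alternation can be sketched as a simple mapping from the two-argument sentence to its one-argument partner. This is only an illustration: the set name and function name are hypothetical, and the set lists just the two alternating verbs from the examples above.

```python
# Verbs from the examples above that appear in both frames
# (John broke the vase / The vase broke; Mary rolled the ball / The ball rolled).
ALTERNATING = {"broke", "rolled"}


def intransitive_partner(subject, verb, obj):
    """Return the subject-only sentence paired with "subject verb obj".

    The object of the two-argument sentence (the thing that changes)
    becomes the subject, and the original subject is dropped.
    Returns None for verbs not listed as alternating.
    """
    if verb not in ALTERNATING:
        return None
    return f"{obj[0].upper()}{obj[1:]} {verb}."


print(intransitive_partner("John", "broke", "the vase"))
print(intransitive_partner("Bill", "threw", "the ball"))
```

The hard part, of course, is not the mapping itself but knowing which verbs belong in the set.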

The dative alternation

Another famous alternation is the dative alternation:

John gave a book to Mary.
John gave Mary a book.

Mary rolled the ball to John.
Mary rolled John the ball.

Once again, not all verbs allow this alternation:

John donated a book to the library.
*John donated the library a book.

(Some people actually think John donated the library a book sounds OK. That's all right. There is dialectal variation. But for everyone there are verbs that won't alternate.)

The developmental problem

These alternations present a problem for theory: how do children learn which verbs can be used in which forms? A kid who learns that all verbs that appear with both subjects and objects can appear with only subjects is going to sound funny. But so is the kid who thinks verbs can only take one form.

The trick is learning what not to say

One naive theory is that kids are very conservative. They only use verbs in constructions that they've heard. So until they hear "The vase broke," they don't think that break can appear in that construction. The problem with this theory is that lots of verbs are so rare that it's possible that (a) the verb can be used in both constructions, but (b) you'll never hear it used in both.

Another possibility is that kids are wildly optimistic about verb alternations and assume any verb can appear in any form unless told otherwise. There are two problems with this. The first is that kids are rarely corrected when they say something wrong. But perhaps you could just assume that, after a certain amount of time, if you haven't heard e.g. The ball threw then threw can't be used without an object. The problem with that is, again, that some verbs are so rare that you'll only hear them a few times in your life. By the time you've heard that verb enough to know for sure it doesn't appear in a particular construction, you'll be dead.

The verb class hypothesis

In the late 1980s, building on previous work, Steven Pinker suggested a solution to this problem. Essentially, there are certain types of verbs which, in theory, could participate in a given alternation. Verbs involving caused changes (break, eat, laugh) in theory can participate in the causative alternation, and verbs involving transfer of possession (roll, donate) in theory can participate in the dative alternation, and this knowledge is probably innate. What a child has to learn is which verbs do participate in the dative alternation.

For reasons described above, this can't be done one verb at a time. And this is where the exciting part of the theory comes in. Pinker (building very heavily on work by Ray Jackendoff and others) argues that verbs have core aspects of their meaning and some extra stuff. For instance, break, crack, crash, rend, shatter, smash, splinter and tear all describe something being caused to fall to pieces. What varies between the verbs is the exact manner in which this happens. Jackendoff and others argue that the shared meaning is what is important to grammar, whereas the manner of falling to pieces is extra information which, while important, is not grammatically central.

Pinker's hypothesis was that verb alternations make use of this core meaning, not the "extra" meaning. From the perspective of the alternation, then, break, crack, crash, rend, shatter, smash, splinter and tear are all the same verb. So children are not learning whether break alternates; they learn whether the whole class of verbs alternates. Since there are many fewer classes than there are verbs (my favorite compendium, VerbNet, has only about 270), the fact that some verbs are very rare isn't that important. If you know what class it belongs to, as long as the class itself is common enough, you're golden.
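Here is a toy sketch of the difference between the conservative child and a child who generalizes by verb class. It is not a model from the literature: the class label, the frame labels ("transitive", "intransitive") and the class names are all hypothetical, and the only verb class included is the break-type list given above.

```python
# One toy verb class, taken from the list of verbs above.
VERB_CLASS = {
    verb: "break-type"
    for verb in ["break", "crack", "crash", "rend",
                 "shatter", "smash", "splinter", "tear"]
}


class ConservativeLearner:
    """Accepts a verb in a frame only if that exact verb was heard in it."""

    def __init__(self):
        self.heard = set()

    def hear(self, verb, frame):
        self.heard.add((verb, frame))

    def accepts(self, verb, frame):
        return (verb, frame) in self.heard


class ClassLearner(ConservativeLearner):
    """Also accepts a frame if any verb of the same class was heard in it."""

    def accepts(self, verb, frame):
        if super().accepts(verb, frame):
            return True
        verb_class = VERB_CLASS.get(verb)
        if verb_class is None:
            # Unclassified verbs get no help from other verbs.
            return False
        return any(
            VERB_CLASS.get(other) == verb_class and heard_frame == frame
            for other, heard_frame in self.heard
        )


# A common verb is heard in both frames; a rare one in only one frame.
INPUT = [("break", "transitive"), ("break", "intransitive"),
         ("splinter", "transitive"), ("laugh", "intransitive")]

conservative, by_class = ConservativeLearner(), ClassLearner()
for verb, frame in INPUT:
    conservative.hear(verb, frame)
    by_class.hear(verb, frame)

print(conservative.accepts("splinter", "intransitive"))
print(by_class.accepts("splinter", "intransitive"))
print(by_class.accepts("laugh", "transitive"))
```

The class-based learner extends the intransitive frame to the rare verb because break licenses it for the whole class, while still refusing to use laugh with an object.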

Testing the theory

This particular theory has not been tested as much as one might expect, partly because it is hard to test. It is rather trivial to show that verbs do or don't participate in alternations as a class, partly because that's how verb classes are often defined (that's how VerbNet does it). Moreover, various folks (like Stefanowitsch, 2008) argue that although speakers might notice the verb classes, that doesn't prove that people actually do use those verb classes to learn which verbs alternate and which do not.

The best test, then, is to teach people -- particularly young children -- new verbs that either belong to a class that does alternate or to a class that does not and see if they think those new verbs should or should not alternate. Very few such studies have been done.

Around the same time Pinker's seminal Learnability and Cognition came out in 1989, which outlines the theory I described above, a research team led by his student Jess Gropen (Gropen, Pinker, Hollander, Goldberg and Wilson, 1989) published a study of the dative alternation. They taught children new verbs of transfer (such as "moop," which meant to move an object to someone using a scoop), which in theory could undergo the dative alternation. The question they asked was whether kids would be more likely to use those verbs in the alternation if the verbs were monosyllabic (moop) or bisyllabic (orgulate). They were more likely to do so for the monosyllabic verbs, and in fact in English monosyllabic verbs are more likely to alternate. This issue of how many syllables the verb has did come up in Learnability and Cognition, but it wasn't -- at least to me -- the most compelling part of the story (which is why I left it out of the discussion so far!).

Ambridge, Pine and Rowland (2011)

Ben Ambridge, Julian Pine and Caroline Rowland of the University of Liverpool have a new study in press which is the only study to have directly tested whether verb meaning really does guide which constructions a child thinks a given verb can be used in, at least to the best of my knowledge -- and apparently to theirs, since they don't cite anyone else. (I've since learned that Brooks and Tomasello, 1999, might be relevant, but the details are sufficiently complicated and the paper sufficiently long that I'm not yet sure.)

They taught children two novel verbs, one of which should belong to a verb class that participates in the causative alternation (a manner of motion verb: bounce, move, twist, rotate, float) and one of which should not (an emotional expression: smile, laugh, giggle). Just to prove to you that these classes exist, compare:

John bounced/moved/twisted/rotated/floated the ball.

The ball bounced/moved/twisted/rotated/floated.

*John smiled/laughed/giggled Sally.
Sally smiled/laughed/giggled.

Two groups of children (5-6 years old and 9-10 years old) were taught both types of verbs with subjects only. After a lot of training, they were shown new sentences with the verbs and asked to rate how good the sentences were. In the case of the manner of motion verb, they liked the sentences whether the verb had a subject and an object or only a subject. That is, they thought the verb participated in the causative alternation. For the emotion expression verb, however, they thought it sounded good with a subject only; when it had both a subject and an object, they thought it did not sound good. This was true both for the older kids and the younger kids.

This is, I think, a pretty nice confirmation of Pinker's theory. Interestingly, Ambridge and colleagues think that Pinker is nonetheless wrong, but based on other considerations. Partly, our difference of opinion comes from the fact that we interpret Pinker's theory differently. I think I'm right, but that's a topic for another post. Also, there is some disagreement about a related phenomenon (entrenchment), but that, too, is a long post, and the present post is long enough.

____
Gropen, J., Pinker, S., Hollander, M., Goldberg, R., and Wilson, R. (1989). The Learnability and Acquisition of the Dative Alternation in English. Language, 65 (2). DOI: 10.2307/415332

Ambridge, B., Pine, J. M., and Rowland, C. F. (2011). Children use verb semantics to retreat from overgeneralization errors. Cognitive Linguistics.

For picture credits, look here and here.

Learning the passive

Posted by GamesWithWords on Monday, January 10, 2011

If Microsoft Word had its way, passive verbs would be excised from the language. That would save children some trouble: passive verbs are more difficult to learn than one might think, because not all verbs passivize. Consider:

*The bicycle was resembled by John.
*Three bicycles are had by John.
*Many people are escaped by the argument.

The bicycle was resembled by John: A how-to guide.

So children must learn which verbs have passives and which don't. I recently sat down to read Pinker, Lebeaux and Frost (1987), a landmark study of how children learn to passivize verbs. This is not a work undertaken lightly. At 73 pages, Pinker et al. (1987) is not Steve Pinker's longest paper -- that honor goes to his 120-page take-down of Connectionist theories of language, Pinker and Prince (1988) -- but it is long, even for psycholinguistics. It's worth the read, both for the data and because it lays out the core of what became Learnability and Cognition, one of the books that has had the most influence on my own work and thinking.

The Data

The authors were primarily interested in testing the following claim: that children are conservative learners and only passivize verbs that they have previously heard in the passive. This would prevent them from over-generating passives that don't exist in the adult language.

First, the authors looked at a database of transcriptions of child speech. A large percentage of the passive verbs they found were passives the children couldn't possibly have heard before because they aren't legal passives in the adult language:

It's broked? (i.e., is it broken?)
When I get hurts, I put dose one of does bandage on.
He all tieded up, Mommy.

Of course, when we say that the child couldn't have heard such passives before, we can't really be sure what the child heard. It just seems unlikely. To more carefully control what the child had heard, the authors taught children of various ages (the youngest group was 4 years old) made-up verbs. For instance, they might demonstrate a stuffed frog jumping on top of a stuffed elephant and say, "Look, the frog gorped the elephant." Then they would show the elephant jumping on top of a mouse and ask the child, "What happened to the mouse?"

If you think "gorp" has a passive form, the natural thing to do would be to say "The mouse was gorped by the elephant." But a child who only uses passive verbs she has heard before would refuse to utter such a sentence. However, across a range of different made-up verbs and across four different experiments, the authors found that children were willing -- at least some of the time -- to produce these new passive verbs. (In addition to production tests, there were also comprehension tests where the children had to interpret a passivization of an already-learned verb.)

Some Considerations

These data conclusively proved that children are not completely conservative, at least not by 4 years of age (there has been a lot of debate more recently about younger children). With what we know now, the conservative child theory had to be wrong -- again, at least for 4-year-olds -- but it's worth remembering that at the time, this was a serious hypothesis.

There is a lot of other data in the paper. Children are more likely to produce new passive forms as they get older (higher rates for 5-year-olds than 4-year-olds). The authors also taught children verbs where the agent is the object and the patient is the subject (that is, where The frog gorped the elephant means "the elephant jumped on top of the frog"). Children had more difficulty passivizing those verbs. However, a lot of these additional analyses are difficult to interpret because of the small sample sizes (16 children and only a handful of verbs per experiment or sub-experiment).

Theory

Fair warning: the rest of this post is pretty technical.

What excites me about this paper is the theoretical work. For instance, the authors propose a theory of linking rules that have strong innate constraints and yet still some language-by-language variation.

The linkages between individual thematic roles in thematic cores and individual grammatical functions in predicate-argument structures is in turn mediated by a set of unmarked universal linking rules: agents are mapped onto subjects; patients are mapped onto objects; locations and paths are mapped onto oblique objects. Themes are mapped onto any unique grammatical function but can be expressed as oblique, object or subject; specifically, as the 'highest' function on that list that has not already been claimed by some other argument of the verb.
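Read as a procedure, the linking rules quoted above might look something like the sketch below. This is my own rough rendering, not the authors' formalism. In particular, the ranking subject > object > oblique is an assumption (the passage lists the functions but does not spell out which counts as 'highest'), and the names FUNCTION_RANK, FIXED_LINKS and link_arguments are hypothetical.

```python
# ASSUMPTION: ranking of grammatical functions, highest first.
FUNCTION_RANK = ["subject", "object", "oblique"]

# The unmarked linkings stated in the quoted passage.
FIXED_LINKS = {
    "agent": "subject",
    "patient": "object",
    "location": "oblique",
    "path": "oblique",
}


def link_arguments(roles):
    """Map each thematic role of a verb to a grammatical function."""
    linking = {}
    claimed = set()
    # First pass: roles whose linking is fixed.
    for role in roles:
        if role in FIXED_LINKS:
            linking[role] = FIXED_LINKS[role]
            claimed.add(FIXED_LINKS[role])
    # Second pass: a theme takes the highest function not yet claimed.
    # If every function is already claimed, the theme is left unlinked.
    if "theme" in roles:
        for function in FUNCTION_RANK:
            if function not in claimed:
                linking["theme"] = function
                break
    return linking


print(link_arguments(["agent", "theme"]))
print(link_arguments(["theme"]))
```

Under these assumptions, a theme surfaces as the object when there is an agent (Mary rolled the ball) and as the subject when there is not (The ball rolled).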

With respect to passivization, what is important is that only verbs which have agents as subjects are going to be easily passivized. The trick is that what counts as an 'agent' can vary from language to language.

It is common for languages to restrict passivized subjects to patients affected by an action ... The English verbal passive, of course, is far more permissive; most classes of transitive verbs, even those that do not involve physical actions, have the privilege of passivizability assigned to them. We suggest this latitude is possible because what counts as the patient of an action is not self-evident ... Languages have the option of defining classes in which thematic labels are assigned to arguments whose roles abstractly resemble those of physical thematic relations...

This last passage sets up the core of the theory to be developed in Learnability and Cognition. Children are born knowing that certain canonical verbs -- ones that very clearly have agents and patients, like break -- must passivize, and that a much larger group of verbs in theory might passivize, because they could be conceived of as metaphorically having agents and patients. What they have to learn is which verbs from that broader set actually do passivize. Importantly, verbs come in classes of verbs with similar meanings. If any verb from that set passivizes, they all will.

This last prediction is the one I am particularly interested in. A later paper (Gropen, Pinker, Hollander, Goldberg & Wilson, 1989) explored this hypothesis with regard to the dative alternation, but I don't know of much other work. In general, Learnability and Cognition got less attention than it should have, perhaps because by the time it was published, the Great Past Tense Debate had already begun. I've often thought of continuing this work, but teaching novel verbs to children in the course of an experiment is damn hard. Ben Ambridge has recently run a number of great studies on the acquisition of verb alternations (like the passive), so perhaps he will eventually tackle this hypothesis directly.

----
Pinker S, Lebeaux DS, and Frost LA (1987). Productivity and constraints in the acquisition of the passive. Cognition, 26 (3), 195-267 PMID: 3677572

Boston University Conference on Language Development: Day 1

Posted by GamesWithWords on Friday, November 05, 2010

BUCLD is one of my favorite conferences, not least because it takes place every year just across the river. This year is shaping up to be a particularly good one, if the first day is any indication.

Ben Ambridge (w/Julian Pine & Caroline Rowland) gave an excellent talk on learning semantic restrictions on verb alternations. Of all the work Steve Pinker has done, I think his verb alternation work is the least well-known, but it's also probably my favorite. It's nice to see someone systematically revisiting these issues, and I think Ambridge is making some important contributions.

Kenny Smith (w/Elizabeth Wonnacott) presented a really neat proof-of-concept involving language evolution, showing that you can get robust regularization of linguistic systems in a community of speakers even if none of the individual learners/speakers have strong biases to regularize the input. This was a really fun talk; one of those talks that makes one reconsider one's life choices ("should I be studying language evolution?").

Dea Hunsicker (w/Susan Goldin-Meadow) presented new analyses of an old home-sign corpus, looking at evidence that this particular home sign had noun phrases. Home-sign, for those who don't know it, is an ad-hoc mini sign language often developed by deaf children who don't have exposure to a developed sign language.

If I had to pick a best talk, I'd pick Erin Conwell's talk (w/Tim O'Donnell & Jesse Snedeker) on the dative alternation, in which she sketched an explanation of why, although double-object constructions are overall more frequent than prepositional-object constructions, the latter seem to be more productive in early child language. But I may be biased here in that Erin is a post-doc in the same lab as me.

There were a number of other good talks today that I saw -- and many that I didn't -- which deserve mention. I'd write more, but it's late, and there's another full day coming up tomorrow.

Universal Grammar is dead. Long live Universal Grammar.

Posted by GamesWithWords on Wednesday, October 20, 2010

Last year, in a commentary on Evans and Levinson's "The myth of language universals: Language diversity and its importance for cognitive science" in Behavioral and Brain Sciences (a journal which published one target paper and dozens of commentaries in each issue), Michael Tomasello wrote:

I am told that a number of supporters of universal grammar will be writing commentaries on this article. Though I have not seen them, here is what is certain. You will not be seeing arguments of the following type: I have systematically looked at a well-chosen sample of the world's languages, and I have discerned the following universals ... And you will not even be seeing specific hypotheses about what we might find in universal grammar if we followed such a procedure.

Hmmm. There are no specific proposals about what might be in UG... Clearly Tomasello doesn't read this blog much. Granted, for that he should probably be forgiven. But he also clearly hasn't read Chomsky lately. Here's the abstract of the well-known Hauser, Chomsky & Fitch (2002):

We submit that a distinction should be made between the faculty of language in the broad sense (FLB) and in the narrow sense (FLN). FLB includes a sensory-motor system, a conceptual-intentional system, and the computational mechanisms for recursion, providing the capacity to generate an infinite range of expressions from a finite set of elements. We hypothesize that FLN only includes recursion and is the only uniquely human component of the faculty of language.

Later on, HCF make it clear that FLN is another way of thinking about what elsewhere is called "universal grammar" -- that is, constraints on learning that allow the learning of language.

Tomasello's claim about the other commentaries (that they won't make specific claims about what is in UG) is also quickly falsified, and by the usual suspects. For instance, Steve Pinker and Ray Jackendoff devote much of their commentary to describing grammatical principles that could be -- but aren't -- instantiated in any language.

Tomasello's thinking is perhaps made clearer by a comment later in his commentary:

For sure, all of the world's languages have things in common, and [Evans and Levinson] document a number of them. But these commonalities come not from any universal grammar, but rather from universal aspects of human cognition, social interaction, and information processing...

Thus, it seems he agrees that there are constraints on language learning that shape what languages exist. This, for instance, is the usual counter-argument to Pinker and Jackendoff's nonexistent languages: those languages don't exist because they're really stupid languages to have. I doubt Pinker or Jackendoff are particularly fazed by those critiques, since they are interested in constraints on language learning, and this proposed Stupidity Constraint is still a constraint. Even Hauser, Chomsky and Fitch (2002) allow for constraints on language that are not specific to language (that's their FLB).

So perhaps Tomasello fundamentally agrees with people who argue for Universal Grammar, and this is just a terminology war. They call fundamental cognitive constraints on language learning "Universal Grammar" and he uses the term to refer to something else: for instance, proposals about specific grammatical rules that we are born knowing. Then, his claim is that nobody has any proposals about such rules.

If that is what he is claiming, that is also quickly falsified (if it hasn't already been falsified by HCF's claims about recursion). Mark C. Baker, by the third paragraph of his commentary, is already quoting one of his well-known suggested language universals:

(1) The Verb-Object Constraint (VOC): A nominal that expresses the theme/patient of an event combines with the event-denoting verb before a nominal that expresses the agent/cause does.

And I could keep on picking examples. For those outside of the field, it's important to point out that there wasn't anything surprising in the Baker commentary or the Pinker and Jackendoff commentary. They were simply repeating well-known arguments they (and others) have made many times before. And these are not obscure arguments. Writing an article about Universal Grammar that fails to mention Chomsky, Pinker, Jackendoff or Baker would be like writing an article about major American cities without mentioning New York, Boston, San Francisco or Los Angeles.

Don't get me wrong. Tomasello has produced absurd numbers of high-quality studies and I am a big admirer of his work. But if he is going to make blanket statements about an entire literature, he might want to read one or two of the papers in that literature first.

-------
Tomasello, M. (2009). Universal grammar is dead. Behavioral and Brain Sciences, 32 (05). DOI: 10.1017/S0140525X09990744

Evans, N., & Levinson, S. (2009). The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and Brain Sciences, 32 (05). DOI: 10.1017/S0140525X0999094X

Hauser MD, Chomsky N, & Fitch WT (2002). The faculty of language: what is it, who has it, and how did it evolve? Science (New York, N.Y.), 298 (5598), 1569-79. PMID: 12446899

Baker, M. (2009). Language universals: Abstract but not mythological. Behavioral and Brain Sciences, 32 (05). DOI: 10.1017/S0140525X09990604

Pinker, S., & Jackendoff, R. (2009). The reality of a universal language faculty. Behavioral and Brain Sciences, 32 (05). DOI: 10.1017/S0140525X09990720
