
Negative Evidence: Still Missing after all these Years

My pen-pal Melodye has posted a thought-provoking piece at Child's Play on negative evidence. As she rightly points out, issues of negative evidence have played a crucial role in the development of theories of language acquisition. But she doesn't think that's a good thing. Rather, it's "ridiculous, [sic] and belies a complete lack of understanding of basic human learning mechanisms."

The argument over negative evidence, as presented by Melodye, is indeed ridiculous, but the ridiculousness seems to stem from (a) conflating two different types of negative evidence and (b) misunderstanding what the argument was about.
Fig. 1. Melodye notes that rats can learn from negative evidence, so why can't humans? We'll see why.

Here's Melodye's characterization of the negative evidence argument:
[T]he argument is that because children early on make grammatical ‘mistakes’ in their speech (e.g., saying ‘mouses’ instead of ‘mice’ or ‘go-ed’ instead of ‘went’), and because they do not receive much in the way of corrective feedback from their parents (apparently no parent ever says “No, Johnny, for the last time it’s MICE”), it must therefore be impossible to explain how children ever learn to correct these errors. How — ask the psychologists — could little Johnny ever possibly ‘unlearn’ these mistakes? This supposed puzzle is taken by many in developmental psychology to be one of a suite of arguments that have effectively disproved the idea that language can be learned without an innate grammar.
What's the alternative? Children are predicting what word is going to come up in a sentence.
[I]f the child is expecting ‘mouses’ or ‘gooses,’ her expectations will be violated every time she hears ‘mice’ and ‘geese’ instead.  And clearly that will happen a lot.  Over time, this will so weaken her expectation of ‘mouses’ and ‘gooses,’ that she will stop producing these kinds of words in context.
I can't speak for every Nativist, or for everyone who has studied over-regularization, but since Melodye cites Pinker extensively and specifically, and since I've worked on over-regularization within the Pinkerian tradition, I think I can reasonably speak for at least a variant of Pinkerianism. And I think Melodye actually agrees with us almost 100%.


My understanding of Pinker's Words and Rules account -- and recall that I published work on this theory with one of Pinker's students, so I think my understanding is well-founded -- is that children originally over-regularize the plural of mouse as mouses, but eventually learn that mice is the plural of mouse by hearing mice a lot. That is, our account is almost identical to Melodye's except it doesn't include predictive processing. I actually agree that if children are predicting mouses and hear mice, that should make it easier to correct their mistaken over-regularization. But the essential story is the same.
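As an aside, here is a minimal sketch of the error-driven story that both accounts seem to share: repeated exposure to mice in plural-of-mouse contexts gradually drives out mouses. The cue/outcome names, starting weights, and learning rate are illustrative assumptions of mine, not anyone's actual model (neither Ramscar's nor Pinker's).

```python
# A toy error-driven (Rescorla-Wagner-style) learner: repeatedly hearing
# "mice" where "mouses" was expected weakens the expectation of "mouses".
# Starting weights and learning rate are illustrative assumptions.

LEARNING_RATE = 0.1

# Association strengths from the cue "plural of mouse" to candidate forms;
# assume the child starts out over-regularizing.
weights = {"mouses": 0.8, "mice": 0.1}

def hear(observed_form):
    """Nudge each candidate form's weight toward what was actually heard."""
    for form in weights:
        target = 1.0 if form == observed_form else 0.0
        weights[form] += LEARNING_RATE * (target - weights[form])

# The child keeps hearing "mice" in plural-of-mouse contexts...
for _ in range(50):
    hear("mice")

print(weights)  # "mice" approaches 1.0; "mouses" decays toward 0.0
```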



Where I've usually seen Nativists bring up this particular negative-evidence argument (and remember there's another) is in the context of Behaviorism, according to which rats (and humans) learn by being explicitly rewarded for doing the right thing and explicitly punished for doing the wrong thing. The fact that children learning language are almost never corrected (as Melodye notes) is evidence against that very particular type of Empiricist theory.


That is, we don't argue (and, to my knowledge, never have argued) that children can only learn the word mice through Universal Grammar. Again, it's possible (likely?) that someone has made that argument. But not us.[1]


Negative Evidence #2


There is a deeper problem with negative evidence that does implicate, if not Universal Grammar, at least generative grammars. That is, as Pinker notes in the article cited by Melodye, children generalize some things and not others. Compare:


(1) John sent the package to Mary.
(2) John sent Mary the package.
(3) John sent the package to the border.
(4) *John sent the border the package.


That * means that (4) is ungrammatical, or at least that most people find it ungrammatical. Now, on a word-prediction theory that tracks only surface statistics (the forms of words, not their meaning or syntactic structure), you'd probably have to argue that whenever children have heard discussions of packages being sent to Mary, they've heard either (1) or (2), but in discussions of sending packages to borders, they've only ever heard (3) and never (4). The absence of (4) is then surprising relative to their predictions, and thus they've learned that (4) is no good.


The simplest version of this theory won't work, though, because children (and you) have presumably never heard any of the sentences below (where Gazeidenfrump and Bleizendorf are people's names, the dax is an object, and a dacha is a kind of house used in Russia):


(5) Gazeidenfrump sent the dax to Bleizendorf.
(6) Gazeidenfrump sent Bleizendorf the dax.
(7) Gazeidenfrump sent the dax to the dacha.
(8) *Gazeidenfrump sent the dacha the dax. 


Since we've heard (and expected) sentence (8) exactly as many times as we've heard/expected (5)-(7) -- namely, never -- failures of prediction can't explain why we know (8) is bad but (5)-(7) aren't. (BTW, if you don't like my examples, there are many, many more in the literature; these are just the best I can think of off the top of my head.)
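To make the point concrete, here's a toy illustration (my own, with a made-up three-sentence "corpus" standing in for real child-directed speech): if acceptability were just a matter of how often the exact word string had been heard before, all four novel sentences would come out identical.

```python
# Toy illustration: if acceptability were just a count of how often the exact
# word string had been heard before, the novel sentences (5)-(8) would all be
# tied at zero, so surface counts alone can't single out (8) as bad.
# The tiny "corpus" is an illustrative stand-in, not real child-directed speech.

corpus = [
    "John sent the package to Mary",          # (1)
    "John sent Mary the package",             # (2)
    "John sent the package to the border",    # (3)
]

novel = [
    "Gazeidenfrump sent the dax to Bleizendorf",   # (5) fine
    "Gazeidenfrump sent Bleizendorf the dax",      # (6) fine
    "Gazeidenfrump sent the dax to the dacha",     # (7) fine
    "Gazeidenfrump sent the dacha the dax",        # (8) bad -- but same count
]

for sentence in novel:
    print(corpus.count(sentence), sentence)   # every novel sentence: count 0
```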


So we can't be tracking just the words themselves; we must be tracking something more abstract. Pinker has an extended discussion of this problem in his 1989 book, in which he argues that the constraint is semantic: you can use the double-object construction (e.g., (2), (4), (6), or (8)) only if the recipient can actually possess the object (that is, the dax becomes Bleizendorf's, but it doesn't become the dacha's, since dachas -- and borders -- can't own things). I'm working from memory now, but I think -- though I won't swear to it -- that Pinker's solution also involves some aspects of the syntactic/semantic structures above being innate.
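For concreteness, here is a minimal sketch of the kind of semantic constraint described above. The possessor list and the little acceptability check are my own illustrative assumptions, not Pinker's actual formalization.

```python
# A minimal sketch of the semantic constraint described above: the
# double-object frame ("sent RECIPIENT THEME") is licensed only when the
# recipient is the kind of thing that can come to possess the theme.
# The possessor list is an illustrative assumption, not Pinker's formalization.

CAN_POSSESS = {"Mary", "Bleizendorf"}   # animate recipients; borders and dachas are out

def double_object_ok(recipient):
    """Should 'sent <recipient> the dax' be predicted to sound acceptable?"""
    return recipient in CAN_POSSESS

for recipient in ["Mary", "Bleizendorf", "the border", "the dacha"]:
    marker = "   " if double_object_ok(recipient) else "*  "
    print(marker + "sent " + recipient + " the dax")
```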


Pinker's account is not perfect and may end up being wrong in some places, but the fact remains that negative evidence (implicit or not) cannot by itself explain where children (and adults) do or do not generalize.


-----
Notes: 





[1] Melodye quotes Pinker saying "The implications of the lack of negative evidence for children's overgeneralization are central to any discussion of learning, nativist or empiricist." That is the quote that she says is "quite frankly, ridiculous." Here is the full quote. I'll let you decide whether it's ridiculous:
This nature–nurture dichotomy is also behind MacWhinney’s mistaken claim that the absence of negative evidence in language acquisition can be tied to Chomsky, nativism, or poverty-of-the-stimulus arguments. Chomsky (1965, p. 32) assumed that the child’s input ‘consist[s] of signals classified as sentences and nonsentences …’ – in other words, negative evidence. He also invokes indirect negative evidence (Chomsky, 1981). And he has never appealed to Gold’s theorems to support his claims about the innateness of language. In fact it was a staunch ANTI-nativist, Martin Braine (1971), who first noticed the lack of negative evidence in language acquisition, and another empiricist, Melissa Bowerman (1983, 1988), who repeatedly emphasized it. The implications of the lack of negative evidence for children’s overgeneralization are central to any discussion of learning, nativist or empiricist.



-----
Quotes:
Pinker, S. (2004). Clarifying the logical problem of language acquisition. Journal of Child Language, 31(4), 949-953. DOI: 10.1017/S0305000904006439


photo: Big Fat Rat

8 comments:

Melodye said...

To address the first part of your post, there are very serious differences between our account and Pinker's (and those in his theoretical camp).

Although it is agreed that children learn that the regular form of English plurals involves adding a final sibilant (and nowhere do we dispute this), many linguists argue that morphology depends on innate rules. The claim is that while the particulars (content) of the rules for specific languages are “learned” (including, explicitly, the English –s), the operations of the rules themselves are constrained and structured by innate mechanisms (see e.g., Clahsen, 1999; Pinker, 1998; Marcus, Brinkmann, Clahsen, Wiese, & Pinker, 1995; Pinker & Prince, 1988, etc.):

“The organization of morphology has implications for the acquisition of morphology. Understanding language acquisition requires specifying the innate mechanisms that accomplish language learning, and the language-particular information that these mechanisms learn. It has been fruitful to posit that the universal basic organization of grammar is inherent in the learning mechanisms, which are deployed to acquire the particular words and rules in a given language” (Kim, Marcus, Pinker, Hollander & Coppola, 1994, p. 174-5).

“…focusing on a single rule of grammar [regular inflection], we find evidence for a system that is independent of real-world meaning, non-associative (unaffected by frequency and similarity), sensitive to abstract formal distinctions… more sophisticated than the kinds of "rules" that are explicitly taught, developing on a schedule not timed by environmental input, organized by principles that could not have been learned, possibly with a distinct neural substrate and genetic basis" (Pinker, 1991, p. 533).

“A model of overregularization that my colleagues and I have proposed depends on the existence of mental rules…The key property of a rule appears to be its ability to treat all instances of a class equally, regardless of their degree of resemblance to stored forms. Rules…apply in an all-or-none fashion, applying to any item carrying the appropriate symbol. For example, the add –ed rule applies just as readily to any novel word carrying the symbol [verb]” (Marcus, 1995).

"I do not exclude the possibility that high-frequency regular [plural]s are redundantly stored in the lexicon. And of course, a mechanism is necessary that blocks composition of the regular form in the presence of a listed irregular alternative. (Jackendoff, 2007, p10)

GamesWithWords said...

@Melodye: I think I misunderstood your use of the term "innate rule." I thought you meant something like "a rule which is innately specified," but I think you mean something more like "a rule which is part of a grammar that is subject to innate constraints."

That's exactly what the first quote is about (it doesn't mention innate rules). The third and fourth quotes don't mention innateness at all. The second quote comes the closest to my interpretation, in that it posits that morphology gives evidence of a language system "organized by principles that could not have been learned." I'm not sure that includes an innate rule, but I admit I haven't read that book lately. Perhaps you can find a quote that references the innate rule itself?

All that aside, expectation/surprisal still can't explain away the *other* problem of negative evidence, as discussed in the post. If you think it can, please explain.

Avery Andrews said...

I rather doubt that the datives (2nd problem) are going to be a fatal problem for Ramscar et al. Being aware of a situation where sending is likely to be talked about, and hearing 'NP sent/didn't send', we might predict 'NP_theme to NP_recip' or 'NP_recip NP_theme', depending on a lot of things, and be right a reasonable amount of the time. However, if we are expecting a destination rather than a recipient and we happen to predict 'NP_dest NP_theme', we will never be right (if it's a real locative destination, not a place name used to designate part of an organization). It doesn't seem implausible to me that a fairly general learning mechanism could pick up on the potential for active participation of the recipient as the critical feature of the cases where the NP_recip NP_theme predictions were always wrong.

Something I don't understand at all at this point is how the Ramscar lab would explain the workings of basic clause and NP structure, but I expect the discussion will get around to that in due course. And then when that is under control, it's time to have a look at concord in Kayardild!

GamesWithWords said...

@Avery: I think you're absolutely right in that prediction/surprisal could help children learn the right patterns. But it can only help if children are able to make the right kinds of predictions (e.g., about recipients, givers, etc.), which requires tracking the right level of structure.

Generative grammars are built on that kind of abstract structure, and sub-symbolic systems explicitly deny the existence of such structure. So it's not at all clear how the latter can pull off the job, and in fact folks like Hummel are arguing that they are mathematically incapable of pulling it off.

So in a sense, the issue of prediction/surprisal is something of a red herring here. The real debate is about the nature of representation.

Dexter Edge said...

As an outsider and an amateur, I'm going to stay out of the main argument here until I have read more of the literature and have thought more deeply about the issues. Since I'm not in a position to need to have allegiances, I tend to see positive aspects and problems on both sides of the Great Divide.

However, it seems to me that neither of the examples of allegedly ungrammatical sentences that you give under point 2 is persuasive.

A child is going to hear "*John sent the border the package" as potentially grammatical (and we are surely talking here mainly about language as spoken and heard, not as read), because "border" can be heard as "boarder," and a "boarder" is someone to whom a package can be sent.

(One could argue that "boarder" is now a relatively rare word, but it's perhaps not all that much rarer in a child's experience than "border." And it's certainly a word that one might hear in an old movie, and this supposedly ungrammatical construction would sound just fine in that movie.)

"*Gazeidenfrump sent the dacha the dax" seems to me to cheat a little bit. Whereas "Gazeidenfrump" and "dax" are designed to be words that a child could not have heard before (assuming the child has never watched an episode of "Deep Space Nine"), "dacha" is an actual word that is chosen because it is a location.

But a child who hasn't heard the word has no reason to assume that the word represents a location, and the sentence could well be construed as grammatical if "the dacha" turned out to mean something like "the Grand Poobah" or "the Pope," who is (we can assume) someone to whom a dax could be sent.

It's probably possible to come up with examples that make your point, but I don't think these are those.

GamesWithWords said...

@Dexter: You raise good questions. The dative tests assume that the participants know what the words in the sentence mean. So, as you point out, you'd have to know that it's a border being talked about, not a boarder. (In fact, theorists specifically like that sentence because of the homophony -- it shows that it's the meaning that matters, not what the word sounds like.)

As far as the made-up words: the theoretical argument is over whether people have consistent intuitions about sentences they've never heard before. Now, given the statistics of language, we can make some guesses about which sentences are unlikely to have been heard by a particular person (Chomsky's colorless green ideas sleep furiously is a good example), but we obviously can't know for sure without tape-recording the person's entire life.

One way to get around that problem experimentally is to use novel or made-up words. You teach the participant what the word means, then present them with the sentence. So we wouldn't be testing someone on sentences where made-up words come out of the blue.

Interestingly, not all verbs allow the double-object construction, and it seems that the verbs which do or don't can be distinguished semantically. So there were some nice experiments in the 80s (by Pinker and colleagues, I believe) in which participants were taught new verbs that either fit the semantics of the double-object construction or did not, and people "correctly" followed the existing pattern in the language with these new verbs (either using the double-object construction or not, as semantics dictated), despite having never heard those verbs used before.

This might sound like studying angels dancing on heads of pins, but in fact people do often use (or hear) verbs in new constructions. The novel word experiments are simply a nice way of mimicking that real-world behavior in the lab.

Anonymous said...

@games from Avery Andrews: I don't yet know what Michael et al.'s story about the phenomena of basic clause structure is going to be, so I'm not making any assumptions about it. How complex the innate endowment that supports language learning is, and whether any of it is specific to language, is a different question from the existence of complex grammatical structure, although people seem to see them as more closely linked than they really are.

GamesWithWords said...

@Avery: Agreed -- if you're going to have a very simple structure, there's not much reason to posit its innateness, and if you're going to have a fabulously complex structure, at some point it becomes unlearnable; but in between there's a lot of room for learned structure. The same goes for the language-specific/domain-general debate. And it's unfortunate that there's less recognition of this middle ground than there could be.

This is the place, though, where I think the formal proofs do some work (particularly thinking about issues of innateness here): a given learning mechanism is capable of learning some structures and not others. I really do think that's theory-constraining data. The converse is to suggest that a given theory of learning makes *no* predictions about what is learnable, in which case I'm not sure it counts as a theory!

And that's why I think it's relevant and good for learning theorists to spend time discussing what their particular learning models can't learn. A limitation on learning isn't a limitation of the theory -- it's a strength! (To the extent that we care about theories being falsifiable, anyway.)