Field of Science

Showing posts with label computational linguistics. Show all posts

VerbCorner: A Citizen Science project to find out what verbs mean

Earlier this week, I blogged about our new VerbCorner project. At the end, I promised that there would be more info forthcoming about why we are doing this project, about its aims and expected outcomes, why it's necessary, etc. Here's the first installment in that series.

Computers and language

I just dictated the following note to Siri:
Many of our best computer systems treat words as essentially meaningless symbols that need to be moved around.
Here's what she wrote:
Many of our best computer system street words is essentially meaningless symbols that need to be moved around.
I rest my case.

The problem of meaning

I don't know for sure how Siri works, but her mistake is emblematic of how much language software works. "Computer systems treat" and "computer system street" sound approximately the same, but that's not something most humans would notice, because the first interpretation makes sense and the second one doesn't. 

Decades of research shows that human language comprehension is heavily guided by plausibility: when there are two possible interpretations of what you just heard, go for the one that makes sense. This happens in speech recognition, as in the example above, and it plays a key role in understanding ambiguous words. If you want to throw Google Translate for a loop, give it the following:
John was already in his swimsuit as we reached the watering hole. "I hope the tire swing is still there," John said as he headed to the bank.
Although the most plausible interpretation of bank here is "side of a river," Google Translate will translate it into the word for "financial institution" in whatever language you are translating into, because that's the most common meaning of the English word bank.

So what's the problem?

I assume that this limitation is not lost on the people at Google or at Apple. And, in fact, there are computer systems that try to incorporate meaning. The problem there is not so much the computer science as the linguistic science.** Dictionaries notwithstanding, scientists really do not know very much about what words mean, and it is hard to program a computer to know what a word means when you do not know yourself.

(Dictionaries are useful, but as an exercise, pick* a definition from a dictionary and come up with a counterexample. It is not hard.)

One of the limitations is scope. Language is huge. There are a lot of words. So scientists will work on the meanings of a small number of words. This is helpful, but a computer that only knows a few words is pretty limited. We want to know the meanings of all words.

Solving the problem

We've launched a new section of the website, VerbCorner. There, you can answer questions about what verbs mean. Rather than try to work out the meaning of a word all at once, we have broken the problem up into a series of different questions, each of which tries to pinpoint a specific component of meaning. Of course, there are many nuances to meaning, but research has shown that certain aspects are more important than others, and we will be focusing on those.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

----
*Dragon Dictate originally transcribed this as "pickled", which I did not catch on proofreading. More evidence that we need computer programs that understand what words mean.
**Dragon Dictate made spaghetti out of this sentence, too.

Citizen Science at GamesWithWords.org: The VerbCorner Project

What do verbs mean? We'd like to know. For that reason, we just launched VerbCorner, a massive, crowd-sourced investigation into the meanings of verbs. 

Why do we need this project? Why not just look up what verbs mean in a dictionary? While dictionaries are enormously useful (I think I own something like 15), they are far from perfect. For one thing, it's usually very easy to find counterexamples even for what seem like straightforward definitions. Take the following:
Bachelor: An unmarried man.
So is the Pope a bachelor? Is Neil Patrick Harris? How about a married man from a country in which men are allowed multiple wives?

At VerbCorner, rather than trying to work out the whole definition at once, we have broken meaning into many different components. At the site, you will find several different tasks. In each task, you will try to determine whether a particular verb has a particular component of meaning. 

If you are interested in what words mean and would like to help with this project, sign up for an account at http://gameswithwords.org/VerbCorner/. Participation can be anonymous, but we are happy to recognize significant contributions from anyone who wishes it.

I will be writing a lot more about this project, its goals, the science behind it, and the impact we expect it to have over the coming weeks. In the meantime, please check it out.

I say "uncle", you say "DaJiu"

Kinship terms (mother, uncle, niece, etc.) are socially important and generally learned early in acquisition. Interestingly, different languages have different sets of terms. Mandarin, for instance, divides "uncle" into "father's older brother", "father's younger brother", and "mother's brother".
Stranger things (to an anglophone, anyway) happen, too: In Northern Paiute, the kin terms for grandparents and grandchildren are self-reciprocal: you would use the same word to refer to your grandmother (if you are female) that she uses to refer to you. (See my previous post on "mommy" across languages.)

[Figure: Kinship terms in English and Northern Paiute. Ignore all the logical terms for now. (Figure taken from Kemp & Regier, 2012)]

Even so, there are a lot of similarities across languages. Disjunctions are relatively rare; that is, it's unusual to see a word that means "father or cousin". Usually there are more words to distinguish varieties of closely-related relatives (sister, brother) than distant relatives (cousin). How come? One obvious answer is that maybe the kinship systems we have are just better than the alternatives (ones with words like "facousin" = "father or cousin"), but it would be nice to show this.

Optimal Kinship Terms

In a paper earlier this year, Charles Kemp and Terry Regier did just that.
We show that major aspects of kin classification follow directly from two general principles: Categories tend to be simple, which minimizes cognitive load, and to be informative, which maximizes communicative efficiency ... The principles of simplicity and informativeness trade off against each other... A system with a single category that includes all possible relatives would be simple but uninformative because this category does not help to pick out specific relatives. A system with a different name for each relative would be complex but highly informative because it picks out individual relatives perfectly. 
That seems intuitively reasonable, but these are computational folk, so they formalized this with math. The details are in the paper, but roughly: They formalize the notion of complexity by using minimum description length in a representational language based on primitives like FEMALE and PARENT. The descriptions of the various terms in English and Northern Paiute are shown in parts C and D of the figure above. Communicativeness is formalized by measuring how ambiguous each term is (how many people it could potentially refer to).
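A toy version of that tradeoff can be sketched in a few lines of Python. The relatives, the two example systems, and both scoring functions below are illustrative assumptions of mine, not Kemp and Regier's actual formalization:

```python
import math

# Toy sketch of the simplicity/informativeness tradeoff.
# These systems and scoring rules are illustrative, not the paper's.
relatives = ["mother", "father", "grandmother", "grandfather"]

systems = {
    # one word for everyone: simple but uninformative
    "one_word": {r: "kin" for r in relatives},
    # one word per relative: informative but complex
    "four_words": {r: r for r in relatives},
}

def complexity(system):
    """Crude proxy for description length: number of distinct categories."""
    return len(set(system.values()))

def communicative_cost(system):
    """Expected bits of ambiguity when a term is used to pick out one
    relative, assuming all relatives are equally likely referents."""
    cost = 0.0
    for r in relatives:
        candidates = sum(1 for x in relatives if system[x] == system[r])
        cost += math.log2(candidates) / len(relatives)
    return cost

for name, sys_ in systems.items():
    print(name, complexity(sys_), communicative_cost(sys_))
```

The one-word system scores 1 on complexity but 2 bits of ambiguity; the four-word system scores 4 on complexity but 0 bits of ambiguity. Neither dominates the other, which is the tradeoff in miniature.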

A language is considered "better" than another if it out-scores the other on one dimension (e.g., simplicity) and does no worse on the other (informativeness). A language is near-optimal if hardly any possible language is better. They looked at a number of different existing kinship systems (English, Northern Paiute, and a bunch of others) and found that all of them were near-optimal.

Evolution, Culture, or Development?

There are generally three ways of explaining any given behavior: evolution (we evolved to behave that way), culture (culture -- possibly through cultural evolution -- made us that way), or development (we learned to behave that way). For instance, it's rare to find people who chiefly eat arsenic. This could be because of evolution (we evolved to avoid arsenic because the arsenic-eaters don't have children and pass on their genes), cultural evolution (cultures that prized arsenic-eating all died out, leaving the non-arsenic cultures as the only game in town), or development (we learned as children, through trial and error, that eating arsenic is a bad idea). If I remember my Psych 101, food preferences actually involve all three.

What about kinship terms? If they are optimal, who do we credit with their optimality? Probably not development (we don't each individually create optimal kinship terms in childhood). Kemp and Regier seem to favor cultural evolution: over time, more useful kinship terms stuck in the lexicon of a given language and useless ones like "facousin" died out. It would be nice to show, however, that it is not actually genetic. This wouldn't have to be genes for kinship terms, but it could be genes that bias you to learn naming systems that are near-optimal (kinship naming systems or otherwise). One would need to show that these arose for language and not just cognition in general.

------
Kemp, C., & Regier, T. (2012). Kinship categories across languages reflect general communicative principles. Science, 336(6084), 1049-1054. DOI: 10.1126/science.1218811

Language fact of the day

The name that appears most often in Genesis is "Jacob", followed by "Joseph".

In other news, the most common word in Moby Dick is "the"; the most common noun (excluding pronouns) is, not surprisingly, "whale".

In Genesis and a number of other texts, three-letter words are more common than words of any other length (the one exception I've found so far is Moby Dick).

(Yes, I am learning to use NLTK, which so far I like a lot.)
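Counts like these take only a few lines. Here is a sketch with the standard library's Counter and a stand-in word list (NLTK's FreqDist works the same way, over a real corpus such as nltk.corpus.gutenberg.words('melville-moby_dick.txt')):

```python
from collections import Counter

# Stand-in word list; with NLTK you would load a real text instead.
words = "the whale sank the ship and the sea took the rest".split()

word_freq = Counter(words)
print(word_freq.most_common(1))    # most frequent word: ('the', 4)

length_freq = Counter(len(w) for w in words)
print(length_freq.most_common(1))  # most frequent word length: 3 letters
```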

Is Dragon Dictate a believer?

I've been using Dictate to take notes on Talmy's Toward a Cognitive Semantics. One of the example sentences is as follows:
I aimed my gun into the living room. (p. 109)
I cannot by any means convince Dictate to print this. It prefers to convert "my gun" to "my God". For example, on my third try, it wrote:
I aim to my God into the living room.
Dictate offers a number of alternatives in case its initial transcription is incorrect. Right now, it is suggesting, as an alternative to "aim to my God":
aimed to my God
aim to my God and
aim to my god
aim to my gun
aimed to my God and
aim to my garden
aimed to my god
aimed to my gun
aim to my guide
aim to my God in
aimed to my God in 
Perhaps Nuance has a religious bent, but I suspect that this is a simple N-gram error. Like many natural language processing systems, Nuance figures out what word you are saying in part by reference to the surrounding words. So in general, it thinks that common bigrams (2-word sequences) are more likely than uncommon bigrams.

According to Google, "my god" appears on the Web 133,000,000 times, whereas "my gun" appears only 8,770,000 times. So "my god" is just much more likely. Similarly, "aim to" is fairly common (215,000,000 hits). So even though "aim to my God" is gibberish, its two components -- "aim to" and "my god" -- are fairly common, whereas the correct phrase -- "aimed my gun" -- is fairly rare (138,000 hits). (The bigram "aimed my" is also infrequent: 474,000 hits.)
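The arithmetic behind the error is easy to sketch. The counts below are the Google hit counts quoted above, used as stand-in corpus frequencies; a real recognizer would combine scores like these with acoustic evidence, but the bias is the same:

```python
import math

# Stand-in bigram frequencies (the Google hit counts from the text).
bigram_counts = {
    ("aim", "to"): 215_000_000,
    ("my", "god"): 133_000_000,
    ("aimed", "my"): 474_000,
    ("my", "gun"): 8_770_000,
}

def score(words, counts):
    """Log-frequency score of a word sequence under a crude bigram model.
    Unseen bigrams get a count of 1 (log 0 contribution)."""
    return sum(math.log(counts.get(bg, 1)) for bg in zip(words, words[1:]))

wrong = "aim to my god".split()
right = "aimed my gun".split()
print(score(wrong, bigram_counts) > score(right, bigram_counts))  # True
```

The gibberish transcript wins because each of its pieces is individually frequent, which is exactly the failure mode described above.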

N-gram systems work better than most everything else, which is why Nuance, Google, and many other companies use them. But examples like this show their deep limitations, in that they make many obvious errors -- obvious to humans, anyway. In this case, because Nuance doesn't know what sentences mean, and doesn't even know basic grammar, it can't tell that "aimed to my god" is both grammatically incorrect and meaningless.

Someday we will hopefully have good dictation software. For now, there is Dragon Dictate

Mary Grover at Salon has distilled the essence of using Dragon Dictate into a brief post. I couldn't possibly do better -- or even as well -- so I refer you to it.

I was assured by several people that if I continued to use DragonDictate and use the vocabulary-training feature, eventually the software would learn to do a better job. I made sure to diligently train Dragon on everything that I wrote. Unfortunately, it appears that the training function itself is broken. I became suspicious when, even with very unusual words, it always insisted that it already knew all those words. So I tested it by training on a set of made-up words. When Dragon happily announced that it already knew all of those words, too, I wrote an e-mail to technical support.
I wasn't very optimistic about hearing from technical support, since they had not replied to my previous e-mails when I had other questions. This time, they replied promptly to tell me that my technical support had expired, but that I could pay for extended technical support. Presumably, if I were to pay, they would go back to not answering e-mails.
I spend a lot of time at the computer, and I bought Dragon so that I wouldn't have to spend all of that time typing. I pace when I think, but pacing and typing don't mix. I thought Dragon would give me more flexibility. As of yet, this remains a distant dream.

More on DragonDictate

DragonDictate continues to do a decent job of writing my email, so long as I don't talk about work. For writing papers, etc., it continues to be of limited use.

I was just dictating notes on how children learn to count. In two back-to-back sentences, I mentioned "subset-knowers". The first time, this was transcribed as "subset-members", and the second time, it was "sunset-whores".


DragonDictate

I have been doing a great deal of writing lately, though obviously not here. I thought that perhaps at some point in graduate school, I should try getting some of the projects I have done published, and I thought that time was now. Since this requires writing them up, I have been writing. I have gotten a lot of writing done, but I noticed that this came with an increased number of hours spent sitting at my computer. Knowing enough friends who have suffered from repetitive stress injuries, I decided I should take a proactive approach to ergonomics.

One outcome of this process was that I purchased voice-recognition software, namely Dragon Dictate. This actually complements my preference to pace while I think. My writing style involves a lot of thinking, punctuated by occasional bursts of typing. So being able to write as I pace seemed like a good idea.

I cannot say that this experiment has been an overwhelming success. Based on what I have learned from the documentation, Dragon Dictate seems to place a great deal of faith in transitional probabilities. That is, the hypotheses it makes about what you are saying are based not only on the sounds that you make but also on which words typically follow one another.

Of course, which words typically follow one another depends a great deal on what you are talking about. I suspect that Dragon Dictate was not trained on a corpus involving a great deal of psycholinguistics papers, but it is in fact psycholinguistics papers that I am writing. Dragon Dictate makes a number of very systematic and very annoying errors. For instance, it is absolutely convinced that, no matter how carefully I say the word “verb”, I could not possibly have meant to say that word, and probably meant "four herbs" or some such. In the general case, this is probably the right conclusion: the word “verb” is spoken so rarely that, even if you think you heard it, what was actually said was probably something else. However, since almost all my papers are about verbs, I use the word so often that the right hypothesis is probably the opposite: no matter what you think you heard, the word I actually uttered was “verb”.

Needless to say, it doesn't do very well with technical terms from semantic and syntactic theory, either.

The upshot is that I spend so much time correcting DragonDictate's mistakes that it is not clear I wouldn't be better off just typing the document to begin with (you can correct using voice commands, but it is so cumbersome that I usually type instead). Dragon Dictate has a function where you can feed it various documents. The documentation appears to imply that it can learn the relevant word frequencies and transitional probabilities from these documents. I have been feeding it papers I have written, in the hopes that this will help. So far there has been limited improvement, but I am not sure just how large a corpus it needs. I will keep you updated.

(Written using DragonDictate plus hand correction.)

Another problem with statistical translation

In the process of writing my latest article for Scientific American Mind, I spent a lot of time testing out automatic translators like Google Translate. As I discuss in the article, these programs have gotten a lot better in recent years, but on the whole they are still not very good.

I was curious what the Italian name of one of my favorite arias meant. So I typed O Soave Fanciulla into Google Translate. Programs like Google Translate are trained by comparing bilingual documents and noting, for a given word in one language, what word typically appears in the other language in the same place. Not surprisingly, Google Translate translated O Soave Fanciulla as O Soave Fanciulla -- no doubt because, in the bilingual corpora GT was trained on, sentences with the phrase o soave fanciulla in Italian also had o soave fanciulla in English.
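The counting idea behind that kind of training can be sketched with a toy, entirely made-up parallel corpus. Real systems add alignment models and discount frequent function words, but co-occurrence counts are the core:

```python
from collections import Counter

# Made-up Italian/English sentence pairs; not real aligned data.
parallel = [
    ("la banca è chiusa", "the bank is closed"),
    ("la riva del fiume", "the bank of the river"),
    ("la banca ha soldi", "the bank has money"),
]

# For each Italian word, count which English words appear in the
# aligned sentence. Real aligners go further (e.g., discounting
# frequent words like "the"), but this counting is the starting point.
cooc = {}
for it_sent, en_sent in parallel:
    for w in it_sent.split():
        cooc.setdefault(w, Counter()).update(en_sent.split())

print(cooc["banca"]["bank"])   # 2: co-occurs in both "banca" sentences
print(cooc["banca"]["river"])  # 0: never co-occurs
```

Note that a title like o soave fanciulla that only ever appears untranslated in the training data would, by this logic, "translate" to itself, which is exactly what happened.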

I was reduced to translating the words one at a time: soave -> sweet, fanciulla -> girl. GT thinks o means or, but I expect that's the wrong reading in this context ("or sweet girl"?).

Google Translate Fail

Google Translate's blog:
There are some things we still can't translate. A baby babbling, for example. For the week of November 15th we are releasing five videos of things Google can’t translate (at least not yet)! Check out the videos and share them with your friends. If you can think of other things you wish Google translated (like your calculus homework or your pet hamster), tweet them with the tag #GoogleTranslate. We’ll be making a video of at least one of the suggestions and adding it to our page.
What do I wish Google Translate could translate? I'll bite. How about Russian? Or Japanese?

I mean, have the folks over at GT ever actually used their product? It's not very good. I'll admit that machine translation has improved a lot in recent years, but I doubt it's as good as a second-year Spanish student armed with a pocket dictionary.

Nothing against the fine engineers working at Google. GT is an achievement to be proud of, but when they go around claiming to have solved machine translation, it makes those of us still working on the problems of language look bad. It's hard enough to convince my parents that I'm doing something of value without Google claiming to have already solved all the problems.

Cognitive Science, March 2010

In my continuing series on the past year in Cognitive Science: March, 2010.

Once again, the discussion of some of these papers will be technical.

March


Baroni, Murphy, Barbu, Poesio. Strudel: A corpus-based semantic model based on properties and types.

You are who your friends are. A number of computational linguists have been interested in just how much you can learn about a word based on the other words it tends to appear with. Interestingly, if you take a word (e.g., dog) and look at the words it tends to co-occur with (e.g., cat), those other words often describe properties or synonyms of the target word. A number of researchers have suggested that this might be part of how we learn the meanings of words.

Baroni et al. are sympathetic to that literature, but they point out that such models only learn that dog and cat are somehow related. So they don't actually tell you what the word dog means. Moreover, dog is also related to leash, but not in the same way it's related to cat, which is something those models ignore. Their paper covers a new model, Strudel, which attempts to close some of this gap.

The model also keeps track of what words co-occur with a target word. It additionally tracks how those words are related (e.g., dogs and cats is considered to be different from dogs chase cats). The more different types of constructions that connect the target word and a given "friend", the more important that friend is thought to be.

This model ends up doing a better job than some older models at finding semantic associates of target words. It also can cluster different words (e.g., apple, banana, dog, cat) into categories (fruit, animal) with some success. Moreover, with some additional statistical tricks, they were able to clump the various "friends" into different groups based on the type of constructions they appear in. Properties, for instance, often appear in constructions involving X has Y. Conceptually-similar words appear in other types of constructions (e.g., X is like Y).

This presents some clear advantages over previous attempts, but it has some of the same limitations as well. The model discovers different types of features of a target word (properties, conceptually-similar words, etc.), but the label "property" has to be assigned by the researchers. The model doesn't know that has four legs is a property of dog and that like to bark is not -- it only knows that the two facts are of different sorts.
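The basic distributional idea, stripped of Strudel's construction patterns, fits in a short sketch. The sentences are a toy corpus of my own invention, and simple sentence-level co-occurrence stands in for the richer pattern-based counts the model collects:

```python
from collections import Counter
import math

# Toy corpus; sentence-level co-occurrence stands in for Strudel's
# richer, construction-based counts.
sentences = [
    "the dog chased the cat",
    "the cat watched the dog",
    "the dog wore a leash",
    "the cat had a collar",
    "the car had a wheel",
]

def context_vector(word):
    """Counts of the words co-occurring with `word` in a sentence."""
    ctx = Counter()
    for s in sentences:
        tokens = s.split()
        if word in tokens:
            ctx.update(t for t in tokens if t != word)
    return ctx

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    def norm(v):
        return math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b))

# dog and cat share many contexts; dog and car share few
print(cosine(context_vector("dog"), context_vector("cat")))
print(cosine(context_vector("dog"), context_vector("car")))
```

Even this crude version puts dog closer to cat than to car, but, as the paragraph above notes, it can only say that they are related, not how.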

Perruchet & Tillman. Exploiting multiple sources of information in learning an artificial language: human data and modeling. 

Over the last 15 years, a number of researchers have looked at statistically-based word segmentation. After listening to a few minutes of speech in an unknown language, people can guess which sequences of phonemes are more likely to be words in that language.

It turns out that some sequences of phonemes just sound more like words, independent of any learning. The authors check to see whether that matters. Participants were assigned to learn one of two languages: a language in which half of the words a priori sounded like words, and a language in which half the words a priori sounded particularly not like words. Not only did participants do better in the first condition on the words that sound like words, they did better on the "normal" words, too -- even though those were the same as the "normal" words in the second condition. The authors argue that this is consistent with the idea that already knowing some words helps you identify other words.

They also find that the fact that some words a priori sound more like they are words is easy to implement in their previously-proposed PARSER model, which then produces data somewhat like the human data from the experiment.
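The statistic these segmentation studies rely on, the transitional probability between adjacent syllables, can be sketched like so. The syllable stream is made up in the style of this literature; PARSER itself works quite differently:

```python
from collections import Counter

# Made-up stream of three "words" (bidaku, padoti, golabu)
# concatenated with no pauses, as in segmentation experiments.
stream = "bidakupadotigolabubidakugolabupadoti" * 20
syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))
syllable_counts = Counter(syllables[:-1])

def transitional_probability(a, b):
    """P(b | a): how often syllable a is immediately followed by b."""
    return pair_counts[(a, b)] / syllable_counts[a]

print(transitional_probability("bi", "da"))  # within a word: 1.0
print(transitional_probability("ku", "pa"))  # across a word boundary: 0.5
```

Dips in transitional probability mark the word boundaries, which is the cue learners in these experiments appear to exploit.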

Gildea & Temperley. Do grammars minimize dependency length?

Words in a sentence are dependent on other words. In secondary school, we usually used the term "modify" rather than "depend on." So in The angry butcher yelled at the troublesome child, "the angry butcher" and "at the troublesome child" both modify/depend on yelled. Similarly, "the angry" modifies/depends on butcher. Etc.

This paper explores the hypothesis that people try to keep words close to the words they depend on. The authors worked through the Wall Street Journal corpus and calculated both the actual dependency lengths in each sentence (for each word, count the words between it and the word it depends on, and sum) and the shortest possible dependency length. They found that actual dependency lengths were much closer to the optimum, in both the WSJ corpus and the Brown corpus, than would be expected by chance. However, when they looked at two corpora in German, dependency lengths were still shorter than expected by chance, but the effect was noticeably smaller. The authors speculate that this is because German has relatively free word order, because it has some verb-final constructions, or for some other reason, or any combination of these.
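The measure itself is simple to sketch. The dependencies below are hand-annotated for the example sentence; the authors, of course, extracted heads from parsed corpora:

```python
# Hand-annotated toy dependencies for
# "The angry butcher yelled at the troublesome child"
#   0    1      2       3     4   5     6         7
# Each word index maps to the index of its head; 3 (yelled) is the root.
heads = {0: 2, 1: 2, 2: 3, 4: 3, 5: 7, 6: 7, 7: 4}

def total_dependency_length(heads):
    """Sum, over all words, of how many words sit between a word and its head."""
    return sum(abs(word - head) - 1 for word, head in heads.items())

print(total_dependency_length(heads))  # prints 4
```

Comparing this total against the minimum achievable over all grammatical reorderings is what lets one say a corpus is close to optimal.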

Mueller, Bahlmann & Friederici. Learnability of embedded syntactic structures depends on prosodic cues. 

Center-embedded structures are hard to process and also difficult to teach people in artificial grammar learning studies that don't provide feedback. The authors exposed participants to A1A2B1B2 structures with or without prosodic cues. Participants largely failed to learn the grammar without prosodic cues. However, if a falling contour separated each 4-syllable phrase (A1A2B1B2) from the next, participants learned much more. They did even better if a pause was added between 4-syllable phrases in addition to the falling contour. Adding a further pause between the As and Bs (in order to accentuate the difference between them) did not provide any additional benefit.

You are what you say

I recently received an email forward about AnalyzeWords.com. According to its promoters

AnalyzeWords help reveal your personality by looking at how you use words. It is based on good scientific research connecting word use to who people are.
The way the site works is that you enter in someone's Twitter handle and the site analyzes their tweets.

The forward included the following comment from the person from whom, indirectly, I got the email: "So far it says everyone I've looked at (people, journals, etc.) is depressed, except for an account someone set up to chronicle his battle with cancer, which it classified as 'very upbeat'." I tried a handle or two myself and got similar results.

One possible conclusion is that everyone -- or, at least, everyone who uses Twitter -- is depressed. Or the theory behind the website doesn't actually work. I found a possible hint in favor of the latter hypothesis on AnalyzeWords' "The Science Behind AnalyzeWords" page:

Across dozens of studies, junk words [closed-class words like prepositions and pronouns] have proven to be powerful markers of peoples [sic] psychological states. When individuals use the word I, for example, they are briefly paying attention to themselves. People experiencing high levels of physical or mental pain automatically orient towards themselves and begin using I-words at higher rates. I-use, then, can reflect signs of depression, stress or insecurity.
Perhaps. Or perhaps they're using Twitter to talk about themselves and their latest experiences.

Is language just statistics?

Many years ago, I attended a talk in which a researcher (in retrospect, probably a graduate student) was talking about some work she was doing on modeling learning. She mentioned that a colleague was very proud of a model he had put together, in which a model world was populated by model creatures that learned to avoid predators and find food.

She reported that he said, "Look, they are able to learn this without *any* input from the programmer. It's all nurture, not nature." She argued with him at length to point out that he had programmed into his model creatures the structures that allowed them to learn. Change any of those parameters, and they ceased to learn.

There are a number of researchers in the field of language who, impressed by the success of statistical-learning models, argue that much or all of language learning can be accomplished by simply noticing statistical patterns in language. For instance, there is a class of words in English that tend to follow the word "the." A traditional grammarian might call these "nouns," but this becomes unnecessary when using statistics.
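That "words after the" idea is easy to sketch with a toy corpus; real distributional learners track many more contexts than this:

```python
from collections import Counter

# Toy corpus: collect the words that immediately follow "the".
corpus = ("the dog chased the cat . "
          "the cat saw a bird . "
          "the man fed the dog .").split()

after_the = Counter(nxt for word, nxt in zip(corpus, corpus[1:])
                    if word == "the")
print(after_the.most_common())  # dog, cat, man: a crude "noun" class
```

The words this picks out are, of course, exactly the ones a grammarian would call nouns, which is the statistical-learning claim in miniature.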

There are many variants of this approach, some more successful than others. Some are also more careful in their claims than others (one paper, I recall, stated strongly that the described model did away with not only grammatical rules, but words themselves).

While I am impressed by much of the work that has come out of this approach, I don't think it can ever do away with complex (possibly innate) structure. The anecdote above is an argument by analogy. Here is a great extended quote from Language Learnability and Language Development, Steven Pinker's original, 1984 foray into book writing:

As I argued in Pinker (1979), in most distributional learning procedures there are vast numbers of properties that a learner could record, and since the child is looking for correlations among these properties, he or she faces a combinatorial explosion of possibilities. For example, he or she could record of a given word that it occurs in the first (or second, or third, or nth) position in a sentence, that it is to the left (or right) of word X or word Y or ..., or that it is to the left of the word sequence WXYZ, or that it occurs in the same sentence with word X (or words X, Y, Z, or some subset of them), and so on. Adding semantic and inflectional information to the space of possibilities only makes the explosion more explosive. To be sure, the inappropriate properties will correlate with no others and hence will eventually be ignored, leaving only the appropriate grammatical properties, but only after astronomical amounts of memory space, computation, or both.

In any case, most of these properties should be eliminated by an astute learner as being inappropriate to learning a human language in the first place. For example, there is no linguistic phenomenon in any language that is contingent upon a word's occupying the third serial position in a sentence, so why bother testing for one? Testing for correlations among irrelevant properties is not only wasteful but potentially dangerous, since many spurious correlations will arise in local samples of the input. For example, the child could hear the sentences John eats meat, John eats slowly, and the meat is good and then conclude that the slowly is good is a possible English sentence.

Ultimately, a pure-statistics model still has to decide what regularities to keep track of and what to ignore, and that requires at least some innate structure. It probably also requires fairly complex grammatical structures, whether learned or innate.

Can computers talk? (The Chinese Room)

Can computers talk? Right now, no. Natural Language Processing -- the field of Artificial Intelligence & Linguistics that deals with computer language (computers using language, not C++ or BASIC) -- has made strides in the last decade, but the best programs still frankly suck.

Will computers ever be able to talk? And I don't mean Alex the Parrot talk. I mean speak, listen and understand just as well as humans. Ideally, we'd like something like a formal proof one way or another, such as the proof that it is impossible to write a computer program that will definitively determine whether another computer program has a bug in it (specifically, a type of bug known as an infinite loop). That sort of program has been proven to be impossible. How about a program to emulate human language?

One of the most famous thought experiments dealing with this question is the Chinese Room, created by John Searle back in 1980. The thought experiment is meant as a refutation of the idea that a computer program, even in theory, could be intelligent. It goes like this:

Suppose you have a computer in a room. The computer is fed a question in Chinese, and it matches the question against a database in order to find a response. The computer program is very good, and its responses are indistinguishable from that of a human Chinese speaker. Can you say that this computer understands Chinese?

Searle says, "No." To make it even more clear, suppose the computer was replaced by you and a look-up table. Occasionally, sentences in Chinese come in through a slot in the wall. You can't read Chinese, but you were given a rule book for manipulating the Chinese symbols into an output that you push out the "out" slot in the wall. You are so good at using these rules that your responses are as good as those of a native Chinese speaker. Is it reasonable to say that you know Chinese?

The answer is, of course, that you don't know Chinese. Searle believes that this demonstrates that computers cannot understand language and, scaling the argument up, cannot be conscious, have beliefs or do anything else interesting and mentalistic.

One common rebuttal to this argument is that the system comprising the room (input slot, human, look-up table) knows Chinese, even though none of its parts does. This is attractive, since in some sense the same is true of our brains -- the only systems we know of that do in fact understand language. The individual parts (neurons, clusters of neurons, etc.) do not understand language, but the brain as a whole does.

It's an attractive rebuttal, but I think there is a bigger problem with Searle's argument. The thought experiment rests on the presupposition that the Chinese Room would produce good Chinese. Is that plausible?

If the human in the room had only a dictionary, it's clearly not plausible. Translating from a dictionary produces terrible language. Of course, Searle's Chinese Room does not use a dictionary; the computer version of it uses a database. If this is a simple database with two columns, one for input and one for output, it would have to be infinitely large to perform as well as a human Chinese speaker: as Chomsky famously demonstrated long ago, the number of sentences in any language is infinite. (The computer program could be more complicated, it is true. At an AI conference I attended several years ago, template-based language systems were all the rage. These systems try to fit all input into one of many template sentences, and responses, similarly, are assembled from templates. They work much better than earlier computerized efforts, but they are still very restricted.)
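To make the limitation concrete, here is a toy sketch of both designs. The Chinese sentences and replies are invented for illustration; the point is that a finite two-column table fails on the first sentence it has never seen, and a template only generalizes over one narrow family of sentences.

```python
import re

# The two-column database: input sentence -> canned reply.
CANNED_REPLIES = {
    "你好吗？": "我很好，谢谢。",        # "How are you?" / "Fine, thanks."
    "你会说中文吗？": "会，当然。",      # "Do you speak Chinese?" / "Of course."
}

def lookup_room(sentence):
    # Pure table lookup: no understanding, and no generalization at all.
    return CANNED_REPLIES.get(sentence, "???")

def template_room(sentence):
    # A template system generalizes slightly: one pattern covers a whole
    # family of sentences ("My name is X.") -- but it is still blind to meaning.
    match = re.match(r"我叫(.+)。", sentence)
    if match:
        return f"你好，{match.group(1)}！"   # "Hello, X!"
    return lookup_room(sentence)
```

Any sentence outside the table and the templates gets `"???"` -- and since the set of possible sentences is infinite, most sentences are outside.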

The human version of the Chinese Room Searle gives us is a little different. In that one, the human has a set of rules to apply to the input in order to produce an output. In Minds, Brains and Science, which contains the version of this argument that I'm working from, he isn't very explicit about how this would work, but I assume it is something like a grammar for Chinese. Even supposing that grammar rules could be applied without any knowledge of what the words mean, the fact is that after decades of research, linguists still haven't produced a complete grammatical description of any living language.

The Chinese Room would require a much, much more sophisticated system than the one Searle grants. In fact, it requires something so complicated that nobody even knows what it would look like. The only machine currently capable of processing human language as well as a human is the human brain. Searle's conceit was that we could have a "dumb" algorithm -- essentially a look-up table -- that processed language. We don't have one. Maybe we never will. Maybe, in order to process human language with human-level sophistication, the "system" must be intelligent, must actually understand what it's talking about.

This brings us to the counterpoint to Searle's thought experiment: Turing's. Turing proposed to test the intelligence of computers this way: once a computer can compete effectively in parlor games, it's reasonable to assume it's as intelligent as a human. The particular parlor game isn't important; what matters is the flexibility it requires. Modern versions of the Turing Test focus on the computer carrying on a normal human conversation -- essentially, doing what the Chinese Room would be required to do. Turing's assumption is that the simplest possible method of producing human-like language requires cognitive machinery on par with a human's.

If anybody wants to watch a dramatization of these arguments, I suggest the current re-imagining of Battlestar Galactica. The story follows a war between humans and intelligent robots. The robots clearly demonstrate emotions, intelligence, pain and suffering, but the humans are largely unwilling to believe any of it is real. "You have software, not feelings," is the usual refrain. Some of the humans begin to realize that the robots are just as "real" to them as the other humans. The truth is that our only evidence that other humans really have feelings, emotions, consciousness, etc., is through their behavior.

Since we don't yet have a mathematical proof one way or another, I'll have to leave it at that. In the meantime, having spent a lot of time struggling with languages myself, the Turing view seems much more plausible than Searle's.