I've been using Dictate to take notes on Talmy's Toward a Cognitive Semantics. One of the example sentences is as follows:
I aimed my gun into the living room. (p. 109)
I cannot by any means convince Dictate to print this. It prefers to convert "my gun" to "my God". For example, on my third try, it wrote:
I aim to my God into the living room.
Dictate offers a number of alternatives in case its initial transcription is incorrect. Right now, it is suggesting, as an alternative to "aim to my God":
aimed to my God
aim to my God and
aim to my god
aim to my gun
aimed to my God and
aim to my garden
aimed to my god
aimed to my gun
aim to my guide
aim to my God in
aimed to my God in
Perhaps Nuance has a religious bent, but I suspect that this is a simple N-gram error. Like many natural language processing systems, Nuance figures out what word you are saying in part by reference to the surrounding words. So in general, it thinks that common bigrams (2-word sequences) are more likely than uncommon bigrams.
According to Google, "my god" appears on the Web 133,000,000 times, whereas "my gun" appears only 8,770,000 times. So "my god" is just much more likely. Similarly, "aim to" is fairly common (215,000,000 hits). So even though "aim to my God" is gibberish, its two components -- "aim to" and "my god" -- are fairly common, whereas the correct phrase -- "aimed my gun" -- is fairly rare (138,000 hits). (The bigram "aimed my" is also infrequent: 474,000 hits.)
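To make that arithmetic concrete, here is a toy sketch of bigram scoring that uses only the hit counts quoted above. It is not how Dictate works: the `score` function, the mean-of-log-counts formula, and the choice to skip bigrams with no quoted count (such as "to my") are all assumptions made for illustration. A real recognizer would use smoothed probabilities and combine them with an acoustic model.

```python
import math

# Web hit counts quoted in the post, used as a crude stand-in for
# bigram frequencies. Nothing here comes from Nuance.
BIGRAM_HITS = {
    ("aim", "to"): 215_000_000,
    ("my", "god"): 133_000_000,
    ("my", "gun"): 8_770_000,
    ("aimed", "my"): 474_000,
}


def bigrams(phrase):
    """Return the adjacent word pairs in a phrase, lowercased."""
    words = phrase.lower().split()
    return list(zip(words, words[1:]))


def score(phrase):
    """Mean log hit count over the bigrams we have counts for.

    Assumption: bigrams missing from BIGRAM_HITS are skipped rather
    than penalized, because the post gives no counts for them.
    """
    logs = [math.log(BIGRAM_HITS[b]) for b in bigrams(phrase) if b in BIGRAM_HITS]
    if not logs:
        return float("-inf")
    return sum(logs) / len(logs)


candidates = ["aim to my God", "aimed my gun"]
best = max(candidates, key=score)
for candidate in candidates:
    print(f"{candidate!r}: {score(candidate):.2f}")
print("preferred:", best)
```

Under these assumptions the scorer prefers "aim to my God", for the same reason Dictate seems to: each piece is common on its own, and nothing in the score checks whether the whole makes sense.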
N-gram systems work better than almost anything else, which is why Nuance, Google, and many other companies use them. But examples like this show their deep limitations: they make many errors that are obvious -- obvious to humans, anyway. In this case, because Nuance doesn't know what sentences mean, and doesn't even know basic grammar, it can't tell that "aimed to my god" is both grammatically incorrect and meaningless.