Field of Science

Another problem with statistical translation

In the process of writing my latest article for Scientific American Mind, I spent a lot of time testing out automatic translators like Google Translate. As I discuss in the article, these programs have gotten a lot better in recent years, but on the whole they are still not very good.

I was curious what the Italian name of one of my favorite arias meant, so I typed O Soave Fanciulla into Google Translate. Programs like Google Translate are trained by comparing bilingual documents and noting, for a given word in one language, which word typically appears in the same place in the other language. Not surprisingly, Google Translate translated O Soave Fanciulla as O Soave Fanciulla -- no doubt because, in the bilingual corpora GT was trained on, sentences with the phrase o soave fanciulla in Italian had o soave fanciulla in English.

I was reduced to translating the words one at a time: soave -> sweet, fanciulla -> girl. GT thinks "o" means "or", but I expect that's the wrong reading in this context ("or sweet girl"?).
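To make the training idea described above concrete, here is a toy sketch in Python. It is not how Google Translate is actually built; the corpus, the function names (train, translate), and the crude same-position word matching are all made up for illustration. It only shows how a system that counts what it saw in bilingual text can end up echoing an untranslated title while still handling the individual words.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (Italian, English) pairs. The aria title appears
# untranslated on the English side, as it often would in real bilingual text.
CORPUS = [
    ("o soave fanciulla", "o soave fanciulla"),
    ("la fanciulla canta", "the girl sings"),
    ("la fanciulla dorme", "the girl sleeps"),
    ("soave melodia", "sweet melody"),
    ("soave profumo", "sweet scent"),
    ("vino o acqua", "wine or water"),
    ("oggi o domani", "today or tomorrow"),
]

def train(pairs):
    """Count whole-phrase pairings and same-position word pairings."""
    phrase_table = defaultdict(Counter)
    word_table = defaultdict(Counter)
    for src, tgt in pairs:
        phrase_table[src][tgt] += 1
        src_words, tgt_words = src.split(), tgt.split()
        # Crude assumption: words line up only when both sides have equal length.
        if len(src_words) == len(tgt_words):
            for s, t in zip(src_words, tgt_words):
                word_table[s][t] += 1
    return phrase_table, word_table

def translate(text, phrase_table, word_table):
    """Prefer a phrase seen in training; otherwise go word by word."""
    text = text.lower()
    if text in phrase_table:
        return phrase_table[text].most_common(1)[0][0]
    return " ".join(
        word_table[w].most_common(1)[0][0] if w in word_table else w
        for w in text.split()
    )

phrases, words = train(CORPUS)
print(translate("O Soave Fanciulla", phrases, words))  # o soave fanciulla
for w in ["o", "soave", "fanciulla"]:
    print(w, "->", translate(w, phrases, words))  # or, sweet, girl
```

With this made-up corpus, the full title comes back unchanged because the only time the model saw that phrase, the English side was identical. The single words fare better, and "o" comes out as "or" simply because that was its most frequent partner.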