In case you wanted to know more about our VerbCorner project.
Many thanks to the two undergraduates who helped make this video.
Showing posts with label On the Internet. Show all posts
Forums find GamesWithWords
A number of forums have picked up the WhichEnglish quiz, and have produced some really intelligent and insightful conversation. I recommend in particular this conversation on MetaFilter. There is also an extensive conversation at Hacker News and a somewhat older discussion at Reddit. And there is a lot of discussion in Finnish and Hungarian, but I have no idea what they are saying...
Small World of Words
A group of researchers in Belgium is putting together a very large word association network by asking volunteers to say which words are related to which other words. They are hoping to recruit around 300,000 participants, which makes it my kind of study! (Technically, I've never tried 300,000 participants -- I think we've never gone beyond about 50,000, though we have some new things in the pipeline...)
It looks interesting. To participate, go to www.smallworldofwords.com. You can read more about the project here.
What you missed on the Web last week - 10/1/2012 edition
Forgetting kanji
Japanese computer users say they are forgetting how to handwrite kanji due to computer use. The number has increased from 10 years ago. An important question not addressed is whether they can write more kanji using a computer now than people could write by hand 10 years ago. (Hat tip: LanguageLog)
Another descriptivist/prescriptivist debate
What I learned from it: I'm not the only one whose idiolect does not distinguish between relative clauses beginning with 'that' and 'which'.
Science spam
Neuroskeptic notices a dramatic rise in science spam (irrelevant conference announcements, lab products, etc.). Am I glad I'm not the only one, or sad for the world?
Animal research @ Freakonomics
A Freakonomics blogger writes that while he's generally against animal research, he supposes it might be OK if it led to saving human life. In response, Isis writes "This is someone who is pleading for us to understand animals, but is unwilling to understand where the basic health care that has enabled us (of the first world, primarily) to live as long as we do [comes from]". Or understand much of anything else. The author thinks one of the problems is that we don't understand animals well enough yet; presumably we'll understand them better if we stop studying them. The rest of the article has some other interesting flights of fancy.
PLoS retraction policy
PLoS declares that it will retract any paper, the conclusions of which turn out to be incorrect. I wish them luck in figuring out when something passes that threshold.
Is science growing too fast?
Some time ago, I worried that as more and more papers become published, it'll become impossible to keep up with the literature. Neuroskeptic crunches the numbers.
Around the Internet: What you missed last week (9/17/2012 edition)
Chomsky
OK, not technically last week, but here's a longish post critiquing Chomsky and a much longer, heated discussion in the comments, from BishopBlog.
Replication
A nice editorial on the importance of replication as a way of dealing with fraud.
The New Yorker Still Hates Science (esp. Evolution)
When I first heard the claim that the New Yorker was fundamentally anti-science, it came as a surprise. Then I thought back through what they publish, and it became less surprising. Now, reading this out-of-touch, anti-evolution tirade isn't surprising at all (my favorite part is where Gottlieb writes that understanding evolution is superfluous and a waste of time).
Around the Internet - 8/31
Publication
A warning about the perils of preprint repositories.
Statistical evidence that writing book chapters isn't worth the effort. (Though caveat: the author also doesn't find evidence of higher citation rates for review papers in journals, which I had thought was well-established.)
One person who finds things to like in the publication process (I know, I don't link to these often).
Neuroskeptic argues that we don't necessarily want to increase replication, just replicability. (Agreed, but how do we know if replicability rates are high enough without conducting replications?)
Language
Did Chris Christie really talk about himself too much in Tampa?
Other Cognitive Science
Cognitive load disrupts implicit theory of mind processing. So maybe the reason young children succeed at implicit tasks isn't because those tasks don't require executive processing (whether they require less is still up for grabs).
Around the Internet - 7/30/2012
Citations
There have been a bunch of posts lately on citations and the Impact Factor. I started with these two posts by DrugMonkey. These posts have links to others in the chain, which you can follow. Here's a slightly older post (from late July) on reasons to self-cite.
Next topic
So I didn't actually see anything else interesting this week. Possibly because I've been trying to streamline a bootstrapping analysis (which I may blog about when I finally get it done). Early in the process, I tried to estimate how long it would take for the script to run and realized it was about 1 week for each analysis, of which I have several to do. So I started hurriedly looking for ways to speed it up...
Above average!
It's often repeated that the median study is cited 0 times. I haven't been able to find a citation for that, but if it is true, all my papers are now above median. My birth order paper has now been cited. Actually, it was cited last year, but I didn't notice for a while. Granted, it was cited in a paper appearing in Journal of Language, Technology & Entrepreneurship in Africa, which is apparently not a high-impact journal, but a citation is a citation.
For rather boring reasons not related to the data or the review process itself, the birth order paper appeared in a journal that is not widely read by researchers, which has probably reduced its visibility. Certainly, plenty has been published on the topic in the last few years. This is a lesson for the future: it really does matter which journal you publish in, despite the widespread use of search engines.
For more on my birth order research, click here.
Another problem with statistical translation
In the process of writing my latest article for Scientific American Mind, I spent a lot of time testing out automatic translators like Google Translate. As I discuss in the article, these programs have gotten a lot better in recent years, but on the whole they are still not very good.
I was curious what the Italian name of one of my favorite arias meant. So I typed O Soave Fanciulla into Google Translate. Programs like Google Translate are trained by comparing bilingual documents and noting, for a given word in one language, what word typically appears in the other language in the same place. Not surprisingly, Google Translate translated O Soave Fanciulla as O Soave Fanciulla -- no doubt because, in the bilingual corpora GT was trained on, sentences with the phrase o soave fanciulla in Italian had o soave fanciulla in English.
I was reduced to translating the words one at a time: soave -> sweet, fanciulla -> girl. GT thinks o means or, but I expect that's the wrong reading in this context ("or sweet girl"?).
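Here is a toy sketch of why that happens. It is not how Google Translate actually works (real systems use far more sophisticated alignment models); it just takes the "same word in the same place" description above literally. The tiny corpus and the function names are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus of (Italian, English) sentence pairs.
# The last pair mimics an English document that leaves the aria title untranslated.
PARALLEL = [
    ("dolce ragazza", "sweet girl"),
    ("dolce vita", "sweet life"),
    ("bella ragazza", "beautiful girl"),
    ("o soave fanciulla", "o soave fanciulla"),
]

def train(pairs):
    """Count, for each source word, which target word sits in the same position."""
    table = defaultdict(Counter)
    for src, tgt in pairs:
        for s, t in zip(src.split(), tgt.split()):
            table[s][t] += 1
    return table

def translate(sentence, table):
    """Replace each word with its most frequent counterpart; copy unknown words."""
    out = []
    for word in sentence.lower().split():
        if word in table:
            out.append(table[word].most_common(1)[0][0])
        else:
            out.append(word)
    return " ".join(out)

table = train(PARALLEL)
print(translate("dolce ragazza", table))       # sweet girl
print(translate("O soave fanciulla", table))   # o soave fanciulla
```

The second call hands the title back untranslated, because the only "translation" the table ever saw for those words was the Italian itself.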
New Language Experiment for Bilinguals
I'm not sure I've ever blogged about a conference past the first day. I'm usually too tired by the second day. BUCLD is particularly grueling, running over 12 hours on the first day and near 12 hours on the second. Plus the parties.
I do want to point folks to one thing: Thomas Roeper, Barbara Zurer Pearson and Margaret Grace, all of the University of Massachusetts, are running an interesting study on quantifiers (words like all, some, each, and most). One interesting thing about this study is that while language researchers very often exclude non-native speakers and bilinguals, the researchers are very interested in comparing results from native and non-native speakers of English. Right now, they're looking for people who learned some language other than English prior to learning English.
The study is here. They are particularly interested right now in getting data from non-native English speakers. There is a raffle that participants can win (details are on the site).
Lab Notebook: Verb Resources
It's good to be studying language now, and not a few decades ago. There are a number of invaluable resources freely available on the Web.
The resource I use the most -- and without which much of my research would have been impossible -- is Martha Palmer & co.'s VerbNet, which is a meticulous semantic analysis of several thousand English verbs. This is invaluable when choosing verbs for stimuli, as you can choose verbs that are similar to or differ from one another along particular dimensions. It's also useful for finding polysemous and nonpolysemous verbs where polysemy is defined in a very rigorous way.
Meichun Liu and her students at NCTU in Taiwan have been working on a similar project in Mandarin, Mandarin VerbNet. This resource has proved extremely valuable as I've been writing up some work I've been doing in Mandarin, and I only wish I had known about it when I constructed my stimuli.
I bring this up in case these resources are of use to anyone else. Mandarin VerbNet is particularly hard to find. I personally spent several months looking for it.
Spam Filter
Blogger has helpfully added some advanced spam detection for comments. One interesting feature is that I still get an email saying a comment has been left even if the comment is flagged as spam and isn't posted. This makes it a little harder for me to moderate than you might wish.
So if your post has been flagged as spam, either be patient and wait until I discover it, or send me an email directly and I'll un-flag it.
CHARGE
I recently received an email from the CHARGE Syndrome Foundation, which is trying to provide information to people who might need it. Based on demographics, there should be at least a few readers of this blog who know somebody with CHARGE syndrome, so as a public service, I'm linking to the website and including some additional information below.
CHARGE syndrome is a relatively rare (1 per 9-10,000 births, according to the Foundation website) pattern of congenital birth defects. It usually appears in families without any history of the syndrome or similar syndromes. There are a number of physical problems (often heart defects, breathing problems, and swallowing problems) as well as nervous system problems such as malfunction of cranial nerves, blindness and deafness (the exact constellation of impairments differs from person to person).
Given the blindness and deafness, along with Autistic-like behaviors, it should not be surprising that there are consequences for language and communication. The forthcoming CHARGE Syndrome book (full disclosure: I am a co-author on one of the chapters in said book, and my father is the lead editor of the book) has a chapter (not by me) reviewing some recent work on communicative abilities in people with CHARGE. For very good reason, that work is focused on communication, rather than structural properties of language. Of course, I am interested in how or whether particular components of language are impacted by the syndrome (similar to how the linguistic consequences of Autism have been studied in some depth, telling us both more about Autism and about language), but I don't know of any relevant work having been done.
For those who want to know more about CHARGE, I suggest going to the CHARGE Syndrome Foundation website. To find some of the recent research on CHARGE, check out the publications page of the CHARGE Lab at Central Michigan University.
Using Google Wave
I admit I'm pretty excited about Google Wave. I am currently involved in a fairly large collaboration. It's large in
- the scale of phenomena we're trying to understand (essentially, argument realization)
- the number of experiments (literally, dozens)
- the number of people involved (two faculty, three grad students, and a rotating cast of research assistants, all spread across three universities)
Collaborative Editing?
If you are interested in Wave, the best thing is to simply check out their website or one of the many other websites describing how to use it. The main idea behind it is to enable collaborative document editing -- that is, a document that can be edited by a group of people simultaneously.
Anyone who has worked on a group project is familiar with the following problem: only one person can work on a document at a given time. For instance, if I send a paper to a co-author for editing, I can't work on the paper in the meantime or risk a real headache when trying to merge the separate edits later.
Google Docs and similar services have allowed real-time collaborative editing for a while, but they weren't really designed for it. For instance, it's difficult to record who made what changes. Similarly, they don't allow comments (sometimes you don't want to change the text, you just want to say you don't understand it). If one person makes a change and you want to undo it, good luck. Google Wave has these and other features.
Using the Wave
Currently, we're using Wave as a collective notebook, where we record everything we've learned in the course of our research. This keeps everyone up-to-date. It also allows us to discuss issues without requiring meetings (a good thing, since we're at different universities).
For instance, recently I read a claim that a certain grammatical structure that is impossible in English happens to be possible in Japanese. I noted this in a section of our document, and attached a comment: "Yasu, Miki: can you check this?" As it happens, two members of our project are native Japanese speakers. In a series of nested comments, they discussed the issue, came to a conclusion (that the paper I had read was wrong), and then we finally deleted the comments and replaced the whole section with a summary of the discussion and conclusions.
In other sections, we've included the methods for experiments that we're designing, commenting on and ultimately editing the methods until everyone agrees.
Needed Improvements
At the moment, Wave is very much in beta testing and is underpowered. Although you can embed files and websites, there's no way to embed, say, a spreadsheet -- a major inconvenience for us, since much of our work involves making lists of verbs and their properties. Whenever I want the most updated list, I need to email whoever was working on it last, which isn't ideal.
Of course, we could use Google Docs, but it has the problems listed above (no way of commenting, no track-changes, no archive in case we decide to undo a change). It's assumed that these kinds of features will be added in the future.
Why People Don't Panic During a Plane Crash
A lot has been made of the crew and passengers of US Airways Flight 1549 and their failure to panic when their plane landed in the Hudson. For instance, here is the Well blog at the New York Times:
Amanda Ripley, author of the book “The Unthinkable: Who Survives When Disaster Strikes — and Why” (Crown, 2008), notes that in this plane crash, like other major disasters, people tend to stay calm, quiet and helpful to others.
“We’ve heard from people on the plane that once it crashed people were calm — the pervading sound was not screaming but silence, which is very typical ... The fear response is so evolved, it’s really going to take over in a situation like that. And it’s not in your interests to get hysterical. There’s some amount of reassurance in that I think.”
On a different topic, but along the same lines, the paper's Week in Review section discusses the fact that most people are coping with the recent economic collapse reasonably well, all things considered:
Yet experts say that the recent spate of suicides, while undeniably sad, amounts to no more than anecdotal, personal tragedy. The vast majority of people can and sometimes do weather stinging humiliation and loss without suffering any psychological wounds, and they do it by drawing on resources which they barely know they have.
Should we be surprised?
This topic has come up here before. People are remarkably bad at predicting what will make them happy or sad. Evidence shows that while many people think having children will make them happy, most people's level of happiness actually drops significantly after having children and never fully recovers even after the kids grow up. On the other end of the scale, the Week in Review article notes that
In a recently completed study of 16,000 people, tracked for much of their lives, Dr. Bonanno, along with Anthony Mancini of Columbia and Andrew Clark of the Paris School of Economics, found that some 60 percent of people whose spouse died showed no change in self-reported well-being. Among people who’d been divorced, more than 70 percent showed no change in mental health.
This makes a certain amount of sense. Suppose the mafia threatens to burn down your shop if you don't pay protection money, and suppose you don't pay. They actually have very little incentive to follow through on the threat, since they don't actually want to burn down your shop -- what they want is the money. (This, according to psychologist Steve Pinker, is one of the reasons people issue threats obliquely -- "That's a nice shop you have here. It'd be a shame if anything happened to it." -- so that they don't have to follow through in order to save face.)
Similarly, biology requires that we think we'll like having children in order to motivate us to have them. Biology also requires that we think our spouse dying would ruin our lives, in order to motivate us to take care of our spouse. But once we have children or our spouse dies, there is very little evolutionary benefit accrued by carrying through on the threat.
Finding the idea of a plane crash very scary: useful.
Mass panic and commotion during a crash: not so much.
Androids Run Amok at the New York Times?
I have been reading Steve Pinker's excellent essay in the New York Times about the advent of personal genetics. Reading it, though, I noticed something odd. The Times includes hyperlinks in most of its articles, usually linking to searches for key terms within its own archive. I used to think this linking was done by hand, as I do in my own posts. Lately, I think it's done by an android (and not a very smart one).
Often the links are helpful in the obvious way. Pinker mentions Kareem Abdul-Jabbar, and the Times helpfully links to a list of recent articles that mention him. Presumably this is for the people who don't know who he is (though a link to the Abdul-Jabbar Wikipedia entry might be more useful).
Some links are less obvious. In a sentence that begins "Though health and nutrition can affect stature..." the Times sticks in a hyperlink for articles related to nutrition. I guess that's in case the word stirs me into wondering what else the Times has written about nutrition. That can't explain the following sentence though:
Another kind of headache for geneticists comes from gene variants that do have large effects but that are unique to you or to some tiny fraction of humanity.
There is just no way any human thought that readers would want a list of articles from the medical section about headaches. This suggests that the Times simply has a list of keywords that are automatically tagged in every article...or perhaps it is slightly more sophisticated and the keywords vary based on the section of the paper.
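For the curious, here is a minimal sketch of the kind of android I have in mind. This is a guess at the mechanism, not the Times's actual system: the keyword list, the topic sections, and the example.com URL pattern are all invented for illustration.

```python
import re

# Hypothetical keyword list mapping each keyword to a section of the paper.
KEYWORDS = {"headache": "health", "nutrition": "health"}

# One pattern that matches any keyword as a whole word, ignoring case.
PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in KEYWORDS) + r")\b",
    re.IGNORECASE,
)

def auto_link(text):
    """Wrap every keyword occurrence in a link, with no regard for its sense."""
    def link(match):
        word = match.group(0)
        section = KEYWORDS[word.lower()]
        url = f"https://example.com/topics/{section}/{word.lower()}"
        return f'<a href="{url}">{word}</a>'
    return PATTERN.sub(link, text)

linked = auto_link("Another kind of headache for geneticists comes from gene variants.")
print(linked)
```

Because the matching is purely on the string, the metaphorical headache gets the same medical link as a literal one would.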
I'm not sure how useful this is even in the best of circumstances. Has anyone ever actually clicked on one of these links and read any of the articles listed? If so, comment away!
(picture from Weeklyreader.com)
Building a Better Spell-Checker

Today Slate carried an interesting piece about spell-checker technology by Chris Wilson. A spell-checker typically works in the obvious way: a word you type in is compared to a dictionary. The question is where the dictionary comes from. If you use a lot of proper nouns -- or, in my case, a lot of technical jargon -- you risk the red-squiggly wrath of Microsoft Word.
It's been clear to me for a while that search engines work from much larger lexicons than do word processors. The article fills in some detail as to how they do this (not surprisingly, it involves some of the sophisticated statistics that has become so important in computer approaches to language). Read the article here.
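The "obvious way" can be sketched in a few lines. This is only an illustration of dictionary lookup, not what Word or any search engine actually does, and the two tiny lexicons are made up; the point is just that the same word is flagged or not depending on where the dictionary comes from.

```python
import re

def misspelled(text, lexicon):
    """Return the words in text that do not appear in the lexicon."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if w.lower() not in lexicon]

# Hypothetical lexicons: a small one, and a larger one that knows some jargon.
SMALL_LEXICON = {"the", "verb", "is", "a"}
BIGGER_LEXICON = SMALL_LEXICON | {"polysemous"}

sentence = "The verb is polysemous"
print(misspelled(sentence, SMALL_LEXICON))   # ['polysemous']
print(misspelled(sentence, BIGGER_LEXICON))  # []
```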
(image borrowed from eduscapes.com)
Do Bullies like Bullying?
Although Slate is my favorite magazine, and usually the first website I check each day, I've been known to complain about its science coverage, which typically lacks the insight of its other features. A much-too-rare exception is the occasional article by Daniel Engber (full disclosure: I have tried, unsuccessfully, to convince Engber, a Slate editor, to run articles by me).
Yesterday, he wrote an excellent piece about a recent bit of cognitive neuroscience looking at bullies and how they relate to bullying. Researchers scanned the brains of "bullies" while they viewed videos of bullying and reported that pleasure centers in the brain activated.
In a cheeky fashion typical of Slate, Engber questions the novelty of these findings:
Bullies like bullying? I just felt a shiver run up my spine. Next we'll find out that alcoholics like alcohol. Or that overeaters like to overeat. Hey, I've got an idea for a brain-imaging study of child-molesters that'll just make your skin crawl!
Obviously, I was a sympathetic reader. But Engber does not stop there:
OK, OK: Why am I wasting time on a study so lame that it got a write-up in the Onion? Hasn't this whole fMRI backlash routine gotten a bit passé?
Engber goes on to detail a number of limitations to the study, including how the kids were defined as "bullies" (some appear to be rapists, for instance) and also how "pleasure center" was defined (the area in question is also related to anxiety, so one could reasonably argue bullies find bullying worrisome, not pleasurable).
The second half of the article is a plea for better science reporting, one that I hope is widely-read. Read it yourself here.
Become a Phrase Detective: A new, massive Internet-based language project
A typical speech or text does not consist of a random set of unrelated sentences. Generally, the author (or speaker) starts talking about one thing and continues talking about it for a while. While this tends to be true, there is typically nothing in the text that guarantees it:
This is a challenge both to psychologists like myself, as well as to people who try to design computer programs that can analyze text (whether for the purposes of machine translation, text summarization, or any other application).
The materials for research
A group at the University of Essex put together an entertaining new Web game called Phrase Detectives to help develop new materials for cutting-edge research into this basic problem of language. Their project is similar to my ongoing Dax Study, except that theirs is not so much an experiment as a method for developing the stimuli.
Phrase Detectives is set up as a competition between users, and the results is an entertaining game that you can participate in more or less as you choose. Other than its origins, it looks a great deal like many other Web games. The game speaks for itself and I recommend that you check it out.
What's the point?
Their Wiki provides some useful details as to the purpose of this project, but as it is geared more towards researchers than the general public, it could probably use some translation of its own. Here's my attempt at translation:
Developing and testing new algorithms requires a log of practice materials ("corpora"). Most importantly, you need to know what the correct parse (sentence diagram) is for each of your practice sentences. In other words, you need "annotated corpora."
Also, consider that when a child learns a language, that child hears or reads many, many millions of words. If it takes so many for a human who is genetically programmed to learn language, how long should it take a computer algorithm? (Computers are more advanced than humans in many areas, but in the basic areas of human competency -- vision, language, etc. -- they are still shockingly primitive.)
This is my brother John. He is very tall. He graduated from high school last year.
We usually assume this is a story about a single person who is tall, is a recent high school graduate, is named John, and is the speaker's brother. But it could very well have been about three different people. Although humans are very good at telling which part of a story relates to which other part, it turns out to be very difficult to explain how we know. We just do.
This is a challenge both to psychologists like me and to people who try to design computer programs that can analyze text (whether for the purposes of machine translation, text summarization, or any other application).
The materials for research
A group at the University of Essex put together an entertaining new Web game called Phrase Detectives to help develop new materials for cutting-edge research into this basic problem of language. Their project is similar to my ongoing Dax Study, except that theirs is not so much an experiment as a method for developing the stimuli.
Phrase Detectives is set up as a competition between users, and the result is an entertaining game that you can participate in more or less as you choose. Other than its origins, it looks a great deal like many other Web games. The game speaks for itself and I recommend that you check it out.
What's the point?
Their Wiki provides some useful details as to the purpose of this project, but as it is geared more towards researchers than the general public, it could probably use some translation of its own. Here's my attempt at translation:
The ability to make progress in Computational Linguistics depends on the availability of large annotated corpora...
Basically, the goal of Computational Linguistics (and the related field, Natural Language Processing) is to come up with computer algorithms that can "parse" text -- break it up into its component parts and explain how those parts relate to one another. This is like a very sophisticated version of the sentence diagramming you probably did in middle school.
Developing and testing new algorithms requires a lot of practice materials ("corpora"). Most importantly, you need to know what the correct parse (sentence diagram) is for each of your practice sentences. In other words, you need "annotated corpora."
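To make "annotated" a bit more concrete, here is a minimal sketch of what an answer key might look like for the brother example above. The format, the label person_1, and the helper group_by_entity are all invented for illustration; this is not the format Phrase Detectives or AnaWiki actually uses.

```python
import re
from collections import defaultdict

text = ("This is my brother John. He is very tall. "
        "He graduated from high school last year.")

# Character offsets of the two pronouns, so each "He" can be told apart.
he_offsets = [m.start() for m in re.finditer(r"\bHe\b", text)]

# Hand-written answer key: (phrase, character offset, entity label).
# The labels record the usual reading, in which all three phrases
# refer to one person. The label name "person_1" is hypothetical.
annotations = [
    ("my brother John", text.index("my brother John"), "person_1"),
    ("He", he_offsets[0], "person_1"),
    ("He", he_offsets[1], "person_1"),
]

def group_by_entity(annotations):
    """Collect the mentions of each entity, ordered by position in the text."""
    chains = defaultdict(list)
    for phrase, offset, entity in annotations:
        chains[entity].append((offset, phrase))
    return {entity: sorted(mentions) for entity, mentions in chains.items()}

chains = group_by_entity(annotations)
print(chains)
```

An algorithm's guesses could then be scored against an answer key like this one, which is why somebody (or a great many somebodies) has to produce the key first.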
...but creating such corpora by hand annotation is very expensive and time consuming; in practice, it is unfeasible to think of annotating more than one million words.
One million words may seem like a lot, but it isn't really. One of the complaints about one of the most famous word frequency corpora (the venerable Francis & Kucera) is that many important words never even appear in it. If you take a random set of 1,000,000 words, very common words like a, and, and the take up a fair chunk of that set.
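If you want to check this on a text of your own, here is a small sketch that measures what share of a text is taken up by a handful of very common words. The function name common_word_share, the default word list, and the sample sentence are all made up for illustration, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter

def common_word_share(text, common_words=("a", "and", "the")):
    """Return the fraction of word tokens in `text` that are in `common_words`."""
    # Crude tokenizer: lowercase the text and keep runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[word] for word in common_words) / len(tokens)

sample = "The cat and the dog saw a bird and the bird saw a cat"
share = common_word_share(sample)
print(f"{share:.0%} of the sample is a, and, or the")
```

The toy sentence is stacked in favor of the common words, so don't read anything into the exact number. The point is only that every token spent on the is a token not spent on a rarer word.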
Also, consider that when a child learns a language, that child hears or reads many, many millions of words. If it takes so many for a human who is genetically programmed to learn language, how long should it take a computer algorithm? (Computers are more advanced than humans in many areas, but in the basic areas of human competency -- vision, language, etc. -- they are still shockingly primitive.)
However, the success of Wikipedia and other projects shows that another approach might be possible: take advantage of the willingness of Web users to collaborate in resource creation. AnaWiki is a recently started project that will develop tools to allow and encourage large numbers of volunteers over the Web to collaborate in the creation of semantically annotated corpora (in the first instance, of a corpus annotated with information about anaphora).
This is, of course, what makes the Web so exciting. It took a number of years for it to become clear that the Web was not just a method of doing the same things we always did but faster and more cheaply, but actually a platform for doing things that had never even been considered before. It has had a deep impact in many areas of life -- cognitive science research being just one.
Who does Web-based experiments?
Obviously, I would prefer that people do my Web-based experiments. Once you have done those, though, the best place to find more Web-based experiments is the list maintained at the University of Hanover.
Who is posting experiments?
One interesting question that can be answered by this list is who exactly does experiments online. I checked the list of recent experiments posted under the category of Cognition.
From June 1 through September 12, experiments were posted by:
Brown University 2
UCLA 1
University College London 2
University of Cologne 1
Duke University 2
University of London 1
Harvard University 1
University of Saskatchewan 3
University of Leeds 2
University of Minnesota 1