Field of Science

Findings: Which English -- updated dialect chart

I have updated the dialect chart based on the results from the first few days. Since the new version shows up automatically in the frame in the previous post, I haven't added it here. You can get a better look at it on the website.

The biggest difference is that I also added several "dialects" for non-native speakers of English. That is, I added five new dialects, one each for people whose first language was Spanish, German, Portuguese, Dutch, or Finnish. I'll be adding more of these dialects in the future; those just happen to be the groups for which we have a decent number of respondents so far.

As you can see, the algorithm finds that American and Canadian speakers are more like one another than they are like anyone else. Similarly, English, Irish, Scottish, and Australian speakers are more like one another than they are like anyone else. And the non-native English speakers also form a group. I'll leave you to explore the more fine-grained groupings on your own.
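
For readers curious how a grouping like this can be computed at all, here is a minimal sketch of one common approach, average-linkage agglomerative clustering. This is an illustration only and not necessarily the method behind the chart. The `profiles` numbers are invented toy values, not quiz results, and the names `distance`, `cluster_distance`, and `agglomerate` are hypothetical helpers written for this example.

```python
from itertools import combinations

# Hypothetical toy data: each "dialect" gets a made-up vector of acceptance
# rates for three imaginary grammar items. These are NOT results from the quiz.
profiles = {
    "American":   [0.90, 0.10, 0.80],
    "Canadian":   [0.85, 0.15, 0.80],
    "English":    [0.20, 0.90, 0.70],
    "Irish":      [0.25, 0.85, 0.60],
    "Spanish L1": [0.50, 0.50, 0.10],
    "German L1":  [0.55, 0.45, 0.15],
}

def distance(a, b):
    """Euclidean distance between two profile vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def cluster_distance(c1, c2):
    """Average linkage: mean pairwise distance between members of two clusters."""
    pairs = [(a, b) for a in c1 for b in c2]
    return sum(distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

def agglomerate(names, n_groups):
    """Repeatedly merge the two closest clusters until n_groups remain."""
    clusters = [frozenset([n]) for n in names]
    while len(clusters) > n_groups:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters

groups = agglomerate(list(profiles), 3)
for group in groups:
    print(sorted(group))
```

With these toy numbers the three pairs that were built to resemble each other end up together. Note that every group's position depends on the distances estimated for its members, so a dialect represented by only a handful of respondents would have a noisy profile and could land almost anywhere.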

If you are wondering why New Zealanders are off by themselves, that's mostly because we don't have very many of them, and the algorithm has difficulty classifying dialects for which there isn't much data. Same for Welsh English, South African English, and Black Vernacular English. So if you know people who speak any of those dialects...

5 comments:

Katariina said...

Couldn't find any other place to give feedback, so here it goes: After the quiz when you have to choose areas you lived in etc. should it be "at least 10 months" instead of "years"? Just wondering...

BTW, I found the test very interesting! Kudos!

Spencer Olson said...

This probably doesn't matter too much, but I accidentally input my age as 19 instead of 20 when I took the quiz several minutes ago. Is there some way to correct that?

aktoetotam said...

I wonder if you have any paper published about this project (or going to). I would love to find out more about your machine learning approach. For example, what features you use for it, the size of the training corpus by now, how many languages you can classify.

GamesWithWords said...

@Katariina -- yes, we really did mean 10 years. We ask about any country you lived in for 6 months. For some countries, we ask what part of the country you lived in. In that case, we only care if you were in that part of that country for a long time.

@Spencer -- obviously it is better to be correct, but this probably isn't a big deal. Asking for age in years is fairly imprecise anyway.

@aktoetotam -- we just launched this project 2 weeks ago, so there aren't any papers yet! We have a number of ways to inform people when papers are published, results are released, and so on. You can read about them here.

NV Rees said...

It is difficult to take this research seriously when
(1) Your web-page itself contains grammatical errors, such as "their" (which is plural) being used with "the participant's" (which is a singular possessive).
(2) You consider only geographical region and not social class. For example, you'll find accentless English (RP) amongst many white-collar Londoners, which is quite at odds with the "Gor' blimey" speech of many blue-collar Londoners, all of whom will have spoken their flavour of English from birth.