Discriminating Non-Native English with 350 Words

We were named co-winner of the Native Language Identification shared task at NAACL’s 2013 BEA-8 workshop! The task was to identify an author’s native language from a short English essay. Reading essays that averaged 348 words of English, our system picked the correct native language 83% of the time from a set of eleven: Arabic, Chinese, French, German, Hindi, Italian, Japanese, Korean, Spanish, Telugu, and Turkish. We spent three weeks developing our submission, and our result was statistically tied for first place among the 66 submissions from 29 teams.

More info at http://www.nlisharedtask2013.org/. Our paper is to be published in June 2013.
