Open-source software developed by computer scientists at the NRC wins international contest
February 11, 2019 — Ottawa, Ontario
Most Canadians have seen bad machine translations based on word-for-word equivalents that can lead to confusion and can sometimes be quite funny. But how do you tell a machine it might not be getting 100% on its latest translation assignment, and what it needs to improve?
Meet YiSi, a new machine translation teacher developed by the National Research Council of Canada. YiSi is open-source software that examines sentences produced by machine translation and compares them against the original text or a human reference translation. YiSi assigns each translated sentence an accuracy score from 0 to 100, pinpointing translation problems that developers can then fix in their systems.
Research officer Jackie Lo came up with the idea behind YiSi—using databases that map out the relationships between words to score machine translation—and developed YiSi’s prototype code in 2017. She then worked closely with software development specialist Darlene Stewart, who made sure YiSi tells the user nicely how to launch evaluation tasks, does not quit when the user provides the wrong database for a particular task, and warns them gracefully when they make a mistake.
You might be wondering who checks YiSi’s work and how we know its scores are accurate. YiSi competes against other automatic translation teachers at international competitions, such as the Metrics Shared Task at the Third Conference on Machine Translation (known as WMT). In 2018, after grading over 400,000 translated sentences that humans also graded for reference, YiSi came first in evaluating meaning for Turkish-to-English, English-to-Russian, and English-to-Turkish sentence outputs, and was also the best overall performer across all 14 language pairs.
What’s next for Jackie, Darlene, and YiSi? In 2019, the trio plans to apply YiSi to collaborative projects with NRC clients, integrate YiSi into machine translation system development toolkits to further promote its use, and enter the WMT 2019 metrics competition. In the meantime, machine translation developers interested in giving YiSi a try can contact the NRC’s Digital Technologies Research Centre.
“Since joining the National Research Council of Canada in 2015, I’ve enjoyed the creativity and freedom of working on projects like YiSi. I hope developers of machine translation systems will discover this new tool and share their feedback with our team.”
— Jackie Lo, Research Officer in Multilingual Text Processing, National Research Council of Canada
“On February 11, the International Day of Women and Girls in Science, the National Research Council of Canada is honoured to recognize the accomplishments of its women researchers, such as Jackie and Darlene’s outstanding work on YiSi. Through transformative policy and action, we can contribute to women and girls achieving full and equal access to, and participation in, all areas of science, including computer science.”
— Geneviève Tanguay, Vice-President of Emerging Technologies, National Research Council of Canada
Named after the Cantonese word for meaning, YiSi is powered by a massive database of word embeddings: sequences of numbers that map out the ‘distance’ or relationships of meaning between words, taking into account the nature of words and how they work in sentences.
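To make the idea of ‘distance’ between words more concrete, here is a deliberately simplified sketch in Python. It is not YiSi’s code or its actual scoring formula. The three-number vectors in EMBEDDINGS are invented for illustration (real embeddings are far longer and are learned from large collections of text), and toy_score is a hypothetical function: it matches each word to its most similar counterpart using cosine similarity, checks that in both directions, and combines the two results into a score from 0 to 100.

```python
import math

# Hypothetical toy embeddings, invented for illustration only.
# Real word embeddings are much longer and learned from large text collections.
EMBEDDINGS = {
    "the": [0.1, 0.0, 0.1],
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.75, 0.2],
    "sat": [0.1, 0.9, 0.7],
    "sits": [0.15, 0.85, 0.75],
    "dog": [0.8, 0.1, 0.3],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match_average(words, others):
    """For each word, find its most similar word in `others`, then average.

    Every word must appear in the toy EMBEDDINGS table above.
    """
    if not words or not others:
        return 0.0
    total = 0.0
    for w in words:
        total += max(cosine(EMBEDDINGS[w], EMBEDDINGS[o]) for o in others)
    return total / len(words)


def toy_score(candidate, reference):
    """Hypothetical 0-100 score of a candidate translation against a reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    precision = best_match_average(cand, ref)  # does the output stay on meaning?
    recall = best_match_average(ref, cand)     # is the reference meaning covered?
    if precision + recall == 0:
        return 0.0
    f = 2 * precision * recall / (precision + recall)
    # Clamp to [0, 1] to absorb floating-point rounding, then scale to 0-100.
    return round(100 * max(0.0, min(1.0, f)), 1)


if __name__ == "__main__":
    print(toy_score("the kitten sits", "the cat sat"))
    print(toy_score("the dog", "the cat sat"))
```

With these made-up vectors, "the kitten sits" scores higher against "the cat sat" than "the dog" does, even though it shares only one exact word with the reference: "kitten" and "sits" lie close to "cat" and "sat" in the embedding space. This is the kind of meaning-aware comparison that word-for-word matching misses.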
To generate the 400,000 sentences that YiSi scores during contests, several systems developed by other research teams translate about 3,000 original sentences between English and each of seven languages: Czech, German, Estonian, Finnish, Russian, Turkish and Chinese.
The National Research Council of Canada’s Digital Technologies Research Centre conducts research that makes sense of data and creates value from information. Its experts specialize in advanced analytics, computer vision, natural language processing, and artificial intelligence.