First-Year Student Presented Paper at Prestigious Computational Linguistics Conference
Even before receiving an offer of admission to Georgetown, first-year student Aryaman Arora (C’24) was making waves as a budding computational linguist.
Working with Department of Linguistics Ph.D. student Luke Gessler (G’23) and computer science and linguistics professor Nathan Schneider, Arora has helped develop a more accurate way for machines to convert written Hindi and Punjabi into the pronunciations that text-to-speech software reads aloud.
A paper describing this work was presented last month at the 58th Annual Meeting of the Association for Computational Linguistics (ACL).
Because of the pandemic, the conference took place virtually. Arora recorded a video presenting the team’s findings and took part in live Q&A sessions via Zoom.
An Early Research Passion
Born in India, Arora was raised speaking both Hindi and English; his family immigrated to the United States when he was 5 years old. In high school, aware that his Hindi was fading, Arora made a concerted effort to regain it. His interest in the language led him to Wiktionary, the free multilingual online dictionary, where he later began editing Hindi content.
He eventually became a site administrator, improving coverage of a range of South Asian languages as well as writing code to support the dictionary.
The experience sparked his curiosity about the intersection of computer science and linguistics, and in the summer before his senior year of high school, the Washington, D.C., resident reached out to Schneider about research opportunities at Georgetown.
“I get a lot of requests from our talented local high school students, but Aryaman’s really stood out because he had a lot of experience that was a good fit for the research I do,” Schneider says.
The professor, whose research examines how algorithms can learn the nuances of language, was impressed by Arora’s knowledge of Hindi and his participation in the North American Computational Linguistics Olympiad (NACLO).
Improving Text-to-Speech
Text-to-speech software, which converts on-screen text into spoken audio, is available for many languages. It powers virtual assistants such as Amazon’s Alexa and helps people with visual impairments use computers and phones. Each language and writing system brings its own set of challenges for text-to-speech technologies.
The “schwa” is the uh vowel sound in English words like about and um. In written Hindi, every consonant letter carries an inherent schwa by default unless a vowel sign or a special mark (the virama) cancels it. In spoken Hindi, however, only some of these schwas are pronounced, and linguists have not agreed on a complete set of rules that predicts which ones are.
“Just like in English, in Hindi the way a word is spelled is not always exactly how the word sounds,” explains Arora. “If text-to-speech software does not model schwa deletion well, the best-case scenario is that the word will sound strange to a native speaker. In the worst-case scenario, the word will change entirely, which makes the text-to-speech incomprehensible.”
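To make the inherent-schwa idea concrete, here is a toy Python sketch. It is emphatically not the model described in the paper. It reads a Devanagari word literally, giving every consonant that lacks a vowel sign or virama a schwa, and then applies one well-known simplification: deleting a word-final schwa. The tiny character tables and the function names `naive_transliterate` and `delete_final_schwa` are hypothetical, chosen only for illustration.

```python
# Toy illustration of Hindi's inherent schwa (NOT the model from the paper).
# Only a handful of Devanagari letters are covered; anything else is skipped.

SCHWA = "\u0259"    # ə
VIRAMA = "\u094D"   # ् : suppresses a consonant's inherent vowel

# Hypothetical, deliberately tiny Devanagari-to-Latin tables.
CONSONANTS = {
    "\u0915": "k", "\u0917": "g", "\u091C": "j", "\u091D": "jh",
    "\u0924": "t", "\u0926": "d", "\u0928": "n", "\u092A": "p",
    "\u092C": "b", "\u092E": "m", "\u0930": "r", "\u0932": "l",
    "\u0938": "s", "\u0939": "h",
}
VOWEL_SIGNS = {
    "\u093E": "\u0101", "\u093F": "i", "\u0940": "\u012B",
    "\u0941": "u", "\u0942": "\u016B", "\u0947": "e", "\u094B": "o",
}
VOWELS = {SCHWA, *VOWEL_SIGNS.values()}


def naive_transliterate(word: str) -> str:
    """Read the script literally: every bare consonant gets a schwa."""
    out = []
    for i, ch in enumerate(word):
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            nxt = word[i + 1] if i + 1 < len(word) else ""
            # The inherent schwa applies unless a vowel sign or virama follows.
            if nxt not in VOWEL_SIGNS and nxt != VIRAMA:
                out.append(SCHWA)
        elif ch in VOWEL_SIGNS:
            out.append(VOWEL_SIGNS[ch])
        # Virama and unsupported characters add nothing here.
    return "".join(out)


def delete_final_schwa(pron: str) -> str:
    """Drop a word-final schwa, but keep it in one-vowel words like 'nə'."""
    vowel_count = sum(ch in VOWELS for ch in pron)
    if pron.endswith(SCHWA) and vowel_count > 1:
        return pron[:-1]
    return pron


if __name__ == "__main__":
    for word in ["\u0915\u092E\u0932", "\u0938\u092E\u091D\u0928\u093E"]:
        literal = naive_transliterate(word)
        print(word, literal, delete_final_schwa(literal))
```

With these tables, कमल (“lotus”) comes out as kəmələ when read literally and kəməl after the final-schwa rule, matching the spoken kamal. But समझना (“to understand”) stays səməjhənā, whereas speakers say samajhnā, dropping the schwa after jh. Word-internal deletions like this one are where a simple rule falls short.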
The model that Arora, Gessler, and Schneider created predicts schwa deletion in Hindi with 98% accuracy, the highest reported by any system to date. Because it solves this core problem, the model could serve as a building block for future end-user text-to-speech systems.
“Linguists have been trying to explain the phenomenon of schwa deletion for at least the past 50 years, but no one single explanation was ever entirely satisfactory,” says Gessler. “That is why it is very exciting that we have been able to produce such a good solution for real-world applications.”
Arora, Gessler, and Schneider submitted their paper to the ACL conference in December 2019; after peer review, it was accepted for presentation and published in the conference proceedings.
Taking It Further
Arora was accepted into Georgetown College earlier this year and will double major in computer science and linguistics. He hopes to continue working with Schneider on computational research for Hindi and would ultimately like to pursue a Ph.D. in the field.
The young researcher says this type of research can improve cross-cultural understanding, and Schneider notes that this is especially relevant now that most people interact with AI.
“The internet and technology are becoming a core part of our lives and that extends to how we interact with other people,” he adds. “It is also important to think about the equity issues and potential biases that arise when building this technology, so it is exciting to see some work on languages other than the most affluent and well-studied languages such as English.”
Schneider, who frequently mentors students, says that “it’s an absolute treat to work with Georgetown students. Aryaman is motivated, bright and excited about cool new research and learning.”
-by Shelby Roller (G’19)