CMU’s Language Technologies Institute is applying tech to cultural preservation

This research team is developing technology in the hopes of boosting the number of languages that can be automatically translated from 200 to 2,000.

Think global. (Photo by Flickr user Kenneth Lu, used under a Creative Commons license)

While they might not all be Netflix subtitle options, there are more than 7,000 languages spoken around the world. Yet most language technologies, such as voice-to-text transcription, automatic captioning, instantaneous translation and voice recognition software, aren’t designed to recognize them. That’s why a Carnegie Mellon University research team is developing technology that it hopes will raise the number of languages that can be automatically translated from 200 to 2,000.

“A lot of people in this world speak diverse languages, but language technology tools aren’t being developed for all of them,” Xinjian Li, a member of the research team and a Ph.D. student at the university’s Language Technologies Institute (LTI), told Technical.ly. “Developing technology and a good language model for all people is one of the goals of this research.”

In the world of speech recognition, what makes the tech click is paired audio and text. The current roadblock is that although there is no shortage of text for the world’s languages, the audio to match it is far less common. The research team plans to get over this hurdle by building a language recognition model that relies on linguistic elements many languages share. Their hope is that the end result will be a tool that can translate thousands of languages without needing audio data for each of them.

“This is the first research to target such a large number of languages, and we’re the first team aiming to expand language tools to this scope,” Li said.

This work comes with urgency, as some languages are endangered. If steps aren’t taken to preserve those languages, Li said, they will disappear, and with them, key aspects of certain cultures. But automatic language recognition offers the opportunity to create a record for future generations.

“If you have speech recognition, you can transcribe some recordings of those languages and transcribe them into text and then those records will be very helpful for people to preserve those languages and preserve their cultures,” the researcher said.

The research is still in its infancy (it has improved language tools by just 5% so far), and the end result is to be determined. Still, Li hopes it will be a launching pad of sorts for future linguists, and an asset in the quest for cultural preservation.

“Each language is a very important factor in its culture. Each language has its own story, and if you don’t try to preserve languages, those stories might be lost,” Li said. “Developing this kind of speech recognition system and this tool is a step to try to preserve those languages.”

Atiya Irvin-Mitchell is a 2022-2024 corps member for Report for America, an initiative of The GroundTruth Project that pairs young journalists with local newsrooms. This position is supported by the Heinz Endowments.

This editorial article is a part of Universities Month 2023 in Technical.ly’s editorial calendar.
