
CMU’s Language Technologies Institute is applying tech to cultural preservation

This research team is developing technology in the hopes of boosting the number of languages that can be automatically translated from 200 to 2,000.


This editorial article is a part of Universities Month 2023 in Technical.ly’s editorial calendar.

While they might not all be Netflix subtitle options, there are more than 7,000 languages spoken around the world. Yet most language technologies, such as voice-to-text transcription, automatic captioning, instantaneous translation and voice recognition software, aren’t designed to recognize them. That’s why a Carnegie Mellon University research team is developing technology it hopes will raise the number of languages that can be automatically translated from 200 to 2,000.

“A lot of people in this world speak diverse languages, but language technology tools aren’t being developed for all of them,” Xinjian Li, a member of the research team and a Ph.D. student at the university’s Language Technologies Institute (LTI), told Technical.ly. “Developing technology and a good language model for all people is one of the goals of this research.”

Speech recognition tech runs on two ingredients: audio and text. The current roadblock is that although there is no shortage of text for the world’s languages, matching audio is far harder to come by. The research team plans to get over this hurdle by building a language recognition model that relies on the linguistic elements many languages have in common. Their hope is that the end result will be a tool that can translate thousands of languages without relying on audio data.
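To make the idea of shared linguistic elements concrete, here is a toy Python sketch. It is not the CMU team's method: the language names, the sound inventories and the helper functions `sounds_with_audio` and `coverage` are all hypothetical and made up for illustration. It only shows how sounds learned from languages that have recordings might cover most of the sounds in a language that has none.

```python
# Toy illustration only: these inventories are invented, not real linguistic data.
INVENTORIES = {
    "lang_a": {"p", "t", "k", "a", "i", "u"},
    "lang_b": {"p", "t", "m", "a", "e", "i"},
    "lang_c": {"t", "k", "m", "n", "a", "u"},  # assume no audio exists for this one
}


def sounds_with_audio(inventories, languages_with_audio):
    """Collect every sound that appears in at least one language with recordings."""
    covered = set()
    for lang in languages_with_audio:
        covered |= inventories[lang]
    return covered


def coverage(target_sounds, covered_sounds):
    """Return the fraction of the target language's sounds already covered."""
    return len(target_sounds & covered_sounds) / len(target_sounds)


covered = sounds_with_audio(INVENTORIES, ["lang_a", "lang_b"])
target = INVENTORIES["lang_c"]

share = coverage(target, covered)   # fraction of lang_c's sounds seen elsewhere
missing = target - covered          # sounds no recorded language provides

print(f"covered: {share:.0%}, missing: {sorted(missing)}")
```

In this made-up case, five of the six sounds in the language without audio already appear in the two languages that have recordings. That is the intuition behind leaning on what languages share instead of collecting new audio for each one.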

“This is the first research to target such a large number of languages, and we’re the first team aiming to expand language tools to this scope,” Li said.

This work comes with urgency, as some languages are endangered. (Here’s a list as of 2011.) If steps aren’t taken to preserve those languages, Li said, they will disappear, and with them, key aspects of certain cultures. But automatic language recognition offers the opportunity to create a record for future generations.

“If you have speech recognition, you can transcribe some recordings of those languages and transcribe them into text and then those records will be very helpful for people to preserve those languages and preserve their cultures,” the researcher said.

The research is still in its infancy, having improved language tools by only 5% so far, and the end result is to be determined. Still, Li hopes it will be a launching pad for future linguists and an asset in the quest for cultural preservation.

“Each language is a very important factor in its culture. Each language has its own story, and if you don’t try to preserve languages, those stories might be lost,” Li said. “Developing this kind of speech recognition system and this tool is a step to try to preserve those languages.”

Atiya Irvin-Mitchell is a 2022-2024 corps member for Report for America, an initiative of The Groundtruth Project that pairs young journalists with local newsrooms. This position is supported by the Heinz Endowments.
Companies: Carnegie Mellon University
Series: Universities Month 2023
