Johns Hopkins center is bringing machine translation to lesser-written languages - Technical.ly Baltimore


Nov. 2, 2017 12:57 pm


JHU's Center for Language and Speech Processing has been at the forefront of using algorithms to translate languages. Now the American intelligence community is posing a new challenge.


Johns Hopkins’ Center for Language and Speech Processing was among the research centers that helped develop the translation and voice tools that have found wider use in commercial products in recent years. In a relatively close-knit field, the center stands out for its size and longevity.

Since the 1980s, researchers have worked to develop the technology used in tools like Google Translate, Siri or Facebook’s button that spits out a post in a different language. Such tools grew out of open source systems like Moses and Joshua, whose Biblical names illustrate the grandeur of the undertaking.

“All that technology ultimately started at research labs like ours,” said Philipp Koehn. The computer science professor, who is affiliated with the center, has worked on machine translation for 20 years. He notes that the tools now in wide use were not a given until recently, and that they have been in development for decades.

“It’s impressive to see that it’s good enough for real users,” Koehn said in a recent interview. “That is quite a threshold.”

Most of those tools translate to and from languages that are widely used and have large bodies of written text available. Google Translate, for instance, is available in the roughly 100 most prevalent languages.

Now, Koehn and other researchers are set to apply the tools they’ve used in speech recognition, information retrieval and information extraction from text to languages that aren’t as widely used.

He is leading a group of 20 that will try to develop a system that can answer queries written in English by finding relevant documents written in these “low-resource” languages. The Office of the Director of National Intelligence awarded a $10.7 million grant for the project, whose team includes a mix of professors and about a dozen PhD students.

The team plans to start with Swahili and Tagalog. According to Koehn, both are good examples of languages that have “millions and millions of speakers…but just don’t have that much of a presence on the internet or official communication.”


The challenge is to produce an algorithm that takes documents written in one of these languages and helps intelligence agents get a quick look at what happened. “We have to return back to them relevant Swahili documents with a summary,” Koehn said.

After building an initial tool for the first two languages, the team will be tasked with putting it to use. Intelligence agencies could use the tool to quickly analyze documents in a given language when a major event happens. Languages of interest to that end include Kurdish, Serbo-Croatian, Khmer, Hmong and Somali.

While tools based on deep neural networks have come a long way, Koehn said analyzing such languages poses a new data challenge. More widely used languages often have large datasets available for training these tools. “Now it’s much, much smaller,” Koehn said. This will require new strategies for obtaining usable data, whether through context or linguistic analysis.

The four-year project marks the beginning of a new phase of research for the field. As Koehn noted, there are 6,000 languages in the world. The resources may not exist to translate all of them, but that leaves plenty to explore.

 

-30-