Johns Hopkins center is bringing machine translation to lesser-written languages - Technical.ly Baltimore


Nov. 2, 2017 12:57 pm


JHU's Center for Language and Speech Processing has been at the forefront of using algorithms to translate languages. Now the American intelligence community is posing a new challenge.
Johns Hopkins University.

(Technical.ly file photo)

Johns Hopkins’ Center for Language and Speech Processing was among the research centers that helped develop the translation and voice tools that have found wider use in commercial products in recent years. In a relatively close-knit field, the center stands out for its size and longevity.

Since the 1980s, researchers have worked to develop the technology behind tools like Google Translate, Siri and Facebook’s button that spits out a post in a different language. Such tools grew out of open source systems, and the grandeur of the undertaking shows in their Biblical names, such as Moses and Joshua.

“All that technology ultimately started at research labs like ours,” said Philipp Koehn, a computer science professor affiliated with the center who has worked on machine translation for 20 years. He notes that the tools now in wide use weren’t a given until recently, and have been decades in the making.

“It’s impressive to see that it’s good enough for real users,” Koehn said in a recent interview. “That is quite a threshold.”

Most of those tools translate to and from languages that are widely used and have large amounts of written text available. Google Translate, for instance, supports roughly 100 of the most prevalent languages.

Now, Koehn and other researchers are set to apply tools they’ve used in speech recognition, information retrieval and extraction of information from text to languages that aren’t as widely used.

He is leading a group of 20, including professors and about a dozen PhD students, that will develop a system able to answer queries written in English by finding relevant documents written in these “low-resource” languages. The Office of the Director of National Intelligence awarded a $10.7 million grant for the project.

The team will start with Swahili and Tagalog, which are good examples of languages that have “millions and millions of speakers…but just don’t have that much of a presence on the internet or official communication.”


The challenge is to take documents written in one of these languages and build a system that helps intelligence agents get a quick look at what happened. “We have to return back to them relevant Swahili documents with a summary,” Koehn said.

After building an initial tool for the first two languages, the team will be tasked with putting it to use. Intelligence agencies could use the tool to quickly analyze documents in a given language when a major event happens. Some of the languages of interest to that end include Kurdish, Serbo-Croatian, Khmer, Hmong and Somali.

While deep learning tools have come a long way, Koehn said analyzing such languages poses a new data challenge. Widely used languages often have large datasets available for training tools. “Now it’s much, much smaller,” Koehn said. This will require new strategies for obtaining data that can be translated, whether through context or linguistic analysis.

The four-year project marks the beginning of a new phase of research for the field. As Koehn noted, there are 6,000 languages in the world. The resources may not exist to translate all of them, but that means there’s plenty left to explore.

 

-30-