The Washington Post is partnering with Virginia Tech’s Sanghani Center for Artificial Intelligence and Data Analytics to develop new AI technology. It’s a generative AI project that will let readers get answers to their questions, drawing on the Post’s past coverage. The tool is meant to understand the intent behind users’ questions, rather than relying only on keywords as some other AI platforms do.
Development will be based at Virginia Tech’s Innovation Campus in Alexandria, but that building isn’t set to open until spring 2025. For now, students and professors are working out of facilities in Arlington and Falls Church.
The partnership stemmed from the Post’s desire to be a leader in the new ways people find and consume information, Sam Han told Technical.ly.
Han is head of data and AI at the paper. He has been at the Post for about seven years, the past three in his current role.
“People are getting used to asking questions, [getting] answers directly, instead of them reading and understanding,” Han said. “That’s the trend we are observing. And we want to be in that transformation — or, in a way, revolution — to lead as a media technology company. We want to prepare ourselves technically so that we can provide the best media experience to readers.”
The tool will take implicit assumptions and context into account. Han gave the example of someone asking who won the Super Bowl: Usually, they mean the most recent championship, not past years.
For questions like these, among others, a technique called retrieval-augmented generation (RAG) will be used to produce responses that are more likely to actually answer what was asked. RAG lets a generative AI system draw on information beyond its initial training data; in this case, the paper’s up-to-date coverage, Han explained.
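The Post hasn’t published details of how its system is built. As a rough illustration of the general RAG pattern only, here is a minimal Python sketch under simplifying assumptions: the `Article` type, the sample headlines, the word-overlap scorer (a crude stand-in for the intent-aware, semantic retrieval the Post describes) and the recency bonus that mimics the Super Bowl example are all hypothetical. The sketch stops once it has built the augmented prompt; the call to a language model is left out.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    # Hypothetical archive record; a real system would also store embeddings.
    headline: str
    published: date
    text: str


def tokenize(s: str) -> set[str]:
    # Lowercased word set with basic punctuation stripped. This keyword overlap
    # is only a placeholder for semantic (embedding-based) retrieval.
    return {w.strip(".,?!").lower() for w in s.split() if w.strip(".,?!")}


def score(question: str, article: Article, today: date, recency_weight: float = 0.5) -> float:
    overlap = len(tokenize(question) & tokenize(article.headline + " " + article.text))
    age_years = (today - article.published).days / 365.25
    # Newer articles get a bonus that shrinks with age, encoding the implicit
    # assumption that "who won" usually means "who won most recently".
    return overlap + recency_weight / (1.0 + age_years)


def retrieve(question: str, archive: list[Article], today: date, k: int = 2) -> list[Article]:
    # Retrieval step: rank the archive and keep the top k articles.
    return sorted(archive, key=lambda a: score(question, a, today), reverse=True)[:k]


def build_prompt(question: str, passages: list[Article]) -> str:
    # Augmentation step: put the retrieved coverage in front of the question so a
    # language model answers from the archive rather than its training data alone.
    context = "\n".join(f"[{a.published.isoformat()}] {a.headline}: {a.text}" for a in passages)
    return f"Answer using only the articles below.\n\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    archive = [
        Article("Chiefs win Super Bowl LVII", date(2023, 2, 12), "Kansas City won the Super Bowl."),
        Article("Chiefs win Super Bowl LVIII", date(2024, 2, 11), "Kansas City won the Super Bowl again."),
        Article("Budget vote delayed", date(2024, 3, 1), "The council postponed its vote."),
    ]
    question = "Who won the Super Bowl?"
    top = retrieve(question, archive, today=date(2024, 6, 1))
    for a in top:
        print(a.headline)  # LVIII first, then LVII: recency breaks the tie
    print(build_prompt(question, top))
```

Both Super Bowl stories match the question equally well on words, so the recency bonus is what puts the latest game first. A production system would more likely use embedding similarity and richer signals for intent, but the retrieve-then-augment shape stays the same.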
The Post will also employ multimodal large language model (LLM) technology, meaning the AI tool won’t draw only on text but will also be able to integrate information from the paper’s audio and video reporting.
The New York Times is suing OpenAI and Microsoft for copyright infringement, claiming that millions of articles were used to build the AI models. In August 2023, the paper blocked OpenAI from being able to scrape its content to train models. BBC, CNN and Reuters followed.
In May 2023, Fred Ryan, the Post’s previous CEO and publisher, announced in a press release that AI was a “priority opportunity.” At the same time, the Post established an AI Task Force and an AI Hub, the latter led by Han.
At the moment, there is no specific timeline for when readers can expect to see the feature, Han said. Two PhD students have started a yearlong research and development effort to build the tool’s search abilities, with three Virginia Tech faculty members supervising.
The partnership will provide “one-of-a-kind educational experiences for our students,” according to Naren Ramakrishnan, director of the Sanghani Center, since it provides an opportunity to work on a real-world project with exacting demands.
It’ll also allow the Post to stay on top of the latest AI trends.
“The goal is to build up technology assets for us in this new world,” Han said, “where large language model AI plays a critical role of providing conversational information consumption.”