Software Development

The US government is launching a new project to test and assess AI

The National Institute of Standards and Technology will evaluate LLM applications along three pillars: model testing, red-teaming and field testing.

NIST's Advanced Measurement Laboratory building. Gail Porter/NIST

The federal government, in its effort to better understand the implications and effects of artificial intelligence, has a new project to test potential pitfalls.  

The National Institute of Standards and Technology (NIST), a division of the Department of Commerce, launched this program at the end of May to test different large language model (LLM) applications. The Assessing Risks and Impacts of AI (ARIA) initiative aims to understand how AI uses will look in different scenarios, as well as the reliability and safety of operating the technology. 

“ARIA serves as a forum to experiment with novel measurement methods and to provide insights into the functionality and trustworthiness of AI algorithms,” Elham Tabassi, NIST’s chief AI advisor and a member of the inaugural TIME100 AI list, said in an email. “To that end, ARIA aims to answer questions such as, do the systems perform as expected within their context of use?”

This program is part of NIST’s AI Risk Management Framework, which was released in January 2023 and aims to help organizations identify and understand risks with generative AI. NIST also launched an AI Safety Institute Consortium in February, made up of more than 200 companies and organizations that will help develop guidelines for AI safety. 

ARIA will serve as a companion to the risk framework, with the goal being to measure how well a system functions in different contexts, Tabassi said. These moves come after President Joe Biden’s executive order from October focusing on AI safety and security.

As part of this project, developers will create different applications that look like a typical chatbot. Those models will be tested based on three major criteria: model testing, red-teaming and field testing. Researchers will see if the application works as it is claimed to, evaluate its safeguards by attempting to make the model operate in a way that is not intended, and evaluate the models’ positive and negative impacts based on its intended use. 

It’s focused on a “socio-technical” approach, meaning that the program will evaluate how people interact with the technology rather than simply testing for accuracy, Tabassi said. 

About 20 people, including social scientists, cognitive scientists, computer scientists, mathematicians, data scientists and statisticians, will contribute to the project. This includes federal staff, associates and students. Nobody works on the project full-time.

This pilot evaluation will run this summer through the early fall. A “full ARIA test” is planned for 2025, and the exact date will be determined by what comes out of this test phase. After this pilot, the aim is for ARIA evaluations to expand to cover additional uses and types of AI besides LLMs, Tabassi said. 

Overall, she sees building more confidence in AI applications as the end goal. 

“ARIA will increase public understanding and awareness about what are the impacts of AI systems,” Tabassi said, “who is impacted by AI systems and what is the magnitude of the impact.”

Companies: National Institute of Standards and Technology

Before you go...

Please consider supporting Technical.ly to keep our independent journalism strong. Unlike most business-focused media outlets, we don’t have a paywall. Instead, we count on your personal and organizational support.

Our services Preferred partners The journalism fund
Engagement

Join our growing Slack community

Join 5,000 tech professionals and entrepreneurs in our community Slack today!

Trending

How 5 orgs help local businesses achieve success

19 tech and entrepreneurship events to check out before the holidays

DC’s year in tech: An interactive timeline for 2024

Technically Media