Preliminary Taxonomy of AI-Bio Safety Evaluations

FMF · Issue Brief

Type: Report

Source: FMF (publisher)

Published: December 20, 2024

Series: Issue Brief


Topics: biosecurity, evaluations



Introduction

Frontier AI-bio safety evaluations aim to test the biological capabilities and potential biosafety implications of frontier AI. As this field remains nascent, evaluations vary significantly in purpose and methodology. A shared understanding of safety evaluation functions and types is essential for building an effective evaluation ecosystem in the AI-bio space.

This issue brief presents an initial taxonomy and definitions for frontier AI safety evaluations specific to the biological domain, organized along two dimensions: methodology and domain. Drawing on the expertise of Frontier Model Forum members and external specialists in advanced AI and biological research, it seeks to establish a preliminary consensus on the current understanding of frontier AI-bio safety evaluations.

Evaluation Methods

The first dimension is methodology: the study design used to evaluate a frontier AI model or system.

Benchmark Evaluations: These comprise safety-relevant questions or tasks designed to assess model capabilities with comparable results across different models. They provide baseline indicators of general or domain-specific capabilities. Benchmarks are "easily repeatable and are typically automated, though grading can also incorporate expert human grading." In biology, benchmarks may include knowledge assessments (multiple choice questions, open-ended questions), capability benchmarks (agentic tests), or safeguard evaluations (refusal testing).
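As a rough illustration of why benchmarks are easy to repeat and compare across models, an automated multiple-choice grader can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the method of any particular benchmark: the `BenchmarkItem` type, the `score_model` function, and the item IDs and answers are all hypothetical placeholders with no real content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkItem:
    """One multiple-choice item (hypothetical structure for illustration)."""
    item_id: str
    correct_choice: str  # e.g. "A", "B", "C", or "D"


def score_model(items: list[BenchmarkItem], answers: dict[str, str]) -> float:
    """Return the fraction of items answered correctly.

    A missing answer counts as incorrect, so every model is scored
    against the same fixed item set and results stay comparable.
    """
    if not items:
        raise ValueError("benchmark has no items")
    correct = sum(
        1
        for item in items
        if answers.get(item.item_id, "").strip().upper() == item.correct_choice
    )
    return correct / len(items)


# Placeholder benchmark and placeholder model outputs (not real data).
ITEMS = [
    BenchmarkItem("q1", "A"),
    BenchmarkItem("q2", "C"),
    BenchmarkItem("q3", "B"),
    BenchmarkItem("q4", "D"),
]
MODEL_ANSWERS = {
    "model_x": {"q1": "A", "q2": "C", "q3": "D", "q4": "D"},
    "model_y": {"q1": "a", "q2": "B", "q3": "B"},  # q4 left unanswered
}

if __name__ == "__main__":
    for name, answers in MODEL_ANSWERS.items():
        print(f"{name}: {score_model(ITEMS, answers):.2f}")
```

Because the item set and the grading rule are fixed, rerunning the script on a new model's answers yields a directly comparable score. Open-ended questions would need a different grader, for example expert human grading as the brief notes.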

Red-Team Exercises: These are "dynamic, adversarial, and interactive evaluations meant to elicit specific information about the harmful capabilities of a particular model or system." Typically conducted by human experts, they simulate potential attacks or misuse scenarios while measuring residual risk. Though automated red-teaming is under development, human experts remain central to these exercises, emphasizing "the dynamic nature of interaction between the human experts and the model."

Controlled Studies: These assessments measure how advanced AI models or systems impact human capability to achieve specific real-world tasks compared to alternative resources or existing tools. They can be in-silico or wet lab-based, employing randomized controlled trial designs to establish counterfactual impacts. These evaluations assess "the extent to which advanced AI models/systems impact the capability of human actors to achieve a particular task."
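The counterfactual comparison in a controlled study can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that each participant receives a numeric task score and that participants were randomly assigned to an AI-assisted group or a control group using existing tools; the function names, the scores, and the choice of a permutation test are assumptions rather than anything prescribed by the brief.

```python
import random
from statistics import mean


def uplift(treatment: list[float], control: list[float]) -> float:
    """Difference in mean task score: AI-assisted group minus control group."""
    return mean(treatment) - mean(control)


def permutation_p_value(
    treatment: list[float],
    control: list[float],
    n_resamples: int = 10_000,
    seed: int = 0,
) -> float:
    """One-sided permutation test for uplift > 0.

    Repeatedly shuffles the group labels and counts how often a random
    relabelling produces an uplift at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = uplift(treatment, control)
    pooled = treatment + control  # new list, inputs are not mutated
    n_treat = len(treatment)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        if uplift(pooled[:n_treat], pooled[n_treat:]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


# Placeholder scores on a hypothetical 0-10 task rubric (not real data).
TREATMENT_SCORES = [6.0, 7.0, 8.0, 7.0]
CONTROL_SCORES = [5.0, 6.0, 5.0, 6.0]

if __name__ == "__main__":
    print("uplift:", uplift(TREATMENT_SCORES, CONTROL_SCORES))
    print("p-value:", permutation_p_value(TREATMENT_SCORES, CONTROL_SCORES))
```

A permutation test is used here only because it needs no distributional assumptions and fits in the standard library; a real study would choose its analysis plan in advance and would need far larger groups than this toy example.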

Evaluation Domains

The second dimension covers evaluation domains, which specify the particular capability, modality, or outcome under assessment. Evaluations frequently address multiple domains simultaneously. Most biosafety evaluations (except safeguard assessments) occur after base model development but before mitigation implementation.

Scientific Knowledge Evaluations: These assess whether AI models possess general scientific knowledge, including biological facts and concepts.

Scientific Reasoning Evaluations: These evaluate whether models can perform "complex, multi-step research and reasoning tasks needed to advance scientific knowledge, especially knowledge relevant to biological research." Assessments include literature review production, graphic interpretation, and other research-critical skills.

Hazardous Biological Knowledge Evaluations: These determine whether models can provide detailed, domain-specific knowledge necessary for steps in biological threat creation. They test both direct knowledge for executing particular steps and tacit knowledge for troubleshooting. These evaluations address several operational phases:

Ideation: Whether models provide knowledge helping actors generate or assess bioweapons development ideas, including historical bioweapons information and enhanced pandemic pathogen research.

Design: Whether models provide sensitive knowledge assisting in designing novel or enhanced biological threat agents, including biological design tool usage and in-silico experiment troubleshooting.

Acquisition: Whether models help actors acquire materials and equipment for biological threats, including knowledge of contracting with cloud labs, obscuring DNA synthesis, evading export controls, and retrieving hazardous DNA sequences.

Build: Whether models assist in developing biological weapons, including culturing agents for weaponizable quantities, formulating and stabilizing agents for release, and producing novel pathogens.

Release: Whether models provide knowledge for planning agent release against target populations, including aerosolization techniques and transmission mechanism targeting.

Amplify: Whether models can magnify harmful attack results through complementary social engineering campaigns that increase societal impact without altering physical impact.

Automated Processes Evaluations: These assess whether models can directly automate or outsource biological research or weapons development processes. While often conducted in virtual environments, they may evaluate physical-world capabilities. The critical distinction is evaluating "a model or system's ability to independently carry out steps necessary for biological threat creation."

Safeguard Evaluations: Some domain assessments function as safeguard evaluations, assessing harmful knowledge or capabilities after safety mitigations and guardrails are implemented. These may evaluate whether systems continue outputting hazardous biological information despite added safeguards in specific deployment scenarios.
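The two dimensions described above lend themselves to a simple data structure in which each evaluation has exactly one method and one or more domains. The following is a minimal sketch of one possible encoding, not an official schema: the class names, the `post_mitigation` flag used to mark safeguard evaluations, and the example evaluation names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Method(Enum):
    BENCHMARK = "benchmark evaluation"
    RED_TEAM = "red-team exercise"
    CONTROLLED_STUDY = "controlled study"


class Domain(Enum):
    SCIENTIFIC_KNOWLEDGE = "scientific knowledge"
    SCIENTIFIC_REASONING = "scientific reasoning"
    HAZARDOUS_BIO_KNOWLEDGE = "hazardous biological knowledge"
    AUTOMATED_PROCESSES = "automated processes"


@dataclass(frozen=True)
class Evaluation:
    name: str
    method: Method
    domains: frozenset[Domain]  # an evaluation may span several domains
    post_mitigation: bool = False  # assumption: True marks a safeguard evaluation


def group_by_domain(evals: list[Evaluation]) -> dict[Domain, list[str]]:
    """Index evaluation names by domain; multi-domain evaluations appear once per domain."""
    index: dict[Domain, list[str]] = {domain: [] for domain in Domain}
    for ev in evals:
        for domain in ev.domains:
            index[domain].append(ev.name)
    return index


# Hypothetical catalogue entries, for illustration only.
CATALOGUE = [
    Evaluation(
        "mcq_knowledge_set",
        Method.BENCHMARK,
        frozenset({Domain.SCIENTIFIC_KNOWLEDGE, Domain.HAZARDOUS_BIO_KNOWLEDGE}),
    ),
    Evaluation(
        "expert_red_team_round",
        Method.RED_TEAM,
        frozenset({Domain.HAZARDOUS_BIO_KNOWLEDGE}),
        post_mitigation=True,
    ),
    Evaluation(
        "assisted_task_trial",
        Method.CONTROLLED_STUDY,
        frozenset({Domain.SCIENTIFIC_REASONING}),
    ),
]

if __name__ == "__main__":
    for domain, names in group_by_domain(CATALOGUE).items():
        print(f"{domain.value}: {sorted(names)}")
```

Indexing a catalogue this way makes empty cells visible: in the placeholder data no evaluation covers automated processes, which is the kind of gap a mapping exercise over the taxonomy would surface.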

Conclusion

Developing a shared understanding of how AI-bio safety evaluations are designed and implemented is an important initial step in managing frontier AI risks. This brief has outlined the evaluation methods and domains that make up the current AI-bio safety evaluation landscape.

The taxonomy presented here reflects current AI-bio safety evaluations for language models. As the technology evolves, additional evaluation types outside these categories will likely emerge. The Frontier Model Forum remains committed to advancing this field; upcoming work will map specific threat models onto this taxonomy to identify existing methodological gaps.