Empirical security research on emerging AI systems.

The Emerging Evaluations Project publishes reproducible evaluations of how AI tools behave under adversarial conditions, pairing hands-on technical testing with interpretive policy analysis. We report what models do — not what they are described to do.

Prompt injection RAG poisoning Agent evaluation Model safety

Read the reports → About the project

Fig. 1 Agent behavior across three environments — switch the regime, or explore it in the playground.

Scope

An independent publication of evidence-based data and analysis on emerging AI tools.

We study how AI systems actually behave under real conditions, and publish the methods alongside the findings.

Most accounts of AI capability and risk are anecdotal or vendor-supplied. EEP exists to replace that with measurement. We construct concrete threat models, run open and frontier systems against them, and document the results in full — including where the experiment failed or surprised us. The goal is not a verdict on whether a model is “safe,” but a precise account of which controls are actually load-bearing.

Empirical: Every claim is grounded in data we collected. We report the measurements, including negative results, rather than the narrative.
Reproducible: Harnesses, prompts, and code are published with each report so peers can re-run, challenge, and extend the work.
Policy-legible: Findings are mapped to established taxonomies and translated into language a decision-maker can act on.

Publications

Recent reports

Standalone, indexable summaries. Each links to the full report on Substack.

All reports →

published blind-sentinel

Blind Sentinel: RAG Poisoning and the Limits of Retrieval as a Security Control

An empirical study of indirect prompt injection via RAG poisoning in a SOC analyst scenario, finding that retrieval — not model alignment — is the load-bearing security control.

LLM01LLM06

Philippe · Angelica Read on Substack →

Published

Jan 15, 2025

Models tested

llama3.2:3b, mistral:7b

OWASP LLM Top-10

The playground

Watch the agents think.

A gallery of experiments and interactive toys where we make agent behavior visible. Currently running: an ant farm that renders AI agents as living trails.

Open the playground →

Recent reports

Blind Sentinel: RAG Poisoning and the Limits of Retrieval as a Security Control

Watch the agents think.

Get the next report in your inbox.