Emerging Evaluations Project: AI security research

Empirical security research on emerging AI systems.

The Emerging Evaluations Project publishes reproducible evaluations of how AI tools behave under adversarial conditions, pairing hands-on technical testing with interpretive policy analysis. We report what models do — not what they are described to do.

Prompt injection · RAG poisoning · Agent evaluation · Model safety
Fig. 1. Agent behavior across three environments.
Scope

An independent publication of evidence-based data and analysis on emerging AI tools.

We study how AI systems actually behave under real conditions, and publish the methods alongside the findings.

Most accounts of AI capability and risk are anecdotal or vendor-supplied. EEP exists to replace those accounts with measurement. We construct concrete threat models, run open and frontier systems against them, and document the results in full, including where an experiment failed or surprised us. The goal is not a verdict on whether a model is “safe,” but a precise account of which controls are actually load-bearing.

01
Empirical
Every claim is grounded in data we collected. We report the measurements, including negative results, rather than the narrative.
02
Reproducible
Harnesses, prompts, and code are published with each report so peers can re-run, challenge, and extend the work.
03
Policy-legible
Findings are mapped to established taxonomies and translated into language a decision-maker can act on.
Publications

Recent reports

Standalone, indexable summaries. Each links to the full report on Substack.

The playground

Watch the agents think.

A gallery of experiments and interactive toys where we make agent behavior visible. Currently running: an ant farm that renders AI agents as living trails.


Get the next report in your inbox.

Full reports, methods, and the occasional field note from the playground — published on Substack. No noise.