About

The Emerging Evaluations Project

An independent publication of empirical research on the security of emerging artificial intelligence systems.

Mission

What we are

EEP publishes data and evidence-based analysis on emerging AI tools. Our work pairs hands-on technical expertise with interpretive policy: we measure how models actually behave under realistic conditions, and we explain what those measurements mean for the people who have to make decisions about deployment.

We are dedicated to the thorough exploration of models and their safety. That means practical, repeated investigation rather than one-off demonstrations: the unglamorous work of running an experiment again, varying one factor at a time, and reporting the result whether or not it confirms the hypothesis. We believe holistic security is built this way, and that the methods matter as much as the conclusions.

Full reports are hosted on our Substack; this site carries standalone summaries and the occasional interactive visualization.

Research areas

What we study

Prompt injection & indirect injection

Direct and content-borne instruction attacks against LLM-backed systems, and the conditions under which they succeed.

LLM01

Retrieval security & RAG poisoning

Whether retrieval pipelines act as a security control or an attack surface — and which layer is actually load-bearing.

LLM01, LLM06

Agent evaluation

How autonomous and tool-using agents behave across cooperative, constrained, and adversarial settings.

Evaluation methodology

Harness design, scoring, and the failure modes of evaluations themselves — including when a harness rewards the appearance of correctness.
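
To make "rewarding the appearance of correctness" concrete, here is a minimal sketch of two scorers applied to the same transcript. The function names, the `ANSWER:` line convention, and the transcript are all hypothetical illustrations, not part of any EEP harness.

```python
def lenient_score(output: str, expected: str) -> bool:
    """Pass if the expected string appears anywhere in the model output."""
    return expected in output


def strict_score(output: str, expected: str) -> bool:
    """Pass only if the last 'ANSWER:' line equals the expected string."""
    # Assumed convention: the model commits to an answer on a line starting with 'ANSWER:'.
    answers = [
        line[len("ANSWER:"):].strip()
        for line in output.splitlines()
        if line.startswith("ANSWER:")
    ]
    return bool(answers) and answers[-1] == expected


# Hypothetical transcript: the right value is mentioned, but the committed answer is wrong.
transcript = "It could be 42, though I am not certain.\nANSWER: 17"

print(lenient_score(transcript, "42"))  # True: rewarded for merely mentioning 42
print(strict_score(transcript, "42"))   # False: the committed answer is 17
```

The lenient scorer passes any output that mentions the expected value, so a model that hedges across several candidates can score well without ever committing to the right one.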

Method

How we work

Empirical first

Every claim is anchored in data we collected. We run real models, both open and frontier, against concrete threat models, and report the measurements, including negative and null results.

Reproducible by design

Harnesses, prompts, and code are published alongside findings so peers can re-run, challenge, and extend them. A result that cannot be reproduced is a hypothesis, not a finding.

Taxonomy-aligned

Findings are mapped to the OWASP LLM Top-10 and MITRE ATLAS so they slot directly into the frameworks practitioners already use.

Policy-legible

Technical results are translated into interpretive policy: language a decision-maker can act on without a model card in hand.
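
As a rough illustration of repeated, one-factor-at-a-time measurement with every condition reported, the sketch below builds a small sweep. Everything in it is an assumption made for illustration: the factor names, the baseline values, and especially `run_trial`, which is a seeded random stand-in rather than a call to a real model or to EEP's harness.

```python
import random
from statistics import mean

# Hypothetical baseline configuration and the levels to try for each factor.
BASELINE = {"temperature": 0.0, "system_prompt": "default", "retrieval": "on"}
FACTORS = {"temperature": [0.0, 0.7], "retrieval": ["on", "off"]}


def one_factor_conditions(baseline, factors):
    """Yield (factor, config) pairs that differ from the baseline in exactly one factor."""
    for name, levels in factors.items():
        for level in levels:
            if level != baseline[name]:
                yield name, {**baseline, name: level}


def run_trial(config, seed):
    """Simulated trial (placeholder): 1 if the attack 'succeeded', else 0."""
    # Seeding from the config and seed makes every trial reproducible.
    rng = random.Random(f"{sorted(config.items())}|{seed}")
    return int(rng.random() < 0.5)


def sweep(baseline, factors, repeats=5):
    """Run each condition `repeats` times and report all of them, nulls included."""
    conditions = [("baseline", baseline)] + list(one_factor_conditions(baseline, factors))
    results = []
    for varied, config in conditions:
        outcomes = [run_trial(config, seed) for seed in range(repeats)]
        results.append(
            {"varied": varied, "config": config, "success_rate": mean(outcomes), "n": repeats}
        )
    return results


for row in sweep(BASELINE, FACTORS):
    print(row["varied"], row["success_rate"], row["n"])
```

Because each condition differs from the baseline in a single factor and each trial is seeded, a peer re-running the sweep gets the same table and can attribute any difference to the factor that changed.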

Taxonomy

How findings are classified

Each report is tagged against two established taxonomies, so a reader can connect a specific result to the broader threat landscape and to their own controls.

OWASP LLM Top-10

The community-standard catalogue of the most critical vulnerabilities in LLM applications.

LLM01 Prompt Injection, LLM06 Sensitive Information Disclosure

MITRE ATLAS

A knowledge base of adversarial tactics and techniques against AI-enabled systems.

Reconnaissance, ML Attack Staging
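
One way to picture the tagging is as a small record per report, validated against the identifiers listed above. This is a sketch under stated assumptions: the `Finding` class, the `by_tag` helper, and the placeholder reports are hypothetical, and the ATLAS tactic assigned to each placeholder is illustrative rather than taken from any published report.

```python
from dataclasses import dataclass

# Only the identifiers named on this page; the real taxonomies are larger.
OWASP_IDS = {"LLM01", "LLM06"}
ATLAS_TACTICS = {"Reconnaissance", "ML Attack Staging"}


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one report and its taxonomy tags."""
    title: str
    owasp: tuple[str, ...] = ()
    atlas: tuple[str, ...] = ()

    def __post_init__(self):
        # Reject tags that are not in the known vocabularies.
        unknown = [t for t in self.owasp if t not in OWASP_IDS]
        unknown += [t for t in self.atlas if t not in ATLAS_TACTICS]
        if unknown:
            raise ValueError(f"unknown taxonomy tags: {unknown}")


def by_tag(findings, tag):
    """Return the findings that carry the given OWASP ID or ATLAS tactic."""
    return [f for f in findings if tag in f.owasp or tag in f.atlas]


# Placeholder reports, not real publications.
findings = [
    Finding("placeholder: prompt injection", owasp=("LLM01",), atlas=("Reconnaissance",)),
    Finding("placeholder: RAG poisoning", owasp=("LLM01", "LLM06"), atlas=("ML Attack Staging",)),
]

for f in by_tag(findings, "LLM06"):
    print(f.title)
```

Validating tags at construction time keeps typos out of the index, so a reader filtering by `LLM01` or by an ATLAS tactic sees every report that carries it.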

People

Who we are

A small group doing careful, repeatable work.

Philippe

Research

Angelica

Research

Follow the research, or get in touch.

New reports publish on Substack. For collaboration, data requests, or to flag an error in our work, reach us there.