arxiv:2607.00871

Self-Evolving Agents with Anytime-Valid Certificates

Published on Jul 1
Abstract

SEA is an architecture that limits self-modification in agents by freezing the base model and using verification mechanisms to ensure stable performance improvements.

Self-evolving agents violate the assumption behind most learning-theoretic guarantees: the data, evaluator, components, and hypothesis space are all produced by the policy being updated. We present SEA, an architecture that confines self-modification to a small steering adapter and a versioned harness around a frozen base model, and admits each modification only through an anytime-valid gate that emits an auditable certificate against a fixed error budget. Five loop controllers compose published guarantees. Because such gates can only select among behaviors the frozen base already produces, five verifier-in-the-loop mechanisms (best-of-N, micro-step search, self-authored reproduction oracles, search-layer control, and self-repair) supply the dense, grader-free signal the gates require, computed from the issue text alone. On a 52-instance SWE-bench Verified subset across four base models, base capability is the dominant, confound-free effect. On two strong base models, a deliberate no-op-composite control isolates the suite's contribution at +4 and +5 instances (GLM 5.2: 24 to 28; GPT: 29 to 34, the best result at 65%), with event logs confirming that its mechanisms fire and prevent regressions. Results are single-run on expensive evaluations; confirming run-to-run variance and adapting the per-task algorithm mix are future work.
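The abstract does not say how the anytime-valid gate is built. As a rough illustration of the general idea only, the sketch below uses a standard betting test martingale: each paired comparison between a candidate modification and the current baseline multiplies a wealth value, and the modification is admitted once the wealth reaches 1/alpha. By Ville's inequality, the chance of ever admitting a candidate whose true win rate is at most 0.5 is then at most alpha, however long the gate keeps watching. The class name `AnytimeValidGate`, the fixed bet size, the win/loss encoding, and the certificate fields are all assumptions made for this example, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AnytimeValidGate:
    """Hypothetical gate; tests H0: candidate win rate <= 0.5."""

    alpha: float = 0.05   # error budget for this one modification (assumed)
    bet: float = 0.5      # fixed bet fraction in (0, 1) (assumed)
    wealth: float = 1.0   # test martingale, starts at 1
    n: int = 0            # number of comparisons observed
    admitted: bool = False

    def update(self, candidate_won: bool) -> bool:
        """Fold in one paired comparison (ties assumed to be dropped upstream)."""
        payoff = 1.0 if candidate_won else -1.0
        # Under H0 the expected multiplier is <= 1, so wealth is a supermartingale.
        self.wealth *= 1.0 + self.bet * payoff
        self.n += 1
        # Ville's inequality: P(wealth ever reaches 1/alpha | H0) <= alpha.
        if self.wealth >= 1.0 / self.alpha:
            self.admitted = True
        return self.admitted

    def certificate(self) -> dict:
        """Return a plain record of the evidence behind the decision."""
        return {
            "alpha": self.alpha,
            "threshold": 1.0 / self.alpha,
            "wealth": self.wealth,
            "comparisons": self.n,
            "admitted": self.admitted,
        }


if __name__ == "__main__":
    gate = AnytimeValidGate()
    for outcome in [True] * 8:
        gate.update(outcome)
    print(gate.certificate())
```

With these assumed defaults, each win multiplies the wealth by 1.5 and each loss halves it, so a candidate that merely alternates wins and losses loses wealth over time and is never admitted. Spreading a fixed total budget over many proposed modifications would need an additional rule for choosing each gate's alpha, which this sketch does not attempt.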
