Abstract
LACUNA is a programming model that enables LLM agents to write code that shapes the runtime while maintaining safety through type checking and controlled execution.
LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. The runtime owns the loop, context, and control flow, and the model has little say over any of them. Letting model-written code shape the runtime itself would make agents more expressive, but it would also sharpen safety problems. A model can be diverted by a prompt injection, call the wrong tool, or fail partway and leave an inconsistent state, and each such failure reaches further when the code shapes the runtime than when it expresses a single action. We present LACUNA, a programming model for agents that closes this split while preserving safety. Each agent action is a typed call agent[T](task) that the LLM fills with code when execution reaches it, and the code is type-checked against the surrounding program before it runs. Because each action is accepted or rejected as a whole, a rejected one leaves the environment untouched, and its compiler diagnostics drive a retry. The same check also bounds which tools and data an action may use and how they flow. Our primitive expresses ReAct loops, sub-agents, skills, parallel decomposition, and multi-model planning as ordinary control flow. We evaluate LACUNA on a collection of test cases, BrowseComp-Plus, and τ^2-bench. On BrowseComp-Plus, 8.6% of generations are rejected before execution, with 0.7 retries per query on average, and the agent reaches 27.1% accuracy. On τ^2-bench, LACUNA solves 76.0% of 392 tasks across four domains with a capable model, on par with the baseline agent.
Community
LLM agents increasingly act by writing code, yet a split persists between the runtime that drives the agent and the code the model writes. This paper proposes LACUNA, a programming model that fuses the agent program and its behaviour. Each agent action is a typed call agent[T](task) that the LLM fills with code when execution reaches it, and the code is type-checked against the surrounding program before it runs. Because each action is accepted or rejected as a whole, a rejected one leaves the environment untouched, and its compiler diagnostics drive a retry.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Language-Based Agent Control (2026)
- Synthesizing Multi-Agent Harnesses for Vulnerability Discovery (2026)
- Shepherd: A Runtime Substrate Empowering Meta-Agents with a Formalized Execution Trace (2026)
- MOSS: Self-Evolution through Source-Level Rewriting in Autonomous Agent Systems (2026)
- Formal Skill: Programmable Runtime Skills for Efficient and Accurate LLM Agents (2026)
- KAIJU: An Executive Kernel for Intent-Gated Execution of LLM Agents (2026)
- ChainCaps: Composition-Safe Tool-Using Agents via Monotonic Capability Attenuation (2026)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend
Get this paper in your agent:
hf papers read 2605.28617 Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper