---
sidebar_position: 3
title: "Agent Loop Internals"
description: "Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior"
---
# Agent Loop Internals
The core orchestration engine is the `AIAgent` class in `run_agent.py`.
## Core responsibilities
`AIAgent` is responsible for:
- assembling the effective prompt and tool schemas
- selecting the correct provider/API mode
- making interruptible model calls
- executing tool calls (sequentially or concurrently)
- maintaining session history
- handling compression, retries, and fallback models
## API modes
Hermes currently supports three API execution modes:
| API mode | Used for |
|----------|----------|
| `chat_completions` | OpenAI-compatible chat endpoints, including OpenRouter and most custom endpoints |
| `codex_responses` | OpenAI Codex / Responses API path |
| `anthropic_messages` | Native Anthropic Messages API |
The mode is resolved from explicit args, provider selection, and base URL heuristics.
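The precedence described above can be sketched as a small resolver. This is illustrative, not the actual Hermes code; the function name and the specific URL substrings checked are assumptions:

```python
def resolve_api_mode(explicit_mode=None, provider=None, base_url=""):
    """Illustrative resolution order: explicit argument wins, then
    provider selection, then base-URL heuristics, falling back to the
    OpenAI-compatible chat path."""
    if explicit_mode:
        return explicit_mode
    if provider == "anthropic" or "anthropic" in base_url:
        return "anthropic_messages"
    if "/responses" in base_url or "codex" in base_url:
        return "codex_responses"
    # Default: OpenAI-compatible chat endpoints (OpenRouter, custom gateways)
    return "chat_completions"
```

An explicit mode always short-circuits the heuristics, so a custom endpoint that looks Anthropic-like can still be forced onto the chat-completions path.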
## Turn lifecycle
```text
run_conversation()
-> generate effective task_id
-> append current user message
-> load or build cached system prompt
-> maybe preflight-compress
-> build api_messages
-> inject ephemeral prompt layers
-> apply prompt caching if appropriate
-> make interruptible API call
-> if tool calls: execute them, append tool results, loop
-> if final text: persist, cleanup, return response
```
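The tail of the lifecycle (call, tool loop, final text) can be sketched as a plain Python loop. `call_model` and `execute_tool` are hypothetical stand-ins for the real API call and tool runtime, and the message shapes are simplified:

```python
def run_conversation(messages, call_model, execute_tool):
    """Minimal sketch of the turn loop: call the model, execute any
    tool calls, append their results, and repeat until final text."""
    while True:
        reply = call_model(messages)              # interruptible API call
        if reply.get("tool_calls"):
            for call in reply["tool_calls"]:
                result = execute_tool(call)
                messages.append({"role": "tool", "content": result})
            continue                              # loop with tool results
        messages.append({"role": "assistant", "content": reply["content"]})
        return reply["content"]                   # final text: persist, return
```

The real implementation layers prompt caching, compression, and budget checks around each iteration of this loop.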
## Interruptible API calls
Hermes wraps API requests so they can be interrupted from the CLI or gateway.
This matters because:
- the agent may be in a long LLM call
- the user may send a new message mid-flight
- background systems may need cancellation semantics
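One common way to get these semantics is to race the model call against a cancellation signal. The sketch below uses `asyncio` and hypothetical names; it shows the pattern, not Hermes's actual internals:

```python
import asyncio

async def interruptible_call(coro, cancel_event):
    """Race a long model call against a cancellation event so a new
    user message or shutdown can interrupt it mid-flight."""
    call = asyncio.ensure_future(coro)
    waiter = asyncio.ensure_future(cancel_event.wait())
    done, pending = await asyncio.wait(
        {call, waiter}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    if waiter in done:
        raise asyncio.CancelledError("interrupted by user")
    return call.result()
```

Whichever side finishes first wins; the loser is cancelled so no orphaned request keeps running.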
## Tool execution modes
Hermes uses two execution strategies:
- sequential execution for single or interactive tools
- concurrent execution for multiple non-interactive tools
Concurrent tool execution preserves message/result ordering when reinserting tool responses into conversation history.
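The ordering guarantee falls out naturally if results are gathered in submission order. A minimal sketch, assuming an async `execute` callable (not the real tool runtime API):

```python
import asyncio

async def run_tools_concurrently(tool_calls, execute):
    """Run non-interactive tools concurrently while preserving the
    original call order: asyncio.gather returns results in submission
    order regardless of completion order."""
    results = await asyncio.gather(*(execute(call) for call in tool_calls))
    return [
        {"role": "tool", "name": call["name"], "content": result}
        for call, result in zip(tool_calls, results)
    ]
```

Even if a fast tool finishes before a slow one submitted earlier, the slow tool's result is still reinserted first.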
## Callback surfaces
`AIAgent` supports platform/integration callbacks such as:
- `tool_progress_callback`
- `thinking_callback`
- `reasoning_callback`
- `clarify_callback`
- `step_callback`
- `stream_delta_callback`
- `tool_gen_callback`
- `status_callback`
These are how the CLI, gateway, and ACP integrations stream intermediate progress and interactive approval/clarification flows.
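Since every callback is optional, the agent only fires the hooks an integration actually registered. A hedged sketch of that pattern, with illustrative class and method names:

```python
def emit(callback, *args):
    """Fire a callback only if an integration registered one."""
    if callback is not None:
        callback(*args)

class CallbackSketch:
    """Illustrative: only two of the callback surfaces listed above."""
    def __init__(self, status_callback=None, stream_delta_callback=None):
        self.status_callback = status_callback
        self.stream_delta_callback = stream_delta_callback

    def step(self):
        emit(self.status_callback, "calling model")
        emit(self.stream_delta_callback, "partial text")
```

A CLI might register only `status_callback`, while the gateway registers streaming and approval hooks as well; unregistered surfaces are simply skipped.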
## Budget and fallback behavior
Hermes tracks a shared iteration budget across parent and subagents. It also injects budget pressure hints near the end of the available iteration window.
Fallback model support allows the agent to switch providers/models when the primary route fails in supported failure paths.
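The fallback routing can be sketched as trying each model in order and re-raising only when every route fails. `call` is a hypothetical stand-in for the provider-specific request:

```python
def call_with_fallback(models, call):
    """Try the primary model, then each fallback in order; re-raise
    the last error only if every route fails."""
    last_error = None
    for model in models:
        try:
            return call(model)
        except Exception as err:  # in practice, only supported failure paths
            last_error = err
    raise last_error
```

A real implementation would restrict the `except` clause to retriable errors (rate limits, timeouts) rather than catching everything.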
## Compression and persistence
Before and during long runs, Hermes may:
- flush memory before context loss
- compress middle conversation turns
- split the session lineage into a new session ID after compression
- preserve recent context and structural tool-call/result consistency
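The middle-turn compression step can be sketched as replacing older turns with a summary placeholder while keeping the head and the most recent turns intact. The function name, `keep_head`/`keep_tail` parameters, and placeholder message are illustrative:

```python
def compress_middle(messages, keep_head=1, keep_tail=4):
    """Replace middle turns with a summary while preserving the head
    (framing) and the most recent turns. A real implementation must
    also keep tool calls paired with their results and re-key the
    session lineage under a new session ID."""
    if len(messages) <= keep_head + keep_tail:
        return messages
    summary = {"role": "user", "content": "[summary of earlier conversation]"}
    return messages[:keep_head] + [summary] + messages[-keep_tail:]
```

Short conversations pass through untouched; only once history outgrows the head-plus-tail window does the summary replace the middle.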
## Key files to read next
- `run_agent.py`
- `agent/prompt_builder.py`
- `agent/context_compressor.py`
- `agent/prompt_caching.py`
- `model_tools.py`
## Related docs
- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)