---
sidebar_position: 3
title: "Agent Loop Internals"
description: "Detailed walkthrough of AIAgent execution, API modes, tools, callbacks, and fallback behavior"
---

# Agent Loop Internals

The core orchestration engine is `run_agent.py`'s `AIAgent`.

## Core responsibilities

`AIAgent` is responsible for:

- assembling the effective prompt and tool schemas
- selecting the correct provider/API mode
- making interruptible model calls
- executing tool calls (sequentially or concurrently)
- maintaining session history
- handling compression, retries, and fallback models

## API modes

Hermes currently supports three API execution modes:

| API mode | Used for |
|----------|----------|
| `chat_completions` | OpenAI-compatible chat endpoints, including OpenRouter and most custom endpoints |
| `codex_responses` | OpenAI Codex / Responses API path |
| `anthropic_messages` | Native Anthropic Messages API |

The mode is resolved from explicit args, provider selection, and base URL heuristics.

## Turn lifecycle

```text
run_conversation()
  -> generate effective task_id
  -> append current user message
  -> load or build cached system prompt
  -> maybe preflight-compress
  -> build api_messages
  -> inject ephemeral prompt layers
  -> apply prompt caching if appropriate
  -> make interruptible API call
  -> if tool calls: execute them, append tool results, loop
  -> if final text: persist, cleanup, return response
```

## Interruptible API calls

Hermes wraps API requests so they can be interrupted from the CLI or gateway. This matters because:

- the agent may be in a long LLM call
- the user may send a new message mid-flight
- background systems may need cancellation semantics

## Tool execution modes

Hermes uses two execution strategies:

- sequential execution for single or interactive tools
- concurrent execution for multiple non-interactive tools

Concurrent tool execution preserves message/result ordering when reinserting tool responses into conversation history.
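The ordering guarantee above can be sketched with `asyncio.gather`, which returns results in submission order regardless of completion order. This is a minimal illustration, not Hermes code: the `execute_tool` helper and the tool names are hypothetical.

```python
import asyncio

async def execute_tool(call_id: str, name: str, delay: float) -> dict:
    # Stand-in for a real tool invocation; `delay` simulates variable runtime.
    await asyncio.sleep(delay)
    return {"tool_call_id": call_id, "role": "tool", "content": f"{name} done"}

async def run_tool_calls_concurrently(calls: list[tuple[str, str, float]]) -> list[dict]:
    # asyncio.gather yields results in the order the coroutines were passed,
    # so tool results can be reinserted into conversation history in the same
    # order as the original tool calls, even if a later call finishes first.
    tasks = [execute_tool(cid, name, d) for cid, name, d in calls]
    return await asyncio.gather(*tasks)

results = asyncio.run(run_tool_calls_concurrently([
    ("call_1", "read_file", 0.02),  # slower
    ("call_2", "list_dir", 0.01),   # faster, completes first
]))
# Result order still matches call order, not completion order.
assert [r["tool_call_id"] for r in results] == ["call_1", "call_2"]
```

Sequential execution for interactive tools avoids this machinery entirely, since an interactive tool may need to block on user input before the next call can proceed.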
## Callback surfaces

`AIAgent` supports platform/integration callbacks such as:

- `tool_progress_callback`
- `thinking_callback`
- `reasoning_callback`
- `clarify_callback`
- `step_callback`
- `stream_delta_callback`
- `tool_gen_callback`
- `status_callback`

These are how the CLI, gateway, and ACP integrations stream intermediate progress and drive interactive approval/clarification flows.

## Budget and fallback behavior

Hermes tracks a shared iteration budget across parent and subagents. It also injects budget pressure hints near the end of the available iteration window.

Fallback model support allows the agent to switch providers/models when the primary route fails in supported failure paths.

## Compression and persistence

Before and during long runs, Hermes may:

- flush memory before context loss
- compress middle conversation turns
- split the session lineage into a new session ID after compression
- preserve recent context and structural tool-call/result consistency

## Key files to read next

- `run_agent.py`
- `agent/prompt_builder.py`
- `agent/context_compressor.py`
- `agent/prompt_caching.py`
- `model_tools.py`

## Related docs

- [Provider Runtime Resolution](./provider-runtime.md)
- [Prompt Assembly](./prompt-assembly.md)
- [Context Compression & Prompt Caching](./context-compression-and-caching.md)
- [Tools Runtime](./tools-runtime.md)