--- sidebar_position: 6 title: "Context Compression & Prompt Caching" description: "How Hermes compresses long conversations and applies provider-side prompt caching" --- # Context Compression & Prompt Caching Hermes manages long conversations with two complementary mechanisms: - prompt caching - context compression Primary files: - `agent/prompt_caching.py` - `agent/context_compressor.py` - `run_agent.py` ## Prompt caching For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers. Current strategy: - cache the system prompt - cache the last 3 non-system messages - default TTL is 5 minutes unless explicitly extended This is implemented in `agent/prompt_caching.py`. ## Why prompt stability matters Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to. ## Compression trigger Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts. ## Compression algorithm The compressor protects: - the first N turns - the last N turns and summarizes the middle section. It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression. ## Pre-compression memory flush Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away. ## Session lineage after compression Compression can split the session into a new session ID while preserving parent lineage in the state DB. This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain. ## Re-injected state after compression After compression, Hermes may re-inject compact operational state such as: - todo snapshot - prior-read-files summary ## Related docs - [Prompt Assembly](./prompt-assembly.md) - [Session Storage](./session-storage.md) - [Agent Loop Internals](./agent-loop.md)