hermes / website /docs /developer-guide /context-compression-and-caching.md
lenson78's picture
initial upload: v2026.3.23 with HF Spaces deployment
9aa5185 verified
---
sidebar_position: 6
title: "Context Compression & Prompt Caching"
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---
# Context Compression & Prompt Caching
Hermes manages long conversations with two complementary mechanisms:
- prompt caching
- context compression
Primary files:
- `agent/prompt_caching.py`
- `agent/context_compressor.py`
- `run_agent.py`
## Prompt caching
For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.
Current strategy:
- cache the system prompt
- cache the last 3 non-system messages
- default TTL is 5 minutes unless explicitly extended
This is implemented in `agent/prompt_caching.py`.
## Why prompt stability matters
Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.
## Compression trigger
Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.
## Compression algorithm
The compressor protects:
- the first N turns
- the last N turns
and summarizes the middle section.
It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.
## Pre-compression memory flush
Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.
## Session lineage after compression
Compression can split the session into a new session ID while preserving parent lineage in the state DB.
This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.
## Re-injected state after compression
After compression, Hermes may re-inject compact operational state such as:
- todo snapshot
- prior-read-files summary
## Related docs
- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)