File size: 2,154 Bytes
9aa5185
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
---
sidebar_position: 6
title: "Context Compression & Prompt Caching"
description: "How Hermes compresses long conversations and applies provider-side prompt caching"
---

# Context Compression & Prompt Caching

Hermes manages long conversations with two complementary mechanisms:

- prompt caching
- context compression

Primary files:

- `agent/prompt_caching.py`
- `agent/context_compressor.py`
- `run_agent.py`

## Prompt caching

For Anthropic/native and Claude-via-OpenRouter flows, Hermes applies Anthropic-style cache markers.

Current strategy:

- cache the system prompt
- cache the last 3 non-system messages
- default TTL is 5 minutes unless explicitly extended

This is implemented in `agent/prompt_caching.py`.

## Why prompt stability matters

Prompt caching only helps when the stable prefix remains stable. That is why Hermes avoids rebuilding or mutating the core system prompt mid-session unless it has to.

## Compression trigger

Hermes can compress context when conversations become large. Configuration defaults live in `config.yaml`, and the compressor also has runtime checks based on actual prompt token counts.

## Compression algorithm

The compressor protects:

- the first N turns
- the last N turns

and summarizes the middle section.

It also cleans up structural issues such as orphaned tool-call/result pairs so the API never receives invalid conversation structure after compression.

## Pre-compression memory flush

Before compression, Hermes can give the model one last chance to persist memory so facts are not lost when middle turns are summarized away.

## Session lineage after compression

Compression can split the session into a new session ID while preserving parent lineage in the state DB.

This lets Hermes continue operating with a smaller active context while retaining a searchable ancestry chain.

## Re-injected state after compression

After compression, Hermes may re-inject compact operational state such as:

- todo snapshot
- prior-read-files summary

## Related docs

- [Prompt Assembly](./prompt-assembly.md)
- [Session Storage](./session-storage.md)
- [Agent Loop Internals](./agent-loop.md)