# Context Engineering 🧠

Keeping long-running agents "forever young" by managing their memory.
## The Problem
LLMs have finite context windows. As conversations grow, you eventually hit the token limit and the agent breaks. Simply truncating old messages loses valuable context.
## The Solution: Compaction via Summarization
Instead of truncating, we summarize old conversation history into a compact narrative, preserving the essential context while freeing up tokens.
```
┌───────────────────────────────────────────────────────────┐
│ Before Compaction (500+ tokens)                           │
├───────────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                       │
│ [Human]  Show me all candidates                           │
│ [AI]     Here are 5 candidates: Alice, Bob...             │
│ [Human]  Tell me about Alice                              │
│ [AI]     Alice is a senior engineer with 5 years...       │
│ [Human]  Schedule an interview with her                   │
│ [Tool]   Calendar event created...                        │
│ [AI]     Done! Interview scheduled for Monday.            │
│ [Human]  Now check Bob's CV                     ← new     │
└───────────────────────────────────────────────────────────┘
                    ↓ COMPACTION ↓
┌───────────────────────────────────────────────────────────┐
│ After Compaction (~200 tokens)                            │
├───────────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                       │
│ [AI Summary] User reviewed candidates, focused on         │
│              Alice (senior engineer), scheduled interview │
│              for Monday.                                  │
│ [Human]  Now check Bob's CV                     ← kept    │
└───────────────────────────────────────────────────────────┘
```
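To make the compaction step concrete, here is a minimal sketch of how old history might be folded into a prompt for the summarizer LLM. `build_summary_prompt` is a hypothetical helper for illustration, not a function in this repo:

```python
def build_summary_prompt(old_messages: list[tuple[str, str]]) -> str:
    """Format (role, content) pairs into a prompt for a summarizer LLM.

    Hypothetical helper: the real summarization lives in HistoryManager.
    """
    transcript = "\n".join(f"[{role}] {content}" for role, content in old_messages)
    return (
        "Summarize the following conversation into a compact narrative, "
        "preserving names, decisions, and scheduled actions:\n\n" + transcript
    )

prompt = build_summary_prompt([
    ("Human", "Tell me about Alice"),
    ("AI", "Alice is a senior engineer with 5 years..."),
    ("Human", "Schedule an interview with her"),
])
```

The resulting summary replaces the old messages as a single `[AI Summary]` entry, as shown in the "After Compaction" box above.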
## Architecture
```
┌────────────────────────────────────────────────────────────┐
│                   CompactingSupervisor                     │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ 1. Intercept agent execution                         │  │
│  │ 2. Run agent normally                                │  │
│  │ 3. Count tokens after response                       │  │
│  │ 4. If over limit → trigger compaction                │  │
│  └──────────────────────────────────────────────────────┘  │
│                            │                               │
│                            ▼                               │
│  ┌──────────────────────────────────────────────────────┐  │
│  │                   HistoryManager                     │  │
│  │  • compact_messages() → LLM summarization            │  │
│  │  • replace_thread_history() → checkpoint update      │  │
│  └──────────────────────────────────────────────────────┘  │
└────────────────────────────────────────────────────────────┘
```
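Steps 3-4 of the flow above reduce to a simple threshold check. A sketch, where `estimate_tokens` is a stand-in for the real token counter in `token_counter.py`:

```python
def estimate_tokens(messages: list[str]) -> int:
    # Rough heuristic stand-in: ~4 characters per token.
    return sum(len(m) for m in messages) // 4

def needs_compaction(messages: list[str], token_limit: int = 500) -> bool:
    # Step 4: trigger compaction only once the limit is exceeded.
    return estimate_tokens(messages) > token_limit

assert not needs_compaction(["hello"])
assert needs_compaction(["x" * 400] * 10)   # ~1000 estimated tokens
```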
## Subagents and Memory Safety
Compaction affects only the supervisor's `messages` channel inside LangGraph's checkpoint.

This includes:

- User messages
- Supervisor AI messages
- Tool call and tool result messages (because these are part of the supervisor's visible conversation history)

This does not include:

- Sub-agent internal reasoning
- Sub-agent private memory
- Hidden chain-of-thought
- Any messages stored in sub-agent-specific channels

Only the messages that the supervisor itself receives are ever compacted. No internal sub-agent state leaks into the compacted summary.
## Key Parameters
| Parameter | Default | Description |
|---|---|---|
| `token_limit` | 500 | Trigger compaction when exceeded |
| `compaction_ratio` | 0.5 | Fraction of messages to summarize |
### Compaction Ratio Explained
The `compaction_ratio` controls how aggressively we summarize:

```
compaction_ratio = 0.5 (Default)
├── Summarizes: oldest 50% of messages
└── Keeps verbatim: newest 50% of messages

compaction_ratio = 0.8 (Aggressive)
├── Summarizes: oldest 80% of messages
└── Keeps verbatim: only newest 20%
    → Use when context is very tight

compaction_ratio = 0.2 (Gentle)
├── Summarizes: only oldest 20%
└── Keeps verbatim: newest 80%
    → Use when you want more history preserved
```
Example with 10 messages:

- `ratio=0.5` → Summarize messages 1-5, keep 6-10 verbatim
- `ratio=0.8` → Summarize messages 1-8, keep 9-10 verbatim
- `ratio=0.2` → Summarize messages 1-2, keep 3-10 verbatim
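The split above comes down to one index computation. A minimal sketch (the function name is illustrative, not the repo's actual helper):

```python
def split_for_compaction(messages: list, compaction_ratio: float) -> tuple[list, list]:
    """Split history into (to_summarize, to_keep_verbatim).

    The oldest `compaction_ratio` fraction is summarized; the rest is kept.
    """
    cut = int(len(messages) * compaction_ratio)
    return messages[:cut], messages[cut:]

msgs = list(range(1, 11))                       # messages 1..10
old, recent = split_for_compaction(msgs, 0.5)   # old=[1..5], recent=[6..10]
```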
## Usage
```python
from langchain_core.messages import HumanMessage

from src.context_eng import compacting_supervisor

# Just use it like a normal agent - compaction is automatic!
response = compacting_supervisor.invoke(
    {"messages": [HumanMessage(content="Hello")]},
    config={"configurable": {"thread_id": "my-thread"}},
)

# Streaming works too
for chunk in compacting_supervisor.stream(...):
    if chunk["type"] == "token":
        print(chunk["content"], end="")
```
## LangGraph Integration
### How It Wraps the Agent
The `CompactingSupervisor` uses the Interceptor Pattern: it wraps the existing LangGraph agent without modifying it:
```python
# In compacting_supervisor.py
from src.agents.supervisor.supervisor_v2 import supervisor_agent, memory

compacting_supervisor = CompactingSupervisor(
    agent=supervisor_agent,                              # ← original LangGraph agent
    history_manager=HistoryManager(memory_saver=memory), # ← LangGraph's MemorySaver
    ...
)
```
The agent itself is unchanged. We just intercept `invoke()` and `stream()` calls.
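The interception can be sketched as a plain wrapper class that exposes the same `invoke()` surface and runs the compaction check after delegating. This is a schematic stand-in for the real class in `compacting_supervisor.py`, and `compact_fn` / `_token_count` are illustrative names:

```python
class CompactingSupervisorSketch:
    """Interceptor Pattern sketch: same call surface, compaction added after."""

    def __init__(self, agent, compact_fn, token_limit: int = 500):
        self.agent = agent            # original agent, never modified
        self.compact_fn = compact_fn  # e.g. HistoryManager-backed compaction
        self.token_limit = token_limit

    def invoke(self, inputs, config=None):
        response = self.agent.invoke(inputs, config)  # steps 1-2: delegate as-is
        if self._token_count(response) > self.token_limit:
            self.compact_fn(config)                   # steps 3-4: check + compact
        return response

    @staticmethod
    def _token_count(response) -> int:
        # Rough stand-in for the real token counter.
        return sum(len(str(m)) for m in response.get("messages", [])) // 4
```

Because the wrapper only duck-types `invoke()`, the underlying agent needs no changes at all.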
### How It Manipulates LangGraph Memory
LangGraph uses checkpoints to persist conversation state. Normally, the `messages` channel is append-only. Our `HistoryManager.replace_thread_history()` bypasses this to force a rewrite:
Normal LangGraph flow:

```
┌──────────────────────────────────────┐
│   Checkpoint Storage (MemorySaver)   │
│  ┌────────────────────────────────┐  │
│  │ messages: [m1, m2, m3, m4...]  │  │ ← append-only
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
```

After compaction (we override):

```
┌──────────────────────────────────────┐
│   Checkpoint Storage (MemorySaver)   │
│  ┌────────────────────────────────┐  │
│  │ messages: [sys, summary, m4]   │  │ ← force-replaced!
│  └────────────────────────────────┘  │
└──────────────────────────────────────┘
```
Key mechanism in `replace_thread_history()`:

1. Get the current checkpoint via `memory.get_tuple(config)`
2. Build a new checkpoint with the compacted messages
3. Increment the version and update timestamps
4. Write directly via `memory.put(...)`, bypassing normal reducers
This is a low-level override of LangGraph's internal checkpoint format. It works because we maintain the expected checkpoint structure (`channel_versions`, `channel_values`, etc.).
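A schematic, plain-dict version of the force-replace step (step 2-3 above). This deliberately avoids calling LangGraph directly: the field names (`channel_values`, `channel_versions`, `ts`) follow LangGraph's checkpoint shape as described in the text, but the exact structure may differ across LangGraph versions, and the real code round-trips through `memory.get_tuple(...)` / `memory.put(...)`:

```python
from datetime import datetime, timezone

def replace_messages_sketch(checkpoint: dict, compacted: list) -> dict:
    """Build a new checkpoint whose `messages` channel is force-replaced.

    Schematic only: mirrors the checkpoint fields named in the text.
    """
    new_cp = dict(checkpoint)
    # Step 2: swap in the compacted message list.
    new_cp["channel_values"] = {**checkpoint["channel_values"], "messages": compacted}
    # Step 3: bump the channel version and refresh the timestamp.
    versions = dict(checkpoint["channel_versions"])
    versions["messages"] = versions.get("messages", 0) + 1
    new_cp["channel_versions"] = versions
    new_cp["ts"] = datetime.now(timezone.utc).isoformat()
    return new_cp

old = {
    "channel_values": {"messages": ["m1", "m2", "m3", "m4"]},
    "channel_versions": {"messages": 4},
    "ts": "2024-01-01T00:00:00+00:00",
}
new = replace_messages_sketch(old, ["sys", "summary", "m4"])
```

Building a fresh dict (rather than mutating in place) keeps the old checkpoint intact until the `memory.put(...)` write succeeds.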
## Files
| File | Purpose |
|---|---|
| `token_counter.py` | Count tokens in message lists |
| `history_manager.py` | Summarization + checkpoint manipulation |
| `compacting_supervisor.py` | Agent wrapper (Interceptor Pattern) |