# Context Engineering 🧠

> Keeping long-running agents "forever young" by managing their memory.

## The Problem

LLMs have finite context windows. As conversations grow, you eventually hit the token limit and the agent breaks. Simply truncating old messages loses valuable context.

## The Solution: Compactive Summarization

Instead of truncating, we **summarize** old conversation history into a compact narrative, preserving the essential context while freeing up tokens.

```
┌───────────────────────────────────────────────────────────┐
│ Before Compaction (500+ tokens)                           │
├───────────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                       │
│ [Human] Show me all candidates                            │
│ [AI] Here are 5 candidates: Alice, Bob...                 │
│ [Human] Tell me about Alice                               │
│ [AI] Alice is a senior engineer with 5 years...           │
│ [Human] Schedule an interview with her                    │
│ [Tool] Calendar event created...                          │
│ [AI] Done! Interview scheduled for Monday.                │
│ [Human] Now check Bob's CV                          ← new │
└───────────────────────────────────────────────────────────┘
                       ↓ COMPACTION ↓
┌───────────────────────────────────────────────────────────┐
│ After Compaction (~200 tokens)                            │
├───────────────────────────────────────────────────────────┤
│ [System] You are an HR assistant...                       │
│ [AI Summary] User reviewed candidates, focused on         │
│              Alice (senior engineer), scheduled interview │
│              for Monday.                                  │
│ [Human] Now check Bob's CV                         ← kept │
└───────────────────────────────────────────────────────────┘
```

## Architecture

```
┌───────────────────────────────────────────────────────────┐
│                   CompactingSupervisor                    │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ 1. Intercept agent execution                        │  │
│  │ 2. Run agent normally                               │  │
│  │ 3. Count tokens after response                      │  │
│  │ 4. If over limit → trigger compaction               │  │
│  └─────────────────────────────────────────────────────┘  │
│                             │                             │
│                             ▼                             │
│  ┌─────────────────────────────────────────────────────┐  │
│  │ HistoryManager                                      │  │
│  │ • compact_messages() → LLM summarization            │  │
│  │ • replace_thread_history() → checkpoint update      │  │
│  └─────────────────────────────────────────────────────┘  │
└───────────────────────────────────────────────────────────┘
```

## 🔒 Subagents and Memory Safety

Compaction affects **only the supervisor's `messages` channel** inside LangGraph's checkpoint.
This includes:

- User messages
- Supervisor AI messages
- **Tool call and Tool result messages** (because these are part of the supervisor's visible conversation history)

This does **not** include:

- Sub-agent internal reasoning
- Sub-agent private memory
- Hidden chain-of-thought
- Any messages stored in sub-agent-specific channels

Only the messages that the supervisor itself receives are ever compacted. No internal sub-agent state leaks into the compacted summary.

## Key Parameters

| Parameter | Default | Description |
|-----------|---------|-------------|
| `token_limit` | 500 | Trigger compaction when exceeded |
| `compaction_ratio` | 0.5 | Fraction of messages to summarize |

### Compaction Ratio Explained

The `compaction_ratio` controls how aggressively we summarize:

```
compaction_ratio = 0.5 (Default)
├── Summarizes: oldest 50% of messages
└── Keeps verbatim: newest 50% of messages

compaction_ratio = 0.8 (Aggressive)
├── Summarizes: oldest 80% of messages
└── Keeps verbatim: only newest 20%
    → Use when context is very tight

compaction_ratio = 0.2 (Gentle)
├── Summarizes: only oldest 20%
└── Keeps verbatim: newest 80%
    → Use when you want more history preserved
```

**Example with 10 messages:**

- `ratio=0.5` → Summarize messages 1-5, keep 6-10 verbatim
- `ratio=0.8` → Summarize messages 1-8, keep 9-10 verbatim
- `ratio=0.2` → Summarize messages 1-2, keep 3-10 verbatim

## Usage

```python
from langchain_core.messages import HumanMessage

from src.backend.context_eng import compacting_supervisor

# Just use it like a normal agent - compaction is automatic!
response = compacting_supervisor.invoke(
    {"messages": [HumanMessage(content="Hello")]},
    config={"configurable": {"thread_id": "my-thread"}}
)

# Streaming works too
for chunk in compacting_supervisor.stream(...):
    if chunk["type"] == "token":
        print(chunk["content"], end="")
```

## LangGraph Integration

### How It Wraps the Agent

The `CompactingSupervisor` uses the **Interceptor Pattern** - it wraps the existing LangGraph agent without modifying it:

```python
# In compacting_supervisor.py
from src.backend.agents.supervisor.supervisor_v2 import supervisor_agent, memory

compacting_supervisor = CompactingSupervisor(
    agent=supervisor_agent,  # ← Original LangGraph agent
    history_manager=HistoryManager(memory_saver=memory),  # ← LangGraph's MemorySaver
    ...
)
```

The agent itself is **unchanged**. We just intercept `invoke()` and `stream()` calls.

### How It Manipulates LangGraph Memory

LangGraph uses **checkpoints** to persist conversation state. Normally, messages are append-only. Our `HistoryManager.replace_thread_history()` bypasses this to force a rewrite:

```
Normal LangGraph flow:
┌─────────────────────────────────────┐
│  Checkpoint Storage (MemorySaver)   │
│  ┌───────────────────────────────┐  │
│  │ messages: [m1, m2, m3, m4...] │  │  ← Append-only
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘

After compaction (we override):
┌─────────────────────────────────────┐
│  Checkpoint Storage (MemorySaver)   │
│  ┌───────────────────────────────┐  │
│  │ messages: [sys, summary, m4]  │  │  ← Force-replaced!
│  └───────────────────────────────┘  │
└─────────────────────────────────────┘
```

**Key mechanism in `replace_thread_history()`:**

1. Get current checkpoint via `memory.get_tuple(config)`
2. Build new checkpoint with compacted messages
3. Increment version + update timestamps
4. Write directly via `memory.put(...)` - bypassing normal reducers

This is a **low-level override** of LangGraph's internal checkpoint format. It works because we maintain the expected checkpoint structure (`channel_versions`, `channel_values`, etc.).

## Files

| File | Purpose |
|------|---------|
| `token_counter.py` | Count tokens in message lists |
| `history_manager.py` | Summarization + checkpoint manipulation |
| `compacting_supervisor.py` | Agent wrapper (Interceptor Pattern) |
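
## Sketch: Trigger and Split Logic

The trigger rule (`token_limit`) and the split rule (`compaction_ratio`) from Key Parameters can be illustrated in a few lines. This is a minimal sketch, not the code in `history_manager.py`: the function names `should_compact` and `split_for_compaction` are hypothetical, messages are plain strings rather than LangChain message objects, and rounding the cutoff down is an assumption. How the real implementation treats the system message is not covered here.

```python
from typing import List, Sequence, Tuple


def should_compact(token_count: int, token_limit: int = 500) -> bool:
    """Return True once the token count exceeds the limit (hypothetical helper)."""
    return token_count > token_limit


def split_for_compaction(
    messages: Sequence[str], compaction_ratio: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Split messages (oldest first) into (to_summarize, keep_verbatim)."""
    if not 0.0 <= compaction_ratio <= 1.0:
        raise ValueError("compaction_ratio must be between 0 and 1")
    # Assumption: the number of messages to summarize is rounded down.
    cutoff = int(len(messages) * compaction_ratio)
    return list(messages[:cutoff]), list(messages[cutoff:])


if __name__ == "__main__":
    history = [f"m{i}" for i in range(1, 11)]  # ten placeholder messages
    for ratio in (0.5, 0.8, 0.2):
        old, recent = split_for_compaction(history, ratio)
        print(f"ratio={ratio}: summarize {len(old)}, keep {len(recent)}")
```

With ten messages this reproduces the splits listed under Compaction Ratio Explained: the oldest slice goes to the summarizer and the newest slice is kept verbatim.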