File size: 5,690 Bytes
fc93158 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 | ---
summary: "Reference: provider-specific transcript sanitization and repair rules"
read_when:
- You are debugging provider request rejections tied to transcript shape
- You are changing transcript sanitization or tool-call repair logic
- You are investigating tool-call id mismatches across providers
title: "Transcript Hygiene"
---
# Transcript Hygiene (Provider Fixups)
This document describes **provider-specific fixes** applied to transcripts before a run
(building model context). These are **in-memory** adjustments used to satisfy strict
provider requirements. These hygiene steps do **not** rewrite the stored JSONL transcript
on disk; however, a separate session-file repair pass may rewrite malformed JSONL files
by dropping invalid lines before the session is loaded. When a repair occurs, the original
file is backed up alongside the session file.
Scope includes:
- Tool call id sanitization
- Tool call input validation
- Tool result pairing repair
- Turn validation / ordering
- Thought signature cleanup
- Image payload sanitization
- User-input provenance tagging (for inter-session routed prompts)
If you need transcript storage details, see:
- [/reference/session-management-compaction](/reference/session-management-compaction)
---
## Where this runs
All transcript hygiene is centralized in the embedded runner:
- Policy selection: `src/agents/transcript-policy.ts`
- Sanitization/repair application: `sanitizeSessionHistory` in `src/agents/pi-embedded-runner/google.ts`
The policy uses `provider`, `modelApi`, and `modelId` to decide what to apply.
Separate from transcript hygiene, session files are repaired (if needed) before load:
- `repairSessionFileIfNeeded` in `src/agents/session-file-repair.ts`
- Called from `run/attempt.ts` and `compact.ts` (embedded runner)
---
## Global rule: image sanitization
Image payloads are always sanitized to prevent provider-side rejection due to size
limits (downscale/recompress oversized base64 images).
This also helps control image-driven token pressure for vision-capable models.
Lower max dimensions generally reduce token usage; higher dimensions preserve detail.
Implementation:
- `sanitizeSessionMessagesImages` in `src/agents/pi-embedded-helpers/images.ts`
- `sanitizeContentBlocksImages` in `src/agents/tool-images.ts`
- Max image side is configurable via `agents.defaults.imageMaxDimensionPx` (default: `1200`).
---
## Global rule: malformed tool calls
Assistant tool-call blocks that are missing both `input` and `arguments` are dropped
before model context is built. This prevents provider rejections from partially
persisted tool calls (for example, after a rate limit failure).
Implementation:
- `sanitizeToolCallInputs` in `src/agents/session-transcript-repair.ts`
- Applied in `sanitizeSessionHistory` in `src/agents/pi-embedded-runner/google.ts`
---
## Global rule: inter-session input provenance
When an agent sends a prompt into another session via `sessions_send` (including
agent-to-agent reply/announce steps), OpenClaw persists the created user turn with:
- `message.provenance.kind = "inter_session"`
This metadata is written at transcript append time and does not change role
(`role: "user"` remains for provider compatibility). Transcript readers can use
this to avoid treating routed internal prompts as end-user-authored instructions.
During context rebuild, OpenClaw also prepends a short `[Inter-session message]`
marker to those user turns in-memory so the model can distinguish them from
external end-user instructions.
---
## Provider matrix (current behavior)
**OpenAI / OpenAI Codex**
- Image sanitization only.
- Drop orphaned reasoning signatures (standalone reasoning items without a following content block) for OpenAI Responses/Codex transcripts.
- No tool call id sanitization.
- No tool result pairing repair.
- No turn validation or reordering.
- No synthetic tool results.
- No thought signature stripping.
**Google (Generative AI / Gemini CLI / Antigravity)**
- Tool call id sanitization: strict alphanumeric.
- Tool result pairing repair and synthetic tool results.
- Turn validation (Gemini-style turn alternation).
- Google turn ordering fixup (prepend a tiny user bootstrap if history starts with assistant).
- Antigravity Claude: normalize thinking signatures; drop unsigned thinking blocks.
**Anthropic / Minimax (Anthropic-compatible)**
- Tool result pairing repair and synthetic tool results.
- Turn validation (merge consecutive user turns to satisfy strict alternation).
**Mistral (including model-id based detection)**
- Tool call id sanitization: strict9 (alphanumeric length 9).
**OpenRouter Gemini**
- Thought signature cleanup: strip non-base64 `thought_signature` values (keep base64).
**Everything else**
- Image sanitization only.
---
## Historical behavior (pre-2026.1.22)
Before the 2026.1.22 release, OpenClaw applied multiple layers of transcript hygiene:
- A **transcript-sanitize extension** ran on every context build and could:
- Repair tool use/result pairing.
- Sanitize tool call ids (including a non-strict mode that preserved `_`/`-`).
- The runner also performed provider-specific sanitization, which duplicated work.
- Additional mutations occurred outside the provider policy, including:
- Stripping `<final>` tags from assistant text before persistence.
- Dropping empty assistant error turns.
- Trimming assistant content after tool calls.
This complexity caused cross-provider regressions (notably `openai-responses`
`call_id|fc_id` pairing). The 2026.1.22 cleanup removed the extension, centralized
logic in the runner, and made OpenAI **no-touch** beyond image sanitization.
|