	<!doctype html>
	<html lang="en">
	<head>
	<meta charset="utf-8">
	<meta name="viewport" content="width=device-width, initial-scale=1">
	<title>Agent Trace Format Reference</title>
	<style>
	:root { --bg: #1a1a1a; --fg: #e8e6e1; --muted: #9a9690; --accent: #d4a853; --card: #232323; --border: #353535; }
	* { box-sizing: border-box; }
	body { font: 16px/1.55 -apple-system, BlinkMacSystemFont, 'SF Pro Text', sans-serif; background: var(--bg); color: var(--fg); margin: 0; padding: 2rem 1.25rem; }
	main { max-width: 920px; margin: 0 auto; }
	h1 { color: var(--accent); font-size: 2.1rem; margin: 0 0 0.5rem; }
	h2 { color: var(--accent); font-size: 1.3rem; margin: 2rem 0 0.6rem; }
	.lede { color: var(--muted); font-size: 1.05rem; }
	pre { background: var(--card); border: 1px solid var(--border); padding: 1rem 1.1rem; border-radius: 8px; overflow-x: auto; font-family: ui-monospace, SF Mono, monospace; font-size: 0.85rem; line-height: 1.5; }
	table { width: 100%; border-collapse: collapse; margin: 1rem 0; }
	th, td { text-align: left; padding: 0.6rem 0.8rem; border-bottom: 1px solid var(--border); font-size: 0.92rem; vertical-align: top; }
	th { color: var(--accent); }
	td code { color: var(--accent); font-family: ui-monospace, SF Mono, monospace; }
	td.field { white-space: nowrap; }
	a { color: var(--accent); }
	footer { color: var(--muted); margin-top: 3rem; padding-top: 1.5rem; border-top: 1px solid var(--border); font-size: 0.9rem; }
	</style>
	</head>
	<body>
	<main>
	<h1>📸 Agent Trace Format</h1>
	<p class="lede">A normalized JSON shape for capturing one agent run: its input, tool calls, output, and an environment fingerprint. Used by <code>agentsnap</code> to diff runs and detect silent regressions.</p>

	<h2>Full example</h2>
	<pre>{
	  "version": 1,
	  "model": "claude-sonnet-4-6",
	  "input": "search for python tutorials",
	  "output": "Here are 3 results.",
	  "tools": [
	    { "name": "web_search", "args": { "q": "python tutorials" }, "result_hash": "abc123" },
	    { "name": "fetch_page", "args": { "url": "https://example.com" }, "result_hash": "def456" }
	  ],
	  "error": null,
	  "fingerprint": { "node": "20.0", "agentsnap": "0.1.0" }
	}</pre>

	<h2>Fields</h2>
	<table>
	<tr><th>Field</th><th>Type</th><th>Notes</th></tr>
	<tr><td class="field"><code>version</code></td><td>int</td><td>Schema version. Currently <code>1</code>.</td></tr>
	<tr><td class="field"><code>model</code></td><td>string</td><td>Model identifier. Used to skip diffs across model upgrades.</td></tr>
	<tr><td class="field"><code>input</code></td><td>string</td><td>The user prompt that started the run.</td></tr>
	<tr><td class="field"><code>output</code></td><td>string</td><td>Final agent response.</td></tr>
	<tr><td class="field"><code>tools</code></td><td>array</td><td>Ordered list of tool calls. Each entry is <code>{name, args, result_hash}</code>.</td></tr>
	<tr><td class="field"><code>tools[].name</code></td><td>string</td><td>Tool identifier: a bare name like <code>web_search</code> or a dotted path like <code>filesystem.read_file</code>.</td></tr>
	<tr><td class="field"><code>tools[].args</code></td><td>object</td><td>Args passed to the tool. Recorded literally.</td></tr>
	<tr><td class="field"><code>tools[].result_hash</code></td><td>string</td><td>Hash of the tool's return value, stored in place of the value itself so that PII and large payloads stay out of the trace.</td></tr>
	<tr><td class="field"><code>error</code></td><td>string | null</td><td>Run-level error message, if the run failed.</td></tr>
	<tr><td class="field"><code>fingerprint</code></td><td>object</td><td>Environment metadata. The <code>node</code> and <code>agentsnap</code> versions are recommended; you can add your own keys.</td></tr>
	</table>
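	<p>The table above can be turned into a quick structural check. The sketch below is plain Python and only verifies that each listed field is present with the listed type; <code>validate_trace</code> is a hypothetical helper written for illustration, not a function provided by <code>agentsnap</code>.</p>

```python
import json

# Top-level fields and the Python types they decode to (taken from the table above).
TOP_LEVEL = {
    "version": int,
    "model": str,
    "input": str,
    "output": str,
    "tools": list,
    "error": (str, type(None)),
    "fingerprint": dict,
}
TOOL_FIELDS = {"name": str, "args": dict, "result_hash": str}


def _is_type(value, expected):
    # bool is a subclass of int in Python, so reject it explicitly for int fields.
    if expected is int and isinstance(value, bool):
        return False
    return isinstance(value, expected)


def validate_trace(trace):
    """Return a list of problems; an empty list means the shape looks right."""
    problems = []
    for field, expected in TOP_LEVEL.items():
        if field not in trace:
            problems.append(f"missing field: {field}")
        elif not _is_type(trace[field], expected):
            problems.append(f"wrong type for {field}")

    tools = trace.get("tools")
    if isinstance(tools, list):
        for i, call in enumerate(tools):
            if not isinstance(call, dict):
                problems.append(f"tools[{i}] is not an object")
                continue
            for field, expected in TOOL_FIELDS.items():
                if not _is_type(call.get(field), expected):
                    problems.append(f"tools[{i}].{field} missing or wrong type")
    return problems


example = json.loads("""
{
  "version": 1,
  "model": "claude-sonnet-4-6",
  "input": "search for python tutorials",
  "output": "Here are 3 results.",
  "tools": [
    { "name": "web_search", "args": { "q": "python tutorials" }, "result_hash": "abc123" }
  ],
  "error": null,
  "fingerprint": { "node": "20.0", "agentsnap": "0.1.0" }
}
""")

print(validate_trace(example))  # []
```

	<p>Returning a list of problems rather than raising on the first one makes it easier to report every malformed field of a trace in a single test failure.</p>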

	<h2>Why hash the tool result?</h2>
	<p>Tool results are often large (files, API payloads, search results). Hashing keeps the trace small and avoids leaking PII into your snapshot store. The hash is enough to detect that a result changed; to see <em>how</em> it changed, re-run with full payloads enabled.</p>
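	<p>This page does not say which hash function or serialization <code>agentsnap</code> uses. As an illustration only, the sketch below assumes SHA-256 over canonical JSON (sorted keys, no extra whitespace); <code>hash_result</code> is a hypothetical helper, not part of the library.</p>

```python
import hashlib
import json


def hash_result(result):
    """Hash a JSON-serializable tool result (assumed scheme, not agentsnap's)."""
    # Sort keys and drop whitespace so equal values always serialize identically.
    canonical = json.dumps(
        result, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = hash_result({"title": "Python tutorial", "rank": 1})
b = hash_result({"rank": 1, "title": "Python tutorial"})  # same data, other key order
c = hash_result({"title": "Python tutorial", "rank": 2})  # one value changed

print(a == b)  # True: key order does not affect the hash
print(a == c)  # False: a changed value changes the hash
```

	<p>Canonicalizing before hashing matters because two results with the same content but a different key order would otherwise produce different hashes and show up as spurious changes.</p>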

	<h2>Diffing two traces</h2>
	<pre>from agentsnap import diff

	# baseline_trace and current_trace are two traces in the format described above
	result = diff(baseline_trace, current_trace)
	print(result.status)  # "match" | "drift" | "regression"
	for change in result.changes:
	    print(change.path, change.from_, "→", change.to)</pre>
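	<p>To make the idea of a path-level change concrete, here is a hand-rolled comparison over two trace dictionaries. It is a sketch under stated assumptions, not <code>agentsnap</code>'s implementation: <code>compare_traces</code> and its tuple output are hypothetical, and it does not attempt to classify the result as a match, drift, or regression.</p>

```python
from itertools import zip_longest


def compare_traces(baseline, current):
    """Return (path, old, new) tuples for fields that differ between two traces."""
    changes = []
    for field in ("model", "output", "error"):
        if baseline.get(field) != current.get(field):
            changes.append((field, baseline.get(field), current.get(field)))

    # Tool calls are ordered, so compare them position by position.
    pairs = zip_longest(baseline.get("tools", []), current.get("tools", []))
    for i, (old, new) in enumerate(pairs):
        if old is None or new is None:
            # One run made more tool calls than the other.
            changes.append((f"tools[{i}]", old, new))
            continue
        for key in ("name", "args", "result_hash"):
            if old.get(key) != new.get(key):
                changes.append((f"tools[{i}].{key}", old.get(key), new.get(key)))
    return changes


baseline = {
    "model": "claude-sonnet-4-6",
    "output": "Here are 3 results.",
    "error": None,
    "tools": [
        {"name": "web_search", "args": {"q": "python tutorials"}, "result_hash": "abc123"}
    ],
}
# Same run, except the search tool returned something different.
current = dict(
    baseline,
    tools=[
        {"name": "web_search", "args": {"q": "python tutorials"}, "result_hash": "zzz999"}
    ],
)

for path, old, new in compare_traces(baseline, current):
    print(path, old, "->", new)  # tools[0].result_hash abc123 -> zzz999
```

	<p>Because only the hash is stored, the comparison can report that the first tool's result changed without ever holding the payload itself.</p>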

	<h2>Sample traces</h2>
	<p>The <a href="https://huggingface.co/datasets/mukunda1729/agent-trace-samples">agent-trace-samples</a> dataset has 10 example traces (good + regressed pairs) you can drop into your tests.</p>

	<footer>
	Part of <a href="https://mukundakatta.github.io/agent-stack/">The Agent Reliability Stack</a> · MIT licensed
	</footer>
	</main>
	</body>
	</html>