JacobLinCool

docs: keep field notes badge quote intact

9936b74 verified 12 days ago

	# Trace Field Notes: what we built and what we learned

	Demo video: https://youtu.be/1QNZlqkl8zo
	Demo Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes
	GitHub: https://github.com/JacobLinCool/trace-field-notes

	Build Small describes the Field Notes badge with the line "wrote a blog post or report about what you built and what you learned." This report records the project in that spirit: the problem we studied, the application we built, the design choices that mattered, and the lessons that should carry into future agent-facing systems.

	## What we built

	Trace Field Notes is a small application for reading coding-agent sessions after
	the work is done. A serious Codex, Claude Code, or Pi Agent run can include
	planning notes, shell commands, failed tests, patches, retries, progress
	summaries, caveats, and a final claim of success. The code diff shows the final
	state of the repository. The trace shows the route by which the agent reached
	that state.

	The application turns that route into a qualitative field report. It parses the
	visible narrative messages in an agent session, identifies moments of difficulty,
	groups recurring terrain, and compares the final closeout claim with the
	evidence the agent reported during the run. The output is meant for builders who
	want to understand how an agent worked through a task, where it changed course,
	and what a later run can learn from the session.
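The closeout comparison can be pictured with a small check like the one below. It is a minimal sketch, not the app's actual logic: the `Message` type, the cue phrases, and the verdict labels are all hypothetical, and a real codebook would be much richer than substring matching.

```python
from dataclasses import dataclass


@dataclass
class Message:
    index: int
    text: str


# Hypothetical cue phrases, chosen only for illustration.
SUCCESS_CUES = ("all tests pass", "done", "complete")
FAILURE_CUES = ("failed", "error", "traceback")


def audit_closeout(messages: list[Message]) -> str:
    """Compare the final message's claim with the latest earlier evidence."""
    if not messages:
        return "no-trace"
    *earlier, final = messages
    if not any(cue in final.text.lower() for cue in SUCCESS_CUES):
        return "no-success-claim"
    # Walk backwards to the most recent message that reports a result.
    for msg in reversed(earlier):
        lowered = msg.text.lower()
        if any(cue in lowered for cue in FAILURE_CUES):
            return "claim-contradicts-latest-evidence"
        if "pass" in lowered:
            return "claim-supported"
    return "claim-unsupported"
```

The point of the sketch is the direction of the comparison: the final claim is read last and judged against what the run itself reported, never against the diff.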

	The central abstraction is the difficulty episode. A single event is too small to
	explain a session, while a full trace is too broad to review in one pass. An
	episode gives the analysis a practical middle scale: an intention, a difficulty,
	an appraisal, a reroute, an attempted resolution, and an outcome claim. That
	structure makes long traces easier to compare while preserving more context than
	a flat command log.
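As an illustration, the six parts of an episode map naturally onto a small record type. The class below is a sketch under assumptions: the field names follow the prose above, while `message_indices` and `missing_parts` are hypothetical additions for linking back to the trace and spotting incomplete episodes.

```python
from dataclasses import dataclass, field

# The six narrative parts named in the prose above.
PARTS = (
    "intention",
    "difficulty",
    "appraisal",
    "reroute",
    "attempted_resolution",
    "outcome_claim",
)


@dataclass
class DifficultyEpisode:
    intention: str
    difficulty: str
    appraisal: str
    reroute: str
    attempted_resolution: str
    outcome_claim: str
    # Hypothetical: positions of the supporting messages in the trace.
    message_indices: list[int] = field(default_factory=list)

    def missing_parts(self) -> list[str]:
        """Return the names of narrative parts that are empty."""
        return [name for name in PARTS if not getattr(self, name).strip()]
```

An episode with an empty `reroute`, for example, would describe a difficulty the agent pushed through without changing course, which is itself worth noting in a review.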

	## How the application works

	The user uploads a Codex, Claude Code, Pi Agent, JSONL, JSON, log, or text file.
	The app then normalizes the file into visible narrative messages, applies
	deterministic redaction for likely secrets and private data, and builds an
	analysis view from a compact codebook of agent-session behavior.
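The normalization step might look roughly like the following for a JSONL input. This is a sketch, not the app's parser: it assumes a hypothetical record shape with `role` and `text` keys, whereas real Codex, Claude Code, and Pi Agent files each have their own schemas and would need dedicated adapters.

```python
import json


def normalize_jsonl(raw: str, include_user: bool = False) -> list[dict]:
    """Reduce a JSONL trace to visible narrative messages.

    Assumes records shaped like {"role": ..., "text": ...} (hypothetical).
    """
    messages = []
    for line_no, line in enumerate(raw.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Lines that are not JSON are kept as plain log text.
            record = {"role": "log", "text": line}
        if not isinstance(record, dict):
            continue
        text = record.get("text")
        if not isinstance(text, str) or not text.strip():
            continue  # no narrative text, e.g. a tool payload
        role = record.get("role", "log")
        if role == "user" and not include_user:
            continue
        messages.append({"line": line_no, "role": role, "text": text.strip()})
    return messages
```

Keeping the original line number on each message lets later stages point back to the source trace without carrying the raw record along.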

	Three analysis paths are available. The deterministic path uses the codebook
	alone. The quick model path uses `openbmb/MiniCPM5-1B`. The deeper model path
	uses `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`. A privacy filter can add a
	second pass with `openai/privacy-filter` before analysis. GPU mode uses Hugging
	Face ZeroGPU, and CPU mode keeps the project usable when quota is limited.
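The choice between paths can be expressed as a small plan object. In this sketch the model identifiers come from the paragraph above, but the engine labels (`deterministic`, `quick`, `deep`), the `AnalysisPlan` type, and the `plan_analysis` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Engine labels are hypothetical; model ids are the ones named above.
ENGINES = {
    "deterministic": None,
    "quick": "openbmb/MiniCPM5-1B",
    "deep": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
}
PRIVACY_FILTER_MODEL = "openai/privacy-filter"


@dataclass(frozen=True)
class AnalysisPlan:
    engine: str
    model_id: Optional[str]
    device: str
    privacy_model_id: Optional[str]


def plan_analysis(
    engine: str, use_privacy_filter: bool = False, gpu_available: bool = True
) -> AnalysisPlan:
    """Resolve an engine label into the models and device a run would use."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    model_id = ENGINES[engine]
    # The codebook-only path needs no accelerator; model paths fall back
    # to CPU when GPU quota is unavailable.
    device = "gpu" if (model_id is not None and gpu_available) else "cpu"
    privacy = PRIVACY_FILTER_MODEL if use_privacy_filter else None
    return AnalysisPlan(engine, model_id, device, privacy)
```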

	The interface is a custom React field-notebook UI served by `gradio.Server`. It
	opens directly on the working surface: upload controls, redaction choices,
	analysis engine selection, and progress. The report view emphasizes the trail
	map, episode summaries, terrain groups, detour interpretation, closeout audit,
	and redacted narrative export. The design goal is a readable review surface for
	repeated use, with enough structure to scan a long run and enough prose to
	preserve what made the session distinctive.

	## What we learned

	The first lesson is that coding-agent traces need an analytic scale between
	telemetry and summary. Raw events preserve evidence, yet they ask the reviewer to
	reconstruct the task story alone. A single summary is readable, yet it can hide
	the moment when an agent changed strategy. Difficulty episodes gave us a more
	stable unit: each episode ties the agent's intention to a visible obstacle, a
	response, and a claim about the outcome.
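To make the episode scale concrete, here is a deliberately simple segmentation sketch. It opens an episode at the first message that signals difficulty and closes it at the next message that claims an outcome. The cue lists and the function name are hypothetical, and substring matching stands in for whatever the real codebook does.

```python
# Hypothetical cue phrases, for illustration only.
DIFFICULTY_CUES = ("failed", "error", "cannot", "unexpected")
OUTCOME_CUES = ("fixed", "resolved", "now passes", "works now")


def segment_episodes(messages: list[str]) -> list[tuple[int, int]]:
    """Return (start, end) index spans, one per difficulty episode."""
    spans = []
    start = None
    for i, text in enumerate(messages):
        lowered = text.lower()
        if start is None:
            if any(cue in lowered for cue in DIFFICULTY_CUES):
                start = i  # a visible obstacle opens an episode
        elif any(cue in lowered for cue in OUTCOME_CUES):
            spans.append((start, i))  # an outcome claim closes it
            start = None
    if start is not None:
        # The trace ended before the agent claimed an outcome.
        spans.append((start, len(messages) - 1))
    return spans
```

Messages outside any span are ordinary progress; an episode still open at the end of the trace is a natural candidate for the closeout audit.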

	The second lesson is that privacy belongs inside the analysis design. Agent logs
	often contain local paths, prompts, snippets, file names, and operational detail.
	Trace Field Notes therefore treats redaction as part of the method. The app
	masks likely sensitive content before model use, lets the user decide whether to
	include user messages, and frames the final report around visible narrative
	evidence. This boundary keeps the report useful while respecting the fact that
	session traces can contain private work.
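A deterministic redaction pass of the kind described here can be sketched with a handful of regular expressions. The rules below are illustrative assumptions, not the app's actual rule set: they cover email addresses, a few common token prefixes, and home-directory user names, and a production list would need to be broader.

```python
import re

# Illustrative rules only; labels and patterns are assumptions.
REDACTION_RULES = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SECRET", re.compile(r"\b(?:sk-|ghp_|hf_)[A-Za-z0-9]{16,}\b")),
    ("HOME_PATH", re.compile(r"(?:/home/|/Users/)[^/\s]+")),
]


def redact(text: str) -> tuple[str, dict[str, int]]:
    """Mask likely sensitive spans and count the replacements per rule."""
    counts: dict[str, int] = {}
    for label, pattern in REDACTION_RULES:
        text, n = pattern.subn(f"[{label}]", text)
        if n:
            counts[label] = n
    return text, counts
```

Because the rules are plain patterns applied in a fixed order, the same input always yields the same masked output, which is what lets redaction run before any model sees the text.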

	The third lesson is that small models benefit from a strong scaffold. The model
	contributes most after the parser, redactor, deterministic analyzer, and schema
	have already shaped the task. MiniCPM provides a fast reading pass; Nemotron
	provides a richer long-form interpretation. The surrounding workflow constrains
	the question so the model can focus on synthesis, comparison, and phrasing.
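The scaffold idea can be sketched as two small functions around the model call: one builds a narrow prompt from already-segmented episodes, the other accepts only replies that match the expected shape. The key names, prompt wording, and function names are hypothetical; the model call itself is left out.

```python
import json

# Hypothetical reply schema: key name -> required Python type.
REQUIRED_KEYS = {"episode_id": int, "summary": str, "terrain": str}


def build_prompt(episodes: list[dict]) -> str:
    """Ask a narrow question about episodes the pipeline already found."""
    lines = [
        "Summarize each episode below. Reply with a JSON list of objects",
        "with keys episode_id (int), summary (str), terrain (str).",
        "",
    ]
    for ep in episodes:
        lines.append(f"Episode {ep['id']}: {ep['text']}")
    return "\n".join(lines)


def parse_model_reply(reply: str, known_ids: set[int]) -> list[dict]:
    """Keep only well-formed items that refer to known episodes."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []  # caller can fall back to the deterministic report
    if not isinstance(data, list):
        return []
    valid = []
    for item in data:
        if not isinstance(item, dict):
            continue
        if all(isinstance(item.get(k), t) for k, t in REQUIRED_KEYS.items()):
            if item["episode_id"] in known_ids:
                valid.append(item)
    return valid
```

An empty result is a usable signal: the report can still be produced from the deterministic analysis when the model reply fails validation.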

	The fourth lesson is that interface design changes the quality of review. A
	chat-like transcript makes the user reread the session from the beginning. A
	field notebook lets the user move by episode, terrain, detour, and closeout
	claim. For this project, the custom UI was part of the research argument: the
	right representation helps a builder see agent behavior as a sequence of
	recoveries, commitments, and evidence claims.
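Moving by terrain instead of by timestamp implies some index over the episodes. A minimal sketch, assuming each episode is a dict with hypothetical `id` and `terrain` keys:

```python
from collections import defaultdict


def build_terrain_index(episodes: list[dict]) -> dict[str, list[int]]:
    """Map each terrain label to the ids of the episodes that share it."""
    index: dict[str, list[int]] = defaultdict(list)
    for ep in episodes:
        index[ep["terrain"]].append(ep["id"])
    # Recurring terrain first, then alphabetical for a stable order.
    ordered = sorted(index.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return dict(ordered)
```

Sorting by recurrence puts the terrain the agent kept returning to at the top of the notebook, which is usually where a reviewer wants to start.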

	## Why it fits Build Small

	Trace Field Notes is a Backyard AI project because it addresses a concrete
	problem for people who already work with coding agents: understanding the
	session after the agent has finished. The project stays small in scope, small in
	model size, and specific in audience. It studies one kind of artifact, the
	visible session trace, and turns that artifact into a form that supports
	practical review.

	The project also fits several Build Small prize categories. The quick analysis
	path uses MiniCPM5 1B. The deeper path uses Nemotron 3 Nano 30B-A3B. Codex
	contributed to implementation, debugging, documentation, deployment preparation,
	and the narrated demo. The custom React interface served through `gradio.Server`
	supports the Off-Brand achievement, and this report serves the Field Notes
	achievement.

	## Codex's role

	Codex acted as a development collaborator throughout the project. It inspected
	the repository, helped implement backend and frontend changes, debugged runtime
	behavior, wrote and ran tests, checked privacy handling, prepared hackathon
	documentation, generated the demo storyboard, recorded app footage, composed the
	demo video, and validated the final output.

	That collaboration is part of the evidence base for the project. Trace Field
	Notes studies coding-agent sessions, and the project itself was built with a
	coding agent whose commits and documentation leave an audit trail. The result is
	both a submission artifact and a practical example of the behavior the app is
	designed to review.

	## Scope

	Trace Field Notes reports on the visible narrative in a session trace. Hidden
	reasoning, code correctness, and human review remain outside its evidence base.
	Its contribution is narrower but useful: it makes the agent's public work story
	easier to inspect, compare, and learn from.