# Trace Field Notes: what we built and what we learned
Demo video: https://youtu.be/1QNZlqkl8zo
Demo Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes
GitHub: https://github.com/JacobLinCool/trace-field-notes
Build Small's criterion for the Field Notes badge reads: "wrote a blog post or report about what you built and what you learned." This report records the project in that spirit: the problem we studied, the application we built, the design choices that mattered, and the lessons that should carry into future agent-facing systems.
## What we built
Trace Field Notes is a small application for reading coding-agent sessions after
the work is done. A substantial Codex, Claude Code, or Pi Agent run can include
planning notes, shell commands, failed tests, patches, retries, progress
summaries, caveats, and a final claim of success. The code diff shows the final
state of the repository. The trace shows the route by which the agent reached
that state.
The application turns that route into a qualitative field report. It parses the
visible narrative messages in an agent session, identifies moments of difficulty,
groups recurring terrain, and compares the final closeout claim with the
evidence the agent reported during the run. The output is meant for builders who
want to understand how an agent worked through a task, where it changed course,
and what a later run can learn from the session.
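The closeout comparison can be illustrated with a deliberately narrow check. The sketch below is not the app's implementation. It assumes pytest-style summaries such as `12 passed, 1 failed` appear in progress messages, and the names `audit_closeout`, `AuditFinding`, `latest_test_run_is_clean`, and both regular expressions are hypothetical. It flags a closeout that says the tests pass when the most recent reported run still shows failures, or when no run was reported at all.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# Hypothetical pattern for a closeout sentence asserting that the test suite passes.
TESTS_PASS_CLAIM = re.compile(r"\b(all )?tests (now )?pass", re.IGNORECASE)
# Hypothetical pattern for a pytest-style summary such as "12 passed, 1 failed".
TEST_RESULT = re.compile(r"\b(\d+) (passed|failed)\b", re.IGNORECASE)


@dataclass
class AuditFinding:
    claim: str
    supported: bool
    note: str


def latest_test_run_is_clean(progress: list[str]) -> bool | None:
    """True/False for the most recent reported test run, or None if none was reported."""
    for message in reversed(progress):
        results = TEST_RESULT.findall(message)
        if results:
            # Clean means every "failed" count in that summary is zero.
            return all(kind.lower() == "passed" or count == "0" for count, kind in results)
    return None


def audit_closeout(progress: list[str], closeout: str) -> list[AuditFinding]:
    """Compare a closeout success claim against the test evidence reported during the run."""
    findings: list[AuditFinding] = []
    if TESTS_PASS_CLAIM.search(closeout):
        clean = latest_test_run_is_clean(progress)
        if clean is None:
            findings.append(AuditFinding("tests pass", False, "no test run reported during the session"))
        elif clean:
            findings.append(AuditFinding("tests pass", True, "latest reported run has no failures"))
        else:
            findings.append(AuditFinding("tests pass", False, "latest reported run still shows failures"))
    return findings
```

Looking only at the latest reported run is a design choice in this sketch: an early green run followed by a later regression should not back a final success claim.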
The central abstraction is the difficulty episode. A single event is too small to
explain a session, while a full trace is too broad to review in one pass. An
episode gives the analysis a practical middle scale: an intention, a difficulty,
an appraisal, a reroute, an attempted resolution, and an outcome claim. That
structure makes long traces easier to compare while preserving more context than
a flat command log.
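One way to hold an episode in code is a plain record with one field per part of that structure. This is a minimal sketch using Python dataclasses; the field names, the `terrain` label, the message-index span, and `group_by_terrain` are illustrative assumptions rather than the app's actual schema. Grouping episodes by terrain label shows how recurring difficulties could become the terrain groups in the report.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class DifficultyEpisode:
    # One field per part of the episode structure; names are illustrative.
    intention: str
    difficulty: str
    appraisal: str
    reroute: str
    attempted_resolution: str
    outcome_claim: str
    terrain: str      # hypothetical codebook label, e.g. "dependency" or "flaky test"
    start_index: int  # index of the first narrative message in the episode
    end_index: int    # index just past the last narrative message


def group_by_terrain(episodes: list[DifficultyEpisode]) -> dict[str, list[DifficultyEpisode]]:
    """Collect episodes under their terrain label, largest group first."""
    groups: dict[str, list[DifficultyEpisode]] = defaultdict(list)
    for episode in episodes:
        groups[episode.terrain].append(episode)
    # sorted() is stable, so ties keep the order in which terrains first appeared.
    return dict(sorted(groups.items(), key=lambda item: len(item[1]), reverse=True))
```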
## How the application works
The user uploads a session file from Codex, Claude Code, or Pi Agent, or a generic JSONL, JSON, log, or text file.
The app then normalizes the file into visible narrative messages, applies
deterministic redaction for likely secrets and private data, and builds an
analysis view from a compact codebook of agent-session behavior.
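Normalization can be sketched for a generic JSONL log. The record shape here is an assumption: one JSON object per line with a `role` field and a `content` field that is either a string or a list of typed parts. Real Codex, Claude Code, and Pi Agent exports use their own layouts, and `normalize_jsonl` and `NarrativeMessage` are hypothetical names. The sketch keeps only visible text from user and assistant turns, so tool records and reasoning parts never enter the narrative.

```python
import json
from dataclasses import dataclass


@dataclass
class NarrativeMessage:
    role: str  # "user" or "assistant"
    text: str


# Assumption: these roles stand in for the per-agent formats the app actually handles.
NARRATIVE_ROLES = {"user", "assistant"}


def _text_of(content) -> str:
    """Flatten a string or a list of typed parts, keeping only visible text parts."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            part.get("text", "")
            for part in content
            if isinstance(part, dict) and part.get("type") == "text"
        )
    return ""


def normalize_jsonl(raw: str) -> list[NarrativeMessage]:
    messages = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate partial or corrupted lines in long logs
        if not isinstance(record, dict) or record.get("role") not in NARRATIVE_ROLES:
            continue  # tool calls, tool results, and system records are dropped
        text = _text_of(record.get("content")).strip()
        if text:
            messages.append(NarrativeMessage(record["role"], text))
    return messages
```

Skipping malformed lines rather than failing keeps a long, partially corrupted log readable.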
Three analysis paths are available. The deterministic path uses the codebook
alone. The quick model path uses `openbmb/MiniCPM5-1B`. The deeper model path
uses `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`. A privacy filter can add a
second pass with `openai/privacy-filter` before analysis. GPU mode uses Hugging
Face ZeroGPU, and CPU mode keeps the project usable when quota is limited.
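The way these paths compose can be summarized as a stage plan. In this sketch the engine keys (`deterministic`, `quick`, `deep`), the stage names, and `plan_analysis` are assumptions; only the model IDs come from the project. The order follows this report: normalization and redaction first, the optional privacy-filter pass before analysis, the codebook analysis always, and a model pass only when a model engine is chosen. Actual model loading and ZeroGPU allocation are left out.

```python
from dataclasses import dataclass, field

# Model IDs as listed in this report; the engine keys are illustrative.
ENGINES = {
    "deterministic": None,
    "quick": "openbmb/MiniCPM5-1B",
    "deep": "nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16",
}
PRIVACY_FILTER_MODEL = "openai/privacy-filter"


@dataclass
class AnalysisPlan:
    engine: str
    device: str  # "gpu" when ZeroGPU is requested, "cpu" otherwise
    stages: list[str] = field(default_factory=list)


def plan_analysis(engine: str, use_gpu: bool, privacy_pass: bool) -> AnalysisPlan:
    """Return the ordered stages for one run (hypothetical stage names)."""
    if engine not in ENGINES:
        raise ValueError(f"unknown engine: {engine!r}")
    plan = AnalysisPlan(engine=engine, device="gpu" if use_gpu else "cpu")
    plan.stages += ["normalize", "regex_redaction"]
    if privacy_pass:
        plan.stages.append(f"privacy_filter:{PRIVACY_FILTER_MODEL}")
    plan.stages.append("codebook_analysis")  # the deterministic pass always runs
    model_id = ENGINES[engine]
    if model_id is not None:
        plan.stages.append(f"model:{model_id}")
    return plan
```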
The interface is a custom React field-notebook UI served by `gradio.Server`. It
opens directly on the working surface: upload controls, redaction choices,
analysis engine selection, and progress. The report view emphasizes the trail
map, episode summaries, terrain groups, detour interpretation, closeout audit,
and redacted narrative export. The design goal is a readable review surface for
repeated use, with enough structure to scan a long run and enough prose to
preserve what made the session distinctive.
## What we learned
The first lesson is that coding-agent traces need an analytic scale between
telemetry and summary. Raw events preserve evidence, yet they ask the reviewer to
reconstruct the task story alone. A single summary is readable, yet it can hide
the moment when an agent changed strategy. Difficulty episodes gave us a more
stable unit: each episode ties the agent's intention to a visible obstacle, a
response, and a claim about the outcome.
The second lesson is that privacy belongs inside the analysis design. Agent logs
often contain local paths, prompts, snippets, file names, and operational detail.
Trace Field Notes therefore treats redaction as part of the method. The app
masks likely sensitive content before model use, lets the user decide whether to
include user messages, and frames the final report around visible narrative
evidence. This boundary keeps the report useful while respecting the fact that
session traces can contain private work.
The third lesson is that small models benefit from a strong scaffold. The model
contributes most after the parser, redactor, deterministic analyzer, and schema
have already shaped the task. MiniCPM provides a fast reading pass; Nemotron
provides a richer long-form interpretation. The surrounding workflow constrains
the question so the model can focus on synthesis, comparison, and phrasing.
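One way to make that scaffold concrete is to treat the model's answer as a candidate that must pass a schema check, with the deterministic reading as the fallback. In this sketch the JSON keys, the terrain labels, `validated_model_reading`, `read_episode`, and the `generate` callable (a stand-in for whichever model backend is in use) are all assumptions; no real inference API is called.

```python
from __future__ import annotations

import json
from typing import Callable

# Hypothetical schema: keys the model must fill and the codebook labels it may use.
REQUIRED_KEYS = {"episode_id", "terrain", "summary"}
CODEBOOK_TERRAINS = {"dependency", "test failure", "environment", "spec ambiguity"}


def validated_model_reading(raw_output: str, known_episode_ids: set[int]) -> dict | None:
    """Accept the model's JSON only if it fits the scaffold; otherwise return None."""
    try:
        reading = json.loads(raw_output)
    except json.JSONDecodeError:
        return None
    if not isinstance(reading, dict) or not REQUIRED_KEYS.issubset(reading):
        return None
    episode_id = reading["episode_id"]
    if not isinstance(episode_id, int) or episode_id not in known_episode_ids:
        return None  # the model may only comment on episodes the parser found
    terrain = reading["terrain"]
    if not isinstance(terrain, str) or terrain not in CODEBOOK_TERRAINS:
        return None  # labels must come from the codebook, not be invented
    if not isinstance(reading["summary"], str):
        return None
    return reading


def read_episode(
    deterministic_reading: dict,
    generate: Callable[[str], str],
    known_episode_ids: set[int],
) -> dict:
    """Ask the model to rephrase one episode; keep the deterministic reading on failure."""
    prompt = (
        f"Rewrite this difficulty episode as JSON with keys {sorted(REQUIRED_KEYS)}. "
        f"Use one terrain from {sorted(CODEBOOK_TERRAINS)}.\n"
        + json.dumps(deterministic_reading)
    )
    reading = validated_model_reading(generate(prompt), known_episode_ids)
    return reading if reading is not None else deterministic_reading
```

The check rejects invented terrain labels and episodes the parser never found, so a weak or malformed answer degrades to the deterministic reading instead of reaching the report.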
The fourth lesson is that interface design changes the quality of review. A
chat-like transcript makes the user reread the session from the beginning. A
field notebook lets the user move by episode, terrain, detour, and closeout
claim. For this project, the custom UI was part of the research argument: the
right representation helps a builder see agent behavior as a sequence of
recoveries, commitments, and evidence claims.
## Why it fits Build Small
Trace Field Notes is a Backyard AI project because it addresses a concrete
problem for people who already work with coding agents: understanding the
session after the agent has finished. The project stays small in scope, small in
model size, and specific in audience. It studies one kind of artifact, the
visible session trace, and turns that artifact into a form that supports
practical review.
The project also fits several Build Small prize categories. The quick analysis
path uses MiniCPM5 1B. The deeper path uses Nemotron 3 Nano 30B-A3B. Codex
contributed to implementation, debugging, documentation, deployment preparation,
and the narrated demo. The custom React interface served through `gradio.Server`
supports the Off-Brand achievement, and this report serves the Field Notes
achievement.
## Codex's role
Codex acted as a development collaborator throughout the project. It inspected
the repository, helped implement backend and frontend changes, debugged runtime
behavior, wrote and ran tests, checked privacy handling, prepared hackathon
documentation, generated the demo storyboard, recorded app footage, composed the
demo video, and validated the final output.
That collaboration is part of the evidence base for the project. Trace Field
Notes studies coding-agent sessions, and the project itself was built with a
coding agent whose commits and documentation leave an audit trail. The result is
both a submission artifact and a practical example of the behavior the app is
designed to review.
## Scope
Trace Field Notes reports on the visible narrative in a session trace. Hidden
reasoning, code correctness, and human review remain outside its evidence base.
Its contribution is narrower but useful: it makes the agent's public work story
easier to inspect, compare, and learn from.