trace-field-notes / docs /article.md
JacobLinCool's picture
docs: keep field notes badge quote intact
9936b74 verified
|
Raw
History Blame Contribute Delete
6.71 kB

A newer version of the Gradio SDK is available: 6.19.0

Upgrade

Trace Field Notes: what we built and what we learned

Demo video: https://youtu.be/1QNZlqkl8zo Demo Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes GitHub: https://github.com/JacobLinCool/trace-field-notes

Build Small describes the Field Notes badge as having "wrote a blog post or report about what you built and what you learned." This report records the project in that spirit: the problem we studied, the application we built, the design choices that mattered, and the lessons that should carry into future agent-facing systems.

What we built

Trace Field Notes is a small application for reading coding-agent sessions after the work is done. A serious Codex, Claude Code, or Pi Agent run can include planning notes, shell commands, failed tests, patches, retries, progress summaries, caveats, and a final claim of success. The code diff shows the final state of the repository. The trace shows the route by which the agent reached that state.

The application turns that route into a qualitative field report. It parses the visible narrative messages in an agent session, identifies moments of difficulty, groups recurring terrain, and compares the final closeout claim with the evidence the agent reported during the run. The output is meant for builders who want to understand how an agent worked through a task, where it changed course, and what a later run can learn from the session.

The central abstraction is the difficulty episode. A single event is too small to explain a session, while a full trace is too broad to review in one pass. An episode gives the analysis a practical middle scale: an intention, a difficulty, an appraisal, a reroute, an attempted resolution, and an outcome claim. That structure makes long traces easier to compare while preserving more context than a flat command log.

How the application works

The user uploads a Codex, Claude Code, Pi Agent, JSONL, JSON, log, or text file. The app then normalizes the file into visible narrative messages, applies deterministic redaction for likely secrets and private data, and builds an analysis view from a compact codebook of agent-session behavior.

Three analysis paths are available. The deterministic path uses the codebook alone. The quick model path uses openbmb/MiniCPM5-1B. The deeper model path uses nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16. A privacy filter can add a second pass with openai/privacy-filter before analysis. GPU mode uses Hugging Face ZeroGPU, and CPU mode keeps the project usable when quota is limited.

The interface is a custom React field-notebook UI served by gradio.Server. It opens directly on the working surface: upload controls, redaction choices, analysis engine selection, and progress. The report view emphasizes the trail map, episode summaries, terrain groups, detour interpretation, closeout audit, and redacted narrative export. The design goal is a readable review surface for repeated use, with enough structure to scan a long run and enough prose to preserve what made the session distinctive.

What we learned

The first lesson is that coding-agent traces need an analytic scale between telemetry and summary. Raw events preserve evidence, yet they ask the reviewer to reconstruct the task story alone. A single summary is readable, yet it can hide the moment when an agent changed strategy. Difficulty episodes gave us a more stable unit: each episode ties the agent's intention to a visible obstacle, a response, and a claim about the outcome.

The second lesson is that privacy belongs inside the analysis design. Agent logs often contain local paths, prompts, snippets, file names, and operational detail. Trace Field Notes therefore treats redaction as part of the method. The app masks likely sensitive content before model use, lets the user decide whether to include user messages, and frames the final report around visible narrative evidence. This boundary keeps the report useful while respecting the fact that session traces can contain private work.

The third lesson is that small models benefit from a strong scaffold. The model contributes most after the parser, redactor, deterministic analyzer, and schema have already shaped the task. MiniCPM provides a fast reading pass; Nemotron provides a richer long-form interpretation. The surrounding workflow constrains the question so the model can focus on synthesis, comparison, and phrasing.

The fourth lesson is that interface design changes the quality of review. A chat-like transcript makes the user reread the session from the beginning. A field notebook lets the user move by episode, terrain, detour, and closeout claim. For this project, the custom UI was part of the research argument: the right representation helps a builder see agent behavior as a sequence of recoveries, commitments, and evidence claims.

Why it fits Build Small

Trace Field Notes is a Backyard AI project because it addresses a concrete problem for people who already work with coding agents: understanding the session after the agent has finished. The project stays small in scope, small in model size, and specific in audience. It studies one kind of artifact, the visible session trace, and turns that artifact into a form that supports practical review.

The project also fits several Build Small prize categories. The quick analysis path uses MiniCPM5 1B. The deeper path uses Nemotron 3 Nano 30B-A3B. Codex contributed to implementation, debugging, documentation, deployment preparation, and the narrated demo. The custom React interface served through gradio.Server supports the Off-Brand achievement, and this report serves the Field Notes achievement.

Codex's role

Codex acted as a development collaborator throughout the project. It inspected the repository, helped implement backend and frontend changes, debugged runtime behavior, wrote and ran tests, checked privacy handling, prepared hackathon documentation, generated the demo storyboard, recorded app footage, composed the demo video, and validated the final output.

That collaboration is part of the evidence base for the project. Trace Field Notes studies coding-agent sessions, and the project itself was built with a coding agent whose commits and documentation leave an audit trail. The result is both a submission artifact and a practical example of the behavior the app is designed to review.

Scope

Trace Field Notes reports on the visible narrative in a session trace. Hidden reasoning, code correctness, and human review remain outside its evidence base. Its contribution is narrower and useful: it makes the agent's public work story easier to inspect, compare, and learn from.