Spaces:
Running on Zero
Running on Zero
| # Trace Field Notes: what we built and what we learned | |
| Demo video: https://youtu.be/1QNZlqkl8zo | |
| Demo Space: https://huggingface.co/spaces/build-small-hackathon/trace-field-notes | |
| GitHub: https://github.com/JacobLinCool/trace-field-notes | |
| Build Small describes the Field Notes badge as having "wrote a blog post or report about what you built and what you learned." This report records the project in that spirit: the problem we studied, the application we built, the design choices that mattered, and the lessons that should carry into future agent-facing systems. | |
| ## What we built | |
| Trace Field Notes is a small application for reading coding-agent sessions after | |
| the work is done. A serious Codex, Claude Code, or Pi Agent run can include | |
| planning notes, shell commands, failed tests, patches, retries, progress | |
| summaries, caveats, and a final claim of success. The code diff shows the final | |
| state of the repository. The trace shows the route by which the agent reached | |
| that state. | |
| The application turns that route into a qualitative field report. It parses the | |
| visible narrative messages in an agent session, identifies moments of difficulty, | |
| groups recurring terrain, and compares the final closeout claim with the | |
| evidence the agent reported during the run. The output is meant for builders who | |
| want to understand how an agent worked through a task, where it changed course, | |
| and what a later run can learn from the session. | |
| The central abstraction is the difficulty episode. A single event is too small to | |
| explain a session, while a full trace is too broad to review in one pass. An | |
| episode gives the analysis a practical middle scale: an intention, a difficulty, | |
| an appraisal, a reroute, an attempted resolution, and an outcome claim. That | |
| structure makes long traces easier to compare while preserving more context than | |
| a flat command log. | |
| ## How the application works | |
| The user uploads a Codex, Claude Code, Pi Agent, JSONL, JSON, log, or text file. | |
| The app then normalizes the file into visible narrative messages, applies | |
| deterministic redaction for likely secrets and private data, and builds an | |
| analysis view from a compact codebook of agent-session behavior. | |
| Three analysis paths are available. The deterministic path uses the codebook | |
| alone. The quick model path uses `openbmb/MiniCPM5-1B`. The deeper model path | |
| uses `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16`. A privacy filter can add a | |
| second pass with `openai/privacy-filter` before analysis. GPU mode uses Hugging | |
| Face ZeroGPU, and CPU mode keeps the project usable when quota is limited. | |
| The interface is a custom React field-notebook UI served by `gradio.Server`. It | |
| opens directly on the working surface: upload controls, redaction choices, | |
| analysis engine selection, and progress. The report view emphasizes the trail | |
| map, episode summaries, terrain groups, detour interpretation, closeout audit, | |
| and redacted narrative export. The design goal is a readable review surface for | |
| repeated use, with enough structure to scan a long run and enough prose to | |
| preserve what made the session distinctive. | |
| ## What we learned | |
| The first lesson is that coding-agent traces need an analytic scale between | |
| telemetry and summary. Raw events preserve evidence, yet they ask the reviewer to | |
| reconstruct the task story alone. A single summary is readable, yet it can hide | |
| the moment when an agent changed strategy. Difficulty episodes gave us a more | |
| stable unit: each episode ties the agent's intention to a visible obstacle, a | |
| response, and a claim about the outcome. | |
| The second lesson is that privacy belongs inside the analysis design. Agent logs | |
| often contain local paths, prompts, snippets, file names, and operational detail. | |
| Trace Field Notes therefore treats redaction as part of the method. The app | |
| masks likely sensitive content before model use, lets the user decide whether to | |
| include user messages, and frames the final report around visible narrative | |
| evidence. This boundary keeps the report useful while respecting the fact that | |
| session traces can contain private work. | |
| The third lesson is that small models benefit from a strong scaffold. The model | |
| contributes most after the parser, redactor, deterministic analyzer, and schema | |
| have already shaped the task. MiniCPM provides a fast reading pass; Nemotron | |
| provides a richer long-form interpretation. The surrounding workflow constrains | |
| the question so the model can focus on synthesis, comparison, and phrasing. | |
| The fourth lesson is that interface design changes the quality of review. A | |
| chat-like transcript makes the user reread the session from the beginning. A | |
| field notebook lets the user move by episode, terrain, detour, and closeout | |
| claim. For this project, the custom UI was part of the research argument: the | |
| right representation helps a builder see agent behavior as a sequence of | |
| recoveries, commitments, and evidence claims. | |
| ## Why it fits Build Small | |
| Trace Field Notes is a Backyard AI project because it addresses a concrete | |
| problem for people who already work with coding agents: understanding the | |
| session after the agent has finished. The project stays small in scope, small in | |
| model size, and specific in audience. It studies one kind of artifact, the | |
| visible session trace, and turns that artifact into a form that supports | |
| practical review. | |
| The project also fits several Build Small prize categories. The quick analysis | |
| path uses MiniCPM5 1B. The deeper path uses Nemotron 3 Nano 30B-A3B. Codex | |
| contributed to implementation, debugging, documentation, deployment preparation, | |
| and the narrated demo. The custom React interface served through `gradio.Server` | |
| supports the Off-Brand achievement, and this report serves the Field Notes | |
| achievement. | |
| ## Codex's role | |
| Codex acted as a development collaborator throughout the project. It inspected | |
| the repository, helped implement backend and frontend changes, debugged runtime | |
| behavior, wrote and ran tests, checked privacy handling, prepared hackathon | |
| documentation, generated the demo storyboard, recorded app footage, composed the | |
| demo video, and validated the final output. | |
| That collaboration is part of the evidence base for the project. Trace Field | |
| Notes studies coding-agent sessions, and the project itself was built with a | |
| coding agent whose commits and documentation leave an audit trail. The result is | |
| both a submission artifact and a practical example of the behavior the app is | |
| designed to review. | |
| ## Scope | |
| Trace Field Notes reports on the visible narrative in a session trace. Hidden | |
| reasoning, code correctness, and human review remain outside its evidence base. | |
| Its contribution is narrower and useful: it makes the agent's public work story | |
| easier to inspect, compare, and learn from. | |