# The Agentic Loop Deep Dive

The core of nano_harness is a loop that runs at most fifty times: call the LLM, parse its code, execute it, observe results, repeat.

## The Loop Visually
```
┌─────────────────────────────────────────────┐
│ 1. Initialize: task + system prompt         │
│    message_history = [system_msg]           │
│    step = 0                                 │
└──────────────┬──────────────────────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 2. Call LLM          │
        │    with message_hist │
        │    (max_tokens: 4096)│
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 3. Parse output      │
        │    Extract Python    │
        │    code blocks       │
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 4. Execute Python    │
        │    Capture stdout/   │
        │    stderr            │
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 5. Check:            │
        │    final_answer()    │
        │    called?           │
        └──────┬───────────────┘
               │         │
              No        Yes → Done!
               │
               ▼
        ┌──────────────────────┐
        │ 6. Append result     │
        │    to message_hist   │
        │    step += 1         │
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 7. step < MAX_STEPS? │
        │    Yes → back to 2   │
        │    No  → stop        │
        └──────────────────────┘
```

> [!NOTE]
> **This is a simplification.** The nano harness treats memory as a flat conversation history — every prior turn stays in context until the window fills up. Production agent systems use much richer memory architectures: short-term scratchpads for working state, episodic memory for recalling past sessions, semantic memory for persistent knowledge, retrieval-augmented approaches that fetch relevant memories on demand, and compaction strategies that summarize older context to free up space. If you want to go deeper, look into the research on agent memory systems (e.g., MemoryAgentBench, A-MEM) and context compression (e.g., ACON). The nano harness is a teaching tool — it shows the minimal viable loop, not the full picture.
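
In code, the whole diagram collapses to a handful of lines. Below is a minimal sketch of the loop, not the harness's actual source: `call_llm` and `execute` are hypothetical helpers (with `execute` returning the captured output plus any value passed to `final_answer()`), and `SYSTEM_PROMPT` stands in for the real system prompt.

```python
import re

MAX_STEPS = 50       # the hard step limit
FENCE = chr(96) * 3  # three backticks, built indirectly so this
                     # snippet stays renderable inside a fenced block
CODE_RE = re.compile(FENCE + r"python\n(.*?)" + FENCE, re.DOTALL)

def extract_python_blocks(text: str) -> str:
    """Pull every fenced Python code block out of the model's reply."""
    return "\n".join(CODE_RE.findall(text))

def run_agent(task: str) -> str | None:
    """Minimal agentic loop: call the LLM, run its code, feed results back."""
    # 1. Initialize: system prompt + task.
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},  # hypothetical prompt
        {"role": "user", "content": task},
    ]
    for step in range(MAX_STEPS):                    # 7. bounded iteration
        reply = call_llm(messages, max_tokens=4096)  # 2. call the LLM
        messages.append({"role": "assistant", "content": reply})
        code = extract_python_blocks(reply)          # 3. parse its output
        output, answer = execute(code)               # 4. run, capture stdout/stderr
        if answer is not None:                       # 5. final_answer() called?
            return answer
        # 6. The observation goes back into the history; loop to step 2.
        messages.append({"role": "user", "content": output})
    return None  # step limit reached without an answer
```

Everything else in the harness (tools, sandboxing, prompts) hangs off this skeleton.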
## Context Management

With `MAX_TOKENS=4096` and `MAX_CHARS=8000`:
```python
# Good: agent reads one file at a time
read_file("test.py", max_chars=2000)             # 2000 chars ✓

# Bad: agent tries to read entire codebase at once
read_file("large_codebase.py", max_chars=50000)  # gets clipped to 8000
```
The agent learns to read strategically to stay within limits.
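
That clipping can live in a one-line guard at the tool boundary. Here is a minimal sketch, where `clip` is an illustrative name and the harness's actual truncation logic may differ:

```python
MAX_CHARS = 8000  # per-observation cap, matching the constant above

def clip(text: str, limit: int = MAX_CHARS) -> str:
    """Truncate tool output so one observation can't flood the window."""
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n... [clipped {len(text) - limit} chars]"
```

Reporting how many characters were dropped tells the agent that more data exists, so it can come back for a narrower slice.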
> [!NOTE]
> **Context management in production is a major topic.** Real code agents implement compaction (summarizing earlier context), structured note-taking (maintaining scratchpads of key findings), file-system-mediated context (writing intermediate results to files instead of keeping them in the window), and intelligent tool selection to minimize context consumption. Anthropic's context engineering guide describes these as core concerns for any serious agent deployment. The nano harness only demonstrates the simplest approach: hard character limits and hoping the agent reads strategically.
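
To make one of those ideas concrete, here is a rough sketch of compaction. The `summarize()` helper is hypothetical, standing in for another LLM call that condenses text; production systems are far more careful about what they keep:

```python
def compact(messages: list[dict], keep_recent: int = 6) -> list[dict]:
    """Collapse older turns into a single summary message to free context."""
    if len(messages) <= keep_recent + 1:
        return messages  # nothing old enough to fold away
    system = messages[0]
    old, recent = messages[1:-keep_recent], messages[-keep_recent:]
    digest = summarize("\n".join(m["content"] for m in old))  # hypothetical LLM call
    summary = {"role": "user",
               "content": "Summary of earlier steps:\n" + digest}
    return [system, summary, *recent]
```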
## Design Decisions

Python is precise where JSON or free-form text is ambiguous, so the agent outputs code. `safe_path()` resolves and validates every path against the workspace root to prevent directory traversal. Only an explicit allowlist of shell commands can run. A hard step limit and per-call output limit bound both runtime and context growth. And because exceptions are turned back into observations, the agent adapts instead of crashing.
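
As a sketch of the first two guards, with an illustrative workspace root and allowlist rather than the harness's actual values:

```python
from pathlib import Path
import shlex

WORKSPACE = Path("workspace").resolve()             # assumed workspace root
ALLOWED_COMMANDS = {"ls", "cat", "grep", "python"}  # illustrative allowlist

def safe_path(user_path: str) -> Path:
    """Resolve a path and refuse anything outside the workspace root."""
    resolved = (WORKSPACE / user_path).resolve()
    if not resolved.is_relative_to(WORKSPACE):      # blocks ../ traversal
        raise ValueError(f"path escapes workspace: {user_path}")
    return resolved

def check_command(cmd: str) -> None:
    """Reject any shell command whose executable isn't allowlisted."""
    executable = shlex.split(cmd)[0]
    if executable not in ALLOWED_COMMANDS:
        raise ValueError(f"command not allowed: {executable}")
```

Resolving before checking matters: it normalizes symlinks and `..` segments, so the containment test runs against the path that would actually be opened.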
## Key Takeaways

Call the LLM, parse its code, execute it with tools, observe, repeat. The system prompt defines tools, constraints, and the termination signal. Sandboxing happens at the tool boundary through path confinement, command allowlists, and size limits. Errors become observations, and the full message history serves as memory.
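
The last point is the easiest to see in code. A minimal sketch of the execution boundary, simplified in that real sandboxing of the `exec` namespace is omitted:

```python
import io
import traceback
from contextlib import redirect_stdout

def run_code(code: str, env: dict) -> str:
    """Execute agent code, capturing stdout; errors come back as text."""
    buf = io.StringIO()
    try:
        with redirect_stdout(buf):
            exec(code, env)  # namespace sandboxing elided in this sketch
        return buf.getvalue()
    except Exception:
        # The traceback becomes the next observation, so the agent can
        # read the error and change course instead of the loop crashing.
        return buf.getvalue() + "\n" + traceback.format_exc()
```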
Next, tools and sandboxing in more detail.