# The Agentic Loop Deep Dive
The core of nano_harness is a loop that runs at most fifty times: call the LLM, parse its code, execute it, observe results, repeat.
## The Loop Visually
```
┌─────────────────────────────────────────────┐
│ 1. Initialize: task + system prompt         │
│    message_history = [system_msg]           │
│    step = 0                                 │
└──────────────┬──────────────────────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 2. Call LLM          │
        │    with message_hist │
        │    (max_tokens: 4096)│
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 3. Parse output      │
        │    Extract Python    │
        │    code blocks       │
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 4. Execute Python    │
        │    Capture stdout/   │
        │    stderr            │
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 5. Check:            │
        │    final_answer()    │
        │    called?           │
        └──────┬───────────────┘
               │
        No ────┼──── Yes → Done!
               │
               ▼
        ┌──────────────────────┐
        │ 6. Append result     │
        │    to message_hist   │
        │    step += 1         │
        └──────┬───────────────┘
               │
               ▼
        ┌──────────────────────┐
        │ 7. step < 50?        │
        │    Yes → back to 2   │
        │    No  → give up     │
        └──────────────────────┘
```

> [!NOTE]
> **This is a simplification.** The nano harness treats memory as a flat conversation history — every prior turn stays in context until the window fills up. Production agent systems use much richer memory architectures: short-term scratchpads for working state, episodic memory for recalling past sessions, semantic memory for persistent knowledge, retrieval-augmented approaches that fetch relevant memories on demand, and compaction strategies that summarize older context to free up space. If you want to go deeper, look into the research on agent memory systems (e.g., MemoryAgentBench, A-MEM) and context compression (e.g., ACON). The nano harness is a teaching tool — it shows the minimal viable loop, not the full picture.
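The seven steps above can be sketched as a compact, self-contained loop. This is an illustration, not the harness's actual code: the helper names (`extract_code`, `run_python`, `ExecResult`) and the stubbed executor are assumptions made for the sketch.

```python
import re
from dataclasses import dataclass
from typing import Optional

MAX_STEPS = 50  # hard cap on loop iterations (step 7 in the diagram)

@dataclass
class ExecResult:
    output: str                         # captured print/error text
    final_answer: Optional[str] = None  # set if the code called final_answer()

def extract_code(reply: str) -> str:
    """Step 3: pull the first fenced ```python block out of the LLM reply."""
    match = re.search(r"`{3}python\n(.*?)`{3}", reply, re.DOTALL)
    return match.group(1) if match else ""

def run_python(code: str) -> ExecResult:
    """Step 4: execute the code; exceptions become observations, not crashes."""
    captured = {"answer": None, "lines": []}
    env = {
        "final_answer": lambda a: captured.__setitem__("answer", a),
        "print": lambda *a: captured["lines"].append(" ".join(map(str, a))),
    }
    try:
        exec(code, env)
    except Exception as exc:  # surface the error as text the LLM can read
        captured["lines"].append(f"Error: {exc!r}")
    return ExecResult("\n".join(captured["lines"]), captured["answer"])

def agent_loop(task: str, system_prompt: str, call_llm) -> Optional[str]:
    # Step 1: initialize history with the system prompt and the task
    message_history = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": task},
    ]
    for step in range(MAX_STEPS):
        reply = call_llm(message_history)                      # step 2
        message_history.append({"role": "assistant", "content": reply})
        result = run_python(extract_code(reply))               # steps 3–4
        if result.final_answer is not None:                    # step 5
            return result.final_answer
        message_history.append(                                # step 6
            {"role": "user", "content": result.output})
    return None  # step 7: limit reached without an answer
```

With a stubbed `call_llm` that immediately returns a `final_answer(...)` call, the loop terminates on the first iteration; with one that first returns broken code, the error text is appended as an observation and the loop continues.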
## Context Management
With MAX_TOKENS=4096 and MAX_CHARS=8000:

```python
# Good: the agent reads one file at a time
read_file("test.py", max_chars=2000)             # 2000 chars ✓

# Bad: the agent tries to read an entire codebase at once
read_file("large_codebase.py", max_chars=50000)  # gets clipped to 8000
```

The agent learns to read strategically to stay within these limits.
Context management in production is a major topic. Real code agents implement compaction (summarizing earlier context), structured note-taking (maintaining scratchpads of key findings), file-system-mediated context (writing intermediate results to files instead of keeping them in the window), and intelligent tool selection to minimize context consumption. Anthropic's context engineering guide describes these as core concerns for any serious agent deployment. The nano harness only demonstrates the simplest approach: hard character limits and hoping the agent reads strategically.
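The simplest of those compaction strategies can be sketched in a few lines: once the history exceeds a character budget, replace the oldest turns with a single summary message while keeping the system prompt and the most recent turns verbatim. Everything here is a hedged illustration, not nano_harness code — the budget, `keep_recent`, and the trivial `summarize` helper (which a real agent would implement as another LLM call) are all assumptions.

```python
def summarize(messages: list[dict]) -> str:
    """Stand-in summarizer: a real system would ask the LLM to condense these."""
    return " | ".join(m["content"][:40] for m in messages)

def compact(message_history: list[dict],
            char_budget: int = 8000,
            keep_recent: int = 4) -> list[dict]:
    """Replace older turns with one summary message once over budget."""
    total = sum(len(m["content"]) for m in message_history)
    if total <= char_budget:
        return message_history            # still fits: leave history untouched
    system_msg = message_history[0]       # always keep the system prompt
    older = message_history[1:-keep_recent]
    recent = message_history[-keep_recent:]
    note = {"role": "user",
            "content": "[Summary of earlier steps] " + summarize(older)}
    return [system_msg, note, *recent]
```

The design trade-off: the summary is lossy, so anything the agent may need verbatim later (file contents, exact error messages) should be re-fetched or written to disk rather than trusted to survive compaction.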
## Design Decisions
Python is precise where JSON or free-form text is ambiguous, so the agent outputs code. safe_path() resolves and validates every path against the workspace root to prevent directory traversal. Only an explicit allowlist of shell commands can run. A hard step limit and per-call output limit bound both runtime and context growth. And because exceptions are turned back into observations, the agent adapts instead of crashing.
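The two sandboxing checks mentioned above can be sketched as follows. The workspace root and the contents of the allowlist are made-up values for illustration; the path check is the standard resolve-then-contain pattern that blocks `../` traversal.

```python
from pathlib import Path
import shlex

WORKSPACE = Path("/tmp/agent_workspace").resolve()   # illustrative root
ALLOWED_COMMANDS = {"ls", "cat", "grep", "python"}   # illustrative allowlist

def safe_path(user_path: str) -> Path:
    """Resolve a path and refuse anything that escapes the workspace root."""
    candidate = (WORKSPACE / user_path).resolve()
    if not candidate.is_relative_to(WORKSPACE):  # blocks ../ traversal
        raise PermissionError(f"path escapes workspace: {user_path}")
    return candidate

def check_command(cmd: str) -> list[str]:
    """Only explicitly allowlisted executables may run."""
    argv = shlex.split(cmd)
    if not argv or argv[0] not in ALLOWED_COMMANDS:
        raise PermissionError(f"command not allowed: {cmd}")
    return argv
```

Resolving *before* checking matters: a naive string-prefix test on the raw path would pass `"../../etc/passwd"` if it happened to start with the right characters, while `resolve()` normalizes symlinks and `..` segments first.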
## Key Takeaways
Call the LLM, parse its code, execute it with tools, observe, repeat. The system prompt defines tools, constraints, and the termination signal. Sandboxing happens at the tool boundary through path confinement, command allowlists, and size limits. Errors become observations and the full message history serves as memory.
Next, tools and sandboxing in more detail.