# The Agentic Loop Deep Dive

The core of nano_harness is a loop that runs at most fifty times: call the LLM, parse its code, execute it, observe results, repeat.

<iframe
    src="https://context-course-agent-loop.static.hf.space"
    frameborder="0"
    width="850"
    height="450"
></iframe>
## Code Walkthrough

### Configuration

```python
import os
from pathlib import Path

TASK = "Inspect the workspace and provide a summary."
MODEL = os.getenv("NANO_MODEL", "zai-org/GLM-5.1")
BASE_URL = os.getenv("OPENAI_BASE_URL", "https://router.huggingface.co/v1")
API_KEY = os.getenv("HF_TOKEN", "")
WORKSPACE = str(Path.cwd())
MAX_STEPS = 50
TEMPERATURE = 0.2
TIMEOUT_S = 30
MAX_CHARS = 8000
ALLOW_WRITE = False
ALLOW_COMMANDS = ["ls", "cat", "pwd", "echo", "head", "tail", "wc", "rg"]
```
`MODEL` is the HF model ID routed through Inference Providers. `MAX_STEPS` caps iterations at 50, and `TEMPERATURE=0.2` keeps output focused and close to deterministic. `ALLOW_COMMANDS` is the shell allowlist, `ALLOW_WRITE=False` keeps file mutation off by default, and `MAX_CHARS` bounds tool output to prevent context overflow.
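To make the two safety toggles concrete, calls against the defaults behave like this (illustrative; the enforcing checks live in the tool definitions below):

```python
exec_cmd(["ls", "-la"])         # Allowed: "ls" is on the allowlist
exec_cmd(["rm", "-rf", "tmp"])  # PermissionError: "rm" is not in ALLOW_COMMANDS
write_file("notes.txt", "hi")   # PermissionError: write_file disabled (ALLOW_WRITE=False)
```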
### System Prompt

```python
SYSTEM_PROMPT = f"""You are a code-first agent.
Output only executable Python code, no prose.
Tools available:
- list_dir(path='.'): List directory contents
- read_file(path, max_chars=4000): Read file
- write_file(path, content): Write file (only if ALLOW_WRITE=True)
- exec_cmd(args): Run shell command
When task is complete, call:
final_answer(result)
Constraints:
- All file paths confined to workspace: {WORKSPACE}
- Allowed commands: {ALLOW_COMMANDS}
- Max output: {MAX_CHARS} chars
- No markdown, no prose—only Python
"""
```
The system prompt tells the model to output only Python code, lists the available tools, shows how to signal completion with `final_answer()`, and states the path, command, and size constraints.
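A first turn this prompt is designed to elicit might look like the following (hypothetical model output, not a transcript from a real run):

```python
# Hypothetical assistant turn: bare Python, tool calls only, no prose.
entries = list_dir(".")
print(entries)
print(read_file("README.md", max_chars=2000))
```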
### Tool Definitions

The tools are defined as Python functions that the model can call:

```python
import subprocess

def list_dir(path="."):
    """List directory contents."""
    p = safe_path(path)  # Ensure path is in workspace
    if not p.is_dir():
        raise NotADirectoryError(str(p))
    return sorted([x.name + ("/" if x.is_dir() else "") for x in p.iterdir()])

def read_file(path, max_chars=4000):
    """Read file with size limit."""
    p = safe_path(path)
    content = p.read_text(encoding="utf-8", errors="replace")
    return clip(content, min(max_chars, MAX_CHARS))  # Limit output

def write_file(path, content):
    """Write or create file if writes are enabled."""
    if not ALLOW_WRITE:
        raise PermissionError("write_file disabled")
    p = safe_path(path)
    p.parent.mkdir(parents=True, exist_ok=True)
    p.write_text(str(content), encoding="utf-8")
    return f"Wrote {len(str(content))} bytes"

def exec_cmd(args):
    """Execute a shell command (allowlisted only)."""
    if not args or args[0] not in ALLOW_COMMANDS:
        raise PermissionError(f"Command {args[0] if args else '(empty)'} not allowed")
    result = subprocess.run(args, capture_output=True, timeout=TIMEOUT_S, text=True)
    output_parts = []
    if result.stdout:
        output_parts.append(f"stdout:\n{result.stdout}")
    if result.stderr:
        output_parts.append(f"stderr:\n{result.stderr}")
    output = "\n\n".join(output_parts) or f"(exit code {result.returncode} with no output)"
    return clip(output, MAX_CHARS)

DONE = False
FINAL_RESULT = None

def final_answer(value):
    """Agent calls this when task is complete."""
    global DONE, FINAL_RESULT
    DONE = True
    FINAL_RESULT = value
    return value
```
Each tool enforces safety at the boundary: path confinement, command allowlisting, output size limits, and explicit write gating.
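Two helpers appear throughout the tools but aren't shown above: `safe_path()` and `clip()`. A minimal sketch of what they might look like (the harness's actual implementations may differ in detail):

```python
from pathlib import Path

def safe_path(path):
    """Resolve a path and refuse anything that escapes the workspace."""
    root = Path(WORKSPACE).resolve()
    p = (root / path).resolve()
    if p != root and root not in p.parents:
        raise PermissionError(f"Path escapes workspace: {path}")
    return p

def clip(text, limit=MAX_CHARS):
    """Truncate text to the output budget, marking where the cut happened."""
    text = str(text)
    if len(text) <= limit:
        return text
    return text[:limit] + f"\n... [clipped at {limit} chars]"
```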
### The Main Loop

The harness uses the Responses API. The OpenAI SDK accepts message-style `input` payloads and returns `response.output_text`, and the Hugging Face router exposes the same `/v1` surface via Inference Providers, so a single model ID (`zai-org/GLM-5.1`) drives the whole unit.
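Client construction under those settings might look like this (a sketch; the harness's own setup may name things differently):

```python
from openai import OpenAI

# Point the standard OpenAI SDK at the Hugging Face router.
client = OpenAI(base_url=BASE_URL, api_key=API_KEY)
```

With the client in place, the loop itself: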
```python
import io
import subprocess
from contextlib import redirect_stdout, redirect_stderr

def main():
    global DONE, FINAL_RESULT
    DONE = False
    FINAL_RESULT = None
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": TASK}
    ]
    for step in range(MAX_STEPS):
        print(f"\n[Step {step + 1}]")
        # 1. Call LLM
        response = client.responses.create(
            model=MODEL,
            temperature=TEMPERATURE,
            input=messages
        )
        content = response.output_text
        print(f"Model output:\n{content[:500]}...")
        # 2. Add model response to history
        messages.append({"role": "assistant", "content": content})
        # 3. Parse and execute Python code
        code = extract_python(content)  # Parse code block from response
        try:
            stdout_buffer = io.StringIO()
            stderr_buffer = io.StringIO()
            exec_globals = {
                # A few safe builtins so agent code can print observations
                "__builtins__": {"print": print, "len": len, "str": str, "sorted": sorted},
                "list_dir": list_dir,
                "read_file": read_file,
                "write_file": write_file,
                "exec_cmd": exec_cmd,
                "final_answer": final_answer
            }
            with redirect_stdout(stdout_buffer), redirect_stderr(stderr_buffer):
                exec(code, exec_globals)
            stdout_text = stdout_buffer.getvalue().strip()
            stderr_text = stderr_buffer.getvalue().strip()
            if DONE:
                result = f"Final answer: {clip(FINAL_RESULT)}"
            else:
                observations = []
                if stdout_text:
                    observations.append(f"stdout:\n{clip(stdout_text)}")
                if stderr_text:
                    observations.append(f"stderr:\n{clip(stderr_text)}")
                result = "\n\n".join(observations) or "Executed successfully (no output)"
        except FileNotFoundError as e:
            result = f"Error: FileNotFoundError: {e}"
        except PermissionError as e:
            result = f"Error: PermissionError: {e}"
        except subprocess.TimeoutExpired:
            result = "Error: TimeoutError: Command took too long"
        except Exception as e:
            result = f"Error: {type(e).__name__}: {e}"
        # 4. Check if agent called final_answer()
        if DONE:
            print(f"✓ Task complete: {FINAL_RESULT}")
            break
        # 5. Add observation to message history
        messages.append({"role": "user", "content": result})
    if not DONE:
        print(f"✗ Max steps ({MAX_STEPS}) reached without final_answer()")
```
Four things matter in this loop. The message history accumulates the system prompt, the user task, and alternating assistant/observation turns. The LLM call uses a configurable model and a low temperature (0.2) for near-deterministic behavior. Code execution runs through `exec()` with a restricted globals dict that strips all but a few safe builtins and exposes the tool functions. The loop then feeds back one of four observation types: stdout, stderr, an explicit final answer, or a structured error string.
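One helper the loop relies on, `extract_python()`, is not shown. A minimal sketch, assuming the model sometimes wraps its code in a markdown fence despite the no-markdown instruction:

```python
import re

def extract_python(text):
    """Return the first fenced code block, or the raw text when the model
    followed the 'only Python' rule and skipped the fence."""
    match = re.search(r"```(?:python)?\s*\n(.*?)```", text, re.DOTALL)
    return match.group(1) if match else text
```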
## Error Recovery

The main loop above already handles the common failure modes explicitly:

```python
try:
    with redirect_stdout(stdout_buffer), redirect_stderr(stderr_buffer):
        exec(code, exec_globals)
except FileNotFoundError as e:
    result = f"Error: FileNotFoundError: {e}"
except PermissionError as e:
    result = f"Error: PermissionError: {e}"
except subprocess.TimeoutExpired:
    result = "Error: TimeoutError: Command took too long"
except Exception as e:
    result = f"Error: {type(e).__name__}: {e}"

# The model sees this observation and tries a different approach
messages.append({"role": "user", "content": result})
```
The agent reads the failure as an observation and adjusts: list the directory after a missing file, stop trying `write_file` when writes are disabled, swap a blocked command for an allowlisted one, or break long-running work into smaller steps.
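A hypothetical recovery after a blocked write, for example:

```python
# Step N, assistant output (writes are disabled):
write_file("summary.md", summary)
# Observation fed back: "Error: PermissionError: write_file disabled"

# Step N+1, assistant output: deliver the result another way.
final_answer(summary)
```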
## Step Limit Safety

```python
for step in range(MAX_STEPS):  # MAX_STEPS = 50
    # ... run loop ...
    if DONE:
        break

if not DONE:
    print("Max steps reached without final_answer()")
```

The step limit guarantees termination. Simple tasks like "list files" finish in one to three steps; exploring a codebase typically takes five to ten; debugging something non-trivial can run fifteen to thirty.
## Message History

The message history is the agent's memory:

```python
messages = [
    {"role": "system", "content": "You are a code-first agent..."},
    {"role": "user", "content": "Inspect workspace and summarize"},
    {"role": "assistant", "content": "list_dir('.')\nread_file('README.md')"},
    {"role": "user", "content": "Found: ['README.md', 'src/', 'tests/']\n\nREADME.md contains:\n..."},
    {"role": "assistant", "content": "read_file('src/main.py')"},
    {"role": "user", "content": "src/main.py:\n..."},
]
```
Every entry is a turn: system prompt, user task, then alternating agent code and observations. The LLM sees the whole history on each call, so it can refer back to earlier findings, avoid approaches that already failed, and build on what it learned.
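Because the full history rides along on every call, even a crude size check (not part of the harness) shows how quickly the window fills:

```python
# Rough accounting: ~4 characters per token is a common rule of thumb
# for English text, so treat this as an order-of-magnitude estimate.
history_chars = sum(len(m["content"]) for m in messages)
print(f"history: {history_chars} chars ≈ {history_chars // 4} tokens")
```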
> [!NOTE]
> **This is a simplification.** The nano harness treats memory as a flat conversation history — every prior turn stays in context until the window fills up. Production agent systems use much richer memory architectures: short-term scratchpads for working state, episodic memory for recalling past sessions, semantic memory for persistent knowledge, retrieval-augmented approaches that fetch relevant memories on demand, and compaction strategies that summarize older context to free up space. If you want to go deeper, look into the research on agent memory systems (e.g., MemoryAgentBench, A-MEM) and context compression (e.g., ACON). The nano harness is a teaching tool — it shows the minimal viable loop, not the full picture.
## Context Management

With `MAX_CHARS=8000` capping every observation:

```python
# Good: the agent reads one file at a time
read_file("test.py", max_chars=2000)             # 2000 chars ✓

# Bad: the agent tries to read an entire codebase at once
read_file("large_codebase.py", max_chars=50000)  # Clipped to 8000
```
The agent learns to read strategically to stay within limits.
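A strategic turn narrows the search before reading anything large, for instance (illustrative):

```python
# Locate the relevant code first instead of reading whole files.
print(exec_cmd(["rg", "-n", "final_answer", "."]))
# Then read only the promising file, well within the output budget.
print(read_file("nano_harness.py", max_chars=2000))
```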
> [!NOTE]
> **Context management in production is a major topic.** Real code agents implement compaction (summarizing earlier context), structured note-taking (maintaining scratchpads of key findings), file-system-mediated context (writing intermediate results to files instead of keeping them in the window), and intelligent tool selection to minimize context consumption. Anthropic's context engineering guide describes these as core concerns for any serious agent deployment. The nano harness only demonstrates the simplest approach: hard character limits and hoping the agent reads strategically.
## Design Decisions

Python is precise where JSON or free-form text is ambiguous, so the agent outputs code. `safe_path()` resolves and validates every path against the workspace root to prevent directory traversal. Only an explicit allowlist of shell commands can run. A hard step limit and per-call output limit bound both runtime and context growth. And because exceptions are turned back into observations, the agent adapts instead of crashing.
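Path confinement in particular means a traversal attempt dies at the tool boundary rather than reaching the filesystem (per the `safe_path` sketch earlier):

```python
read_file("../../etc/passwd")
# PermissionError: Path escapes workspace: ../../etc/passwd
```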
## Key Takeaways

Call the LLM, parse its code, execute it with tools, observe, repeat. The system prompt defines tools, constraints, and the termination signal. Sandboxing happens at the tool boundary through path confinement, command allowlists, and size limits. Errors become observations and the full message history serves as memory.

Next, tools and sandboxing in more detail.