	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>NemoClaw Developer Guide</title>
	<link rel="preconnect" href="https://fonts.googleapis.com">
	<link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@300;400;500;600;700&family=IBM+Plex+Mono:wght@400;500&display=swap" rel="stylesheet">
	<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/github-dark.min.css">
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/bash.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/python.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/yaml.min.js"></script>
	<script src="https://cdnjs.cloudflare.com/ajax/libs/marked/12.0.0/marked.min.js"></script>
	<style>
	:root {
	--bg: #0f1117;
	--bg-card: #161b22;
	--bg-code: #1c2129;
	--border: #30363d;
	--text: #e6edf3;
	--text-secondary: #8b949e;
	--text-muted: #6e7681;
	--accent: #58a6ff;
	--accent-green: #3fb950;
	--accent-orange: #d29922;
	--accent-red: #f85149;
	--accent-purple: #bc8cff;
	}
	* { margin: 0; padding: 0; box-sizing: border-box; }
	body {
	font-family: 'IBM Plex Sans', -apple-system, sans-serif;
	background: var(--bg);
	color: var(--text);
	line-height: 1.7;
	-webkit-font-smoothing: antialiased;
	}
	.logo-bar {
	display: flex;
	align-items: center;
	justify-content: center;
	gap: 40px;
	padding: 32px 24px 16px;
	flex-wrap: wrap;
	}
	.logo-bar img {
	height: 50px;
	object-fit: contain;
	filter: brightness(0.95);
	transition: filter 0.2s;
	}
	.logo-bar img:hover {
	filter: brightness(1.1);
	}
	.logo-disclaimer {
	text-align: center;
	font-size: 11px;
	color: var(--text-muted);
	padding: 0 24px 24px;
	font-style: italic;
	}
	.container {
	max-width: 860px;
	margin: 0 auto;
	padding: 0 24px 80px;
	}
	.content h1 {
	font-size: 32px;
	font-weight: 700;
	margin: 0 0 8px;
	padding-bottom: 12px;
	border-bottom: 1px solid var(--border);
	color: var(--text);
	}
	.content h2 {
	font-size: 22px;
	font-weight: 600;
	margin: 48px 0 16px;
	padding-bottom: 8px;
	border-bottom: 1px solid var(--border);
	color: var(--text);
	}
	.content h3 {
	font-size: 17px;
	font-weight: 600;
	margin: 32px 0 12px;
	color: var(--text);
	}
	.content p {
	margin: 0 0 16px;
	color: var(--text-secondary);
	font-size: 15px;
	}
	.content a {
	color: var(--accent);
	text-decoration: none;
	}
	.content a:hover {
	text-decoration: underline;
	}
	.content strong {
	color: var(--text);
	font-weight: 600;
	}
	.content em {
	color: var(--text-secondary);
	}
	.content ul, .content ol {
	margin: 0 0 16px 24px;
	color: var(--text-secondary);
	font-size: 15px;
	}
	.content li {
	margin-bottom: 6px;
	}
	.content hr {
	border: none;
	border-top: 1px solid var(--border);
	margin: 40px 0;
	}
	.content pre {
	background: var(--bg-code);
	border: 1px solid var(--border);
	border-radius: 8px;
	padding: 16px;
	margin: 0 0 16px;
	overflow-x: auto;
	font-size: 13px;
	line-height: 1.6;
	}
	.content pre code {
	font-family: 'IBM Plex Mono', monospace;
	background: none;
	padding: 0;
	border: none;
	font-size: 13px;
	color: var(--text);
	}
	.content code {
	font-family: 'IBM Plex Mono', monospace;
	background: var(--bg-code);
	border: 1px solid var(--border);
	border-radius: 4px;
	padding: 2px 6px;
	font-size: 13px;
	color: var(--accent);
	}
	.content blockquote {
	border-left: 3px solid var(--accent);
	margin: 0 0 16px;
	padding: 8px 16px;
	color: var(--text-secondary);
	background: rgba(88, 166, 255, 0.05);
	border-radius: 0 6px 6px 0;
	}
	.content table {
	width: 100%;
	border-collapse: collapse;
	margin: 0 0 16px;
	font-size: 14px;
	}
	.content th {
	background: var(--bg-code);
	border: 1px solid var(--border);
	padding: 10px 14px;
	text-align: left;
	font-weight: 600;
	color: var(--text);
	}
	.content td {
	border: 1px solid var(--border);
	padding: 10px 14px;
	color: var(--text-secondary);
	}
	.content p strong:first-child {
	color: var(--accent-orange);
	}
	.footer {
	text-align: center;
	padding: 40px 24px;
	border-top: 1px solid var(--border);
	margin-top: 60px;
	font-size: 13px;
	color: var(--text-muted);
	}
	.footer a {
	color: var(--accent);
	text-decoration: none;
	}
	@media (max-width: 640px) {
	.logo-bar { gap: 24px; }
	.logo-bar img { height: 36px; }
	.content h1 { font-size: 24px; }
	.content h2 { font-size: 18px; }
	}
	</style>
	</head>
	<body>

	<div class="logo-bar">
<img src="images/Z by HP NVIDIA WHT-01.png" alt="Z by HP and NVIDIA">
	</div>
	<div class="logo-disclaimer">
	Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated.
	</div>

	<div class="container">
	<div class="content" id="content"></div>
	</div>

	<script>
	const markdown = `# Agentic AI with NemoClaw — Developer Guide

	Building Multi-Agent Pipelines on the HP ZGX Nano AI Station

	Written and prepared by Curtis Burkhalter, Ph.D. \| Technical Product Marketing Manager, AI Solutions at HP

	To connect with Curtis Burkhalter, Ph.D., find him at https://www.linkedin.com/in/curtburk/

	---

	## What This Guide Covers

This unofficial developer guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It focuses on practice rather than theory: the real-world setup, the problems you are likely to encounter, and the solutions that actually worked.

	The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama.

	Pipeline: Architect → Coder → Reviewer → Analyst

	Hardware: HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory)

	Models: Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst)

	Source: [https://github.com/curtburk/NemoClaw_Autonomous_AI_Factory](https://github.com/curtburk/NemoClaw_Autonomous_AI_Factory)

	---

	## Prerequisites

	- HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory)
	- NemoClaw installed (\`nemoclaw --help\` responds)
	- OpenShell CLI installed (\`openshell --help\` responds)
	- Ollama installed with your models pulled
	- Python 3.10+ on the host
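
A quick way to confirm this checklist from Python is to look each CLI up on \`PATH\` and check the interpreter version. This is a minimal sketch: it only tests that the executables exist (not that \`--help\` succeeds), and \`check_prerequisites\` is an illustrative name, not part of NemoClaw.

\`\`\`python
from __future__ import annotations

import shutil
import sys

# CLIs named in the prerequisites; ssh is used later to reach the sandboxes
REQUIRED_TOOLS = ("nemoclaw", "openshell", "ollama", "ssh")


def check_prerequisites() -> dict[str, bool]:
    """Report which prerequisites are satisfied on this host."""
    results = {tool: shutil.which(tool) is not None for tool in REQUIRED_TOOLS}
    results["python>=3.10"] = sys.version_info >= (3, 10)
    return results


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print("OK     " if ok else "MISSING", name)
\`\`\`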

	---

	## 1. NemoClaw Setup

	### 1.1 Onboarding a Sandbox

	\`\`\`bash
	nemoclaw onboard
	\`\`\`

The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted. On the current version (v0.1.0), the sandbox-creation step then fails with a policy-parsing error:

	\`\`\`
	Error: failed to parse sandbox policy YAML
	network_policies.name: invalid type: string "github", expected struct
	\`\`\`

Workaround: The image itself builds successfully; only sandbox creation fails. Copy the image tag from the build output and create the sandbox manually:

	\`\`\`bash
	openshell sandbox create --name architect --from openshell/sandbox-from:<IMAGE_TAG>
	\`\`\`
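
If you want to script this step, the image reference can be pulled out of a saved build log. The sketch below assumes the log contains the full reference in the form \`openshell/sandbox-from:<tag>\` (the form used in the command above) and that the last match is the image you just built; \`find_image_ref\` and \`create_command\` are hypothetical helpers.

\`\`\`python
from __future__ import annotations

import re

# Assumption: the build output names the image as openshell/sandbox-from:<tag>.
# Docker tags use only letters, digits, underscores, periods, and hyphens.
IMAGE_REF = re.compile(r"openshell/sandbox-from:[A-Za-z0-9_.-]+")


def find_image_ref(build_log: str) -> str | None:
    """Return the last image reference mentioned in the log, or None."""
    matches = IMAGE_REF.findall(build_log)
    return matches[-1] if matches else None


def create_command(name: str, image_ref: str) -> list[str]:
    """Build the manual sandbox-creation command shown above as an argv list."""
    return ["openshell", "sandbox", "create", "--name", name, "--from", image_ref]
\`\`\`

Check the extracted reference by eye before passing the list to \`subprocess.run(..., check=True)\`.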

	### 1.2 Creating Multiple Sandboxes

NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run \`nemoclaw onboard\` once for each distinct model, then reuse the resulting image for every other agent that shares that model:

	\`\`\`bash
	# Build image once via onboard (select qwen3-coder:latest)
	nemoclaw onboard # → name it "coder", note the image tag

	# Reuse that image for reviewer and analyst
	openshell sandbox create --name reviewer --from openshell/sandbox-from:<SAME_TAG>
	openshell sandbox create --name analyst --from openshell/sandbox-from:<SAME_TAG>
	\`\`\`

	### 1.3 Verify Sandbox Connectivity

	\`\`\`bash
	ssh -o BatchMode=yes openshell-architect "echo OK"
	ssh -o BatchMode=yes openshell-coder "echo OK"
	ssh -o BatchMode=yes openshell-reviewer "echo OK"
	ssh -o BatchMode=yes openshell-analyst "echo OK"
	\`\`\`
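
Each command should print \`OK\`. The orchestrator can run the same probe before starting a pipeline; this sketch wraps it in Python. The \`openshell-<name>\` host aliases come from the commands above, while the 10-second timeout and the injectable \`runner\` parameter (handy for testing without sandboxes) are assumptions of this sketch.

\`\`\`python
from __future__ import annotations

import subprocess
from typing import Callable

AGENTS = ("architect", "coder", "reviewer", "analyst")


def check_sandboxes(agents: tuple[str, ...] = AGENTS,
                    runner: Callable = subprocess.run,
                    timeout: float = 10.0) -> dict[str, bool]:
    """Return {agent: reachable} using a non-interactive SSH probe."""
    status = {}
    for agent in agents:
        # BatchMode=yes makes ssh fail instead of prompting for a password
        cmd = ["ssh", "-o", "BatchMode=yes", f"openshell-{agent}", "echo OK"]
        try:
            result = runner(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            status[agent] = False  # a hung connection counts as unreachable
            continue
        status[agent] = result.returncode == 0 and result.stdout.strip() == "OK"
    return status
\`\`\`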

	---

	## 2. Model Configuration

	### 2.1 Using Different Models Per Sandbox

The gateway inference route is global: one model setting applies to all sandboxes. However, each sandbox's \`openclaw.json\` specifies which model to request, and Ollama serves whichever model a request asks for, so you can still run a different model per agent behind a single provider.

	To set this up, run \`nemoclaw onboard\` separately for the sandbox that needs a different model (e.g., Architect with \`qwen3:32b\`), then use \`qwen3-coder:latest\` for the rest.
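
Because Ollama serves whichever model a request names, every model an agent references must already be pulled. A minimal sketch of that check, assuming the agent-to-model mapping described above and the default layout of \`ollama list\` (a header row, then one model per line with the name in the first column); \`AGENT_MODELS\`, \`pulled_models\`, and \`missing_models\` are illustrative names.

\`\`\`python
from __future__ import annotations

# Agent -> model mapping used in this guide
AGENT_MODELS = {
    "architect": "qwen3:32b",
    "coder": "qwen3-coder:latest",
    "reviewer": "qwen3-coder:latest",
    "analyst": "qwen3-coder:latest",
}


def pulled_models(ollama_list_output: str) -> set[str]:
    """Parse the NAME column of "ollama list" output, skipping the header row."""
    rows = ollama_list_output.strip().splitlines()[1:]
    return {row.split()[0] for row in rows if row.strip()}


def missing_models(agent_models: dict[str, str], available: set[str]) -> dict[str, str]:
    """Return the agents whose model has not been pulled yet."""
    return {agent: model for agent, model in agent_models.items()
            if model not in available}
\`\`\`

On the host, feed \`pulled_models\` the text from \`subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout\`.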

	### 2.2 The maxTokens Problem

Critical: NemoClaw hardcodes \`maxTokens: 4096\` in the Dockerfile that generates \`openclaw.json\`. In this project that capped model output at roughly 3,000 characters, too short for code generation.

	Fix: Patch the Dockerfile before building sandbox images:

	\`\`\`bash
	sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \\
	~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
	\`\`\`

	Verify:

	\`\`\`bash
	grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
	\`\`\`

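Then rebuild your sandboxes so the new limit is baked into their images. The path depends on your Node.js version: \`npm root -g\` prints the global \`node_modules\` directory that contains \`nemoclaw/\`, and \`which nemoclaw\` shows which Node.js installation is active.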
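
The same patch can be scripted so it survives Node.js upgrades. This sketch assumes NemoClaw is a global npm package (as the \`node_modules\` path above suggests), so \`npm root -g\` points at its parent directory, and that the Dockerfile contains the literal text \`'maxTokens': 4096\` that the \`sed\` command matches; \`find_dockerfile\` and \`patch_max_tokens\` are hypothetical helpers.

\`\`\`python
from __future__ import annotations

import subprocess
from pathlib import Path

OLD = "'maxTokens': 4096"


def find_dockerfile() -> Path:
    """Locate NemoClaw's Dockerfile under the global npm root."""
    root = subprocess.run(["npm", "root", "-g"], capture_output=True,
                          text=True, check=True).stdout.strip()
    return Path(root) / "nemoclaw" / "Dockerfile"


def patch_max_tokens(dockerfile: Path, new_limit: int = 16384) -> int:
    """Replace the hardcoded limit; return how many occurrences were changed."""
    text = dockerfile.read_text()
    count = text.count(OLD)
    if count:
        dockerfile.write_text(text.replace(OLD, f"'maxTokens': {new_limit}"))
    return count
\`\`\`

A return value of 0 means the file was already patched or the upstream text changed; either way, inspect it before rebuilding.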

	### 2.3 Model Output Limits

	Even with \`maxTokens: 16384\`, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is a model behavior, not a configuration issue. Design your prompts accordingly — see Section 4.
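
Because this ceiling comes from the model rather than the config, it helps to detect truncated Python before using it. A minimal sketch: \`ast.parse\` fails on code that stops mid-statement, and the character count shows how close a reply came to the roughly 2,000-character ceiling observed here. The 90% threshold and the \`inspect_code_reply\` name are assumptions.

\`\`\`python
from __future__ import annotations

import ast

# Approximate ceiling observed for Qwen3-Coder-30B-A3B in this project
SOFT_LIMIT_CHARS = 2000


def inspect_code_reply(code: str) -> dict[str, object]:
    """Flag replies that fail to parse or that sit near the output ceiling."""
    try:
        ast.parse(code)
        parses = True
    except SyntaxError:
        parses = False
    return {
        "chars": len(code),
        "parses": parses,
        "near_limit": len(code) >= 0.9 * SOFT_LIMIT_CHARS,
    }
\`\`\`

A reply that fails to parse and is near the limit is most likely truncated; that is the signal to shrink the request rather than simply retry.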

	### 2.4 Thinking Tokens

	Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass \`--thinking off\`:

	\`\`\`bash
	openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt"
	\`\`\`

	---

	## 3. Orchestrator Architecture

	### 3.1 Sending Prompts to Sandboxes

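\`openclaw agent\` takes the prompt only as a \`-m <text>\` argument; it does not read from stdin. Because the command runs over SSH, the remote shell parses that argument again, so long prompts containing quotes or dollar signs break the escaping. The solution: pipe the prompt over SSH into a file inside the sandbox, then pass it to \`-m\` with command substitution.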

	\`\`\`python
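import subprocess

def send_prompt(ssh_host: str, prompt: str, session_id: str) -> str:
    # Step 1: upload the prompt via stdin into a file inside the sandbox
    subprocess.run(["ssh", ssh_host, "cat > /tmp/prompt.txt"],
                   input=prompt, text=True, check=True)

    # Step 2: run the agent with the file contents as the -m argument; inside
    # double quotes the remote shell does not re-split or re-expand the text
    agent_cmd = ["ssh", ssh_host,
                 "openclaw agent --agent main --local --thinking off "
                 f"--session-id {session_id} "
                 '-m "$(cat /tmp/prompt.txt)"']
    result = subprocess.run(agent_cmd, capture_output=True, text=True, check=True)
    return result.stdout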
	\`\`\`

	### 3.2 Session ID Management

	Use unique session IDs per call. OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens.

	\`\`\`python
	import uuid
	call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
	\`\`\`
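
In a retry loop, that means generating a new ID on every attempt and passing the previous error into the prompt explicitly, instead of relying on session history. A sketch under assumptions: \`send(session_id, prompt)\` is a hypothetical wrapper around the SSH call from Section 3.1, \`find_error\` stands for whatever check decides a reply is usable (returning an error message or \`None\`), and three attempts mirrors the example above.

\`\`\`python
from __future__ import annotations

import uuid
from typing import Callable


def run_with_retries(agent_name: str, prompt: str,
                     send: Callable[[str, str], str],
                     find_error: Callable[[str], str | None],
                     attempts: int = 3) -> str | None:
    """Retry with a fresh session each time; return None if every attempt fails."""
    current_prompt = prompt
    for _ in range(attempts):
        # New session per call: no accumulated history from failed attempts
        session_id = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
        reply = send(session_id, current_prompt)
        error = find_error(reply)
        if error is None:
            return reply
        # Carry forward only the error message, not the whole transcript
        current_prompt = f"{prompt} Previous attempt failed: {error}. Fix it."
    return None
\`\`\`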

	### 3.3 Cleaning Model Output

	The model frequently emits OpenClaw tool-call artifacts in its output:

	\`\`\`
	</parameter>
	<parameter=file_path>
	/sandbox/.openclaw/workspace/file.js
	</parameter>
	</function>
	</tool_call>
	\`\`\`

	Strip these before processing:

	\`\`\`python
	import re
	output = re.sub(r'</parameter>.*', '', output, flags=re.DOTALL)
output = re.sub(r'<parameter[^>]*>.*', '', output, flags=re.DOTALL)
	output = re.sub(r'</function>.*', '', output, flags=re.DOTALL)
	output = re.sub(r'</tool_call>.*', '', output, flags=re.DOTALL)
	\`\`\`
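
The four substitutions can also be folded into one pattern inside a function that is easy to test against the sample above. This is a sketch, not OpenClaw behavior: it truncates at the first \`parameter\`, \`function\`, or \`tool_call\` tag, opening or closing, so it also catches opening \`<function...>\` and \`<tool_call>\` tags, and it assumes legitimate output never contains those tag names.

\`\`\`python
import re

# Truncate at the first opening or closing parameter/function/tool_call tag
TOOL_CALL_TAIL = re.compile(r"</?(?:parameter|function|tool_call)[^>]*>.*", re.DOTALL)


def clean_output(output: str) -> str:
    """Remove the tool-call tail the agent sometimes appends to its reply."""
    return TOOL_CALL_TAIL.sub("", output).rstrip()
\`\`\`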

	### 3.4 Streaming Output to a Live UI

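When the orchestrator runs as a subprocess, its stdout is a pipe rather than a terminal, so Python block-buffers it and the UI sees nothing until the buffer fills or the process exits. Force immediate flushing: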

	\`\`\`python
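import os
import subprocess
import sys

# In the orchestrator: every print() call passes flush=True
print("pipeline started", flush=True)

# In the UI server: launch the orchestrator unbuffered and stream its stdout
env = dict(os.environ, PYTHONUNBUFFERED="1")
proc = subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env,
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    print(line, end="", flush=True)  # replace with a push to the browser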
	\`\`\`

	---

	## 4. Prompt Engineering for Small Models

	The most important lesson from this project: a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.

	### 4.1 Be Prescriptive, Not Descriptive

	Don't ask the model to design. Tell it exactly what to produce.

	Bad: "Generate sensor data with realistic values for a city infrastructure system"

	Good: "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage."

	The model will substitute its training priors for your specifications unless you're explicit — and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone.
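
The "use Y, NOT X" pattern is easy to generate consistently, and because the model may still drift, it is worth checking the generated values on the host too. A minimal sketch; \`spec_line\` and \`out_of_range\` are illustrative helpers, and the voltage bounds are the ones from the example above.

\`\`\`python
from __future__ import annotations


def spec_line(field: str, low: float, high: float, forbidden: list[str]) -> str:
    """Build a prescriptive constraint: exact generator call plus explicit NOTs."""
    nots = " ".join(f"NOT {item}." for item in forbidden)
    return f"{field}: random.uniform({low}, {high}). {nots}"


def out_of_range(values: list[float], low: float, high: float) -> list[float]:
    """Return generated values that ignored the spec (e.g. European voltages)."""
    return [v for v in values if not low <= v <= high]


print(spec_line("voltage", 115, 125, ["220-240", "European voltage"]))
# voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage.
\`\`\`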

	### 4.2 Provide Skeletons, Not Specs

	Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget.

\`\`\`\`
Complete this skeleton. Fill in the TODO sections.
Do NOT modify the existing code. Do NOT add Pydantic models.

\`\`\`python
def generate_sensors():
    data = {}
    for d in DISTRICTS:
        power = {"voltage": round(random.uniform(115,125),1), ...}
        # TODO: Override anomaly districts
        data[d] = {"power": power, "water": water, "traffic": traffic}
    return data

@app.get("/api/health")
def health():
    # TODO: compute overall status
    pass
\`\`\`
\`\`\`\`
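
Since the prompt forbids modifying the existing code, the orchestrator can check that rule mechanically before accepting a completion. A rough sketch under stated assumptions: it compares stripped lines, treats every non-blank skeleton line except \`# TODO\` comments and placeholder \`pass\` statements as required, and flags any TODO left in the reply. \`check_skeleton\` is a hypothetical helper.

\`\`\`python
from __future__ import annotations


def check_skeleton(skeleton: str, completion: str) -> list[str]:
    """Return problems found; an empty list means the completion looks acceptable."""
    done = {line.strip() for line in completion.splitlines()}
    problems = []
    for line in skeleton.splitlines():
        text = line.strip()
        # TODO markers and placeholder pass statements are meant to be replaced
        if not text or text.startswith("# TODO") or text == "pass":
            continue
        if text not in done:
            problems.append(f"skeleton line changed or removed: {text}")
    if "# TODO" in completion:
        problems.append("completion still contains TODO markers")
    return problems
\`\`\`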

	### 4.3 Binary Reviews, Not Open-Ended Critique

	A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks:

	1. Are all three endpoints present?
	2. Do they return data (not \`pass\`)?
	3. Is CORS enabled?
	4. Does uvicorn bind to the right port?
	5. Any syntax errors?
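
The same five checks can also be run deterministically on the host with Python's \`ast\` module, as a cross-check on the Reviewer's verdict. This is a sketch under assumptions: the backend is FastAPI-style (\`@app.get("/path")\` decorators, \`CORSMiddleware\`, \`uvicorn.run(..., port=N)\`), and the endpoint list and port are parameters because the guide does not name them all.

\`\`\`python
from __future__ import annotations

import ast


def binary_review(source: str, endpoints: list[str], port: int) -> dict[str, bool]:
    """Five yes/no checks mirroring the Reviewer list above."""
    checks = dict.fromkeys(["syntax_ok", "endpoints_present",
                            "endpoints_return_data", "cors_enabled", "port_ok"], False)
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return checks  # nothing else can be judged without a parse tree
    checks["syntax_ok"] = True

    # Map route path -> handler, from decorators such as @app.get("/api/health")
    routes = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                if (isinstance(dec, ast.Call) and dec.args
                        and isinstance(dec.args[0], ast.Constant)
                        and isinstance(dec.args[0].value, str)):
                    routes[dec.args[0].value] = node

    checks["endpoints_present"] = all(path in routes for path in endpoints)
    # "Returns data" = at least one return statement with a value (not just pass)
    checks["endpoints_return_data"] = checks["endpoints_present"] and all(
        any(isinstance(n, ast.Return) and n.value is not None
            for n in ast.walk(routes[path]))
        for path in endpoints)
    checks["cors_enabled"] = any(
        isinstance(n, ast.Name) and n.id == "CORSMiddleware" for n in ast.walk(tree))
    checks["port_ok"] = any(
        isinstance(n, ast.Call) and any(
            kw.arg == "port" and isinstance(kw.value, ast.Constant)
            and kw.value.value == port for kw in n.keywords)
        for n in ast.walk(tree))
    return checks
\`\`\`

A reply passes only if every value in the returned dictionary is \`True\`.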

	### 4.4 Template Injection for Large Outputs

	The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit — work with it.

	Have the model generate only the dynamic part (a \`generateData()\` function), then inject it into a pre-built template:

	\`\`\`python
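from pathlib import Path

# run_agent(): the orchestrator's helper that sends a prompt to a sandbox
template = Path("prompts/dashboard_template.html").read_text()
gen_func = run_agent("analyst", "Output ONLY a generateData() function...")
dashboard = template.replace("%%GENERATE_DATA%%", gen_func)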
	\`\`\`
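
A plain \`str.replace\` silently does nothing if the placeholder is missing, and the model's reply may not contain the function you asked for. A slightly more defensive sketch, assuming \`%%GENERATE_DATA%%\` appears exactly once in the template; \`inject_generate_data\` is a hypothetical name.

\`\`\`python
PLACEHOLDER = "%%GENERATE_DATA%%"


def inject_generate_data(template: str, gen_func: str) -> str:
    """Insert the model's generateData() into the template, or fail loudly."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain the placeholder exactly once")
    if "generateData" not in gen_func:
        raise ValueError("model reply does not define generateData()")
    return template.replace(PLACEHOLDER, gen_func.strip())
\`\`\`

On a \`ValueError\`, the orchestrator can fall back to the known-good \`fallback/dashboard.html\` listed in the project structure.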

	### 4.5 Always Start with "Output ONLY..."

	Every prompt to \`openclaw agent\` should begin with a forceful instruction:

	\`\`\`
	Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>.
	\`\`\`

	\`\`\`
	Output ONLY a Python code block. Start with \`\`\`python end with \`\`\`. No explanation.
	\`\`\`

	Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML.
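
Even with that instruction, a reply occasionally opens with a sentence of prose. A small host-side guard, sketched here as an assumption rather than anything OpenClaw provides, trims whatever precedes the expected start token and reports failure if the token never appears.

\`\`\`python
from __future__ import annotations


def strip_preamble(reply: str, start_token: str) -> str | None:
    """Drop any text before start_token; return None if the token is absent."""
    index = reply.find(start_token)
    if index == -1:
        return None  # treat as a failed call and retry with a fresh session
    return reply[index:]
\`\`\`

For the HTML prompt above, call it as \`strip_preamble(reply, "<!DOCTYPE html>")\`.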

	---

	## 5. Lessons Learned

	\| Problem \| Root Cause \| Solution \|
	\|---------\|-----------\|----------\|
	\| Model ignores explicit values \| Training priors override prompt \| Include "NOT X" alongside "use Y" \|
	\| Output truncated mid-code \| maxTokens: 4096 hardcoded \| Patch NemoClaw Dockerfile \|
	\| Output still truncated \| Thinking tokens consume budget \| \`--thinking off\` on every call \|
	\| Output still short (~2K chars) \| Model's natural stop behavior \| Skeleton prompts, template injection \|
	\| Retries produce worse output \| Session context accumulates \| Unique session ID per call \|
	\| Tool-call XML in output \| OpenClaw agent framework \| Regex cleanup post-processing \|
	\| Reviewer rejects valid code \| Reviewing against drifted plan \| Binary checks, not schema comparison \|
	\| Sandbox creation fails \| Policy YAML parsing bug \| Create manually via \`openshell sandbox create\` \|
	\| Live UI shows no output \| Python stdout buffering \| \`flush=True\` + \`PYTHONUNBUFFERED=1\` \|
\| Port 8080 in use \| NemoClaw gateway occupies it \| Use a different port for the UI server (8888 here) \|

	---

	## 6. Project Structure

	\`\`\`
	nemoclaw-demo/
	├── orchestrator.py # Main pipeline
	├── ui_server.py # Live browser UI (port 8888)
	├── prompts/
	│ ├── architect.txt # Spec → plan
	│ ├── coder_initial.txt # Plan → code (skeleton)
	│ ├── coder_retry.txt # Error → fixed code
	│ ├── reviewer.txt # Code → binary verdict
	│ ├── analyst.txt # → generateData() function
	│ └── dashboard_template.html # Pre-built dashboard UI
	├── fallback/
	│ ├── generated_app.py # Known-good backend
	│ └── dashboard.html # Known-good dashboard
	├── output/ # Pipeline output
	└── logs/ # Per-run logs
	\`\`\`

	---

	## 7. Running the Demo

	\`\`\`bash
	# Terminal 1: Start the live UI
	python3 ui_server.py

	# Open in browser (use the Network URL it prints)
	# Click RUN PIPELINE
	\`\`\`

	Or run the pipeline directly:

	\`\`\`bash
	python3 orchestrator.py
	# Open output/dashboard.html in a browser
	\`\`\`

	---

	## Security Architecture

	Each NemoClaw sandbox enforces:

	- Landlock filesystem restrictions — agents see only \`/sandbox\` and \`/tmp\`
	- seccomp system call filtering — blocked syscalls fail silently
	- Network namespace isolation — no direct egress, only \`inference.local\`
	- Inference routing — agents hit \`inference.local\`, the gateway routes to Ollama; the agent never sees host IPs or ports

	The orchestrator runs on the host as a trusted operator. Agents never communicate directly — every artifact passes through the host-side orchestrator. This is least-privilege at the agent level.
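
In code, this hub-and-spoke arrangement reduces to the orchestrator holding every artifact and deciding what each agent sees. A minimal sketch, not the reference implementation: \`send(agent, prompt)\` stands in for the SSH round trip from Section 3.1, and the \`PASS\` verdict convention and giving the Analyst the plan are assumptions.

\`\`\`python
from __future__ import annotations

from typing import Callable


def run_pipeline(spec: str, send: Callable[[str, str], str]) -> dict[str, str]:
    """Architect -> Coder -> Reviewer -> Analyst, with the host relaying every artifact."""
    artifacts = {"spec": spec}
    artifacts["plan"] = send("architect", spec)
    artifacts["code"] = send("coder", artifacts["plan"])
    artifacts["verdict"] = send("reviewer", artifacts["code"])
    if not artifacts["verdict"].strip().upper().startswith("PASS"):
        raise RuntimeError("Reviewer rejected the code; retry the Coder or use fallback/")
    artifacts["generate_data"] = send("analyst", artifacts["plan"])
    return artifacts
\`\`\`

No agent ever receives another agent's address or session; it only sees the text the host chooses to forward.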

	---

Last updated: April 2026

NemoClaw v0.1.0 \| OpenShell v0.0.16 \| OpenClaw 2026.3.11
	`;
	const content = document.getElementById('content');
	content.innerHTML = marked.parse(markdown);
	document.querySelectorAll('pre code').forEach((block) => {
	hljs.highlightElement(block);
	});
	</script>
	</body>
	</html>