<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>NemoClaw Developer Guide</title>
<link rel="preconnect" href="https://fonts.googleapis.com">
<link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@300;400;500;600;700&family=IBM+Plex+Mono:wght@400;500&display=swap" rel="stylesheet">
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/github-dark.min.css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/bash.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/python.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/yaml.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/marked/12.0.0/marked.min.js"></script>
<style>
:root {
--bg: #0f1117;
--bg-card: #161b22;
--bg-code: #1c2129;
--border: #30363d;
--text: #e6edf3;
--text-secondary: #8b949e;
--text-muted: #6e7681;
--accent: #58a6ff;
--accent-green: #3fb950;
--accent-orange: #d29922;
--accent-red: #f85149;
--accent-purple: #bc8cff;
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
font-family: 'IBM Plex Sans', -apple-system, sans-serif;
background: var(--bg);
color: var(--text);
line-height: 1.7;
-webkit-font-smoothing: antialiased;
}
.logo-bar {
display: flex;
align-items: center;
justify-content: center;
gap: 40px;
padding: 32px 24px 16px;
flex-wrap: wrap;
}
.logo-bar img {
height: 50px;
object-fit: contain;
filter: brightness(0.95);
transition: filter 0.2s;
}
.logo-bar img:hover {
filter: brightness(1.1);
}
.logo-disclaimer {
text-align: center;
font-size: 11px;
color: var(--text-muted);
padding: 0 24px 24px;
font-style: italic;
}
.container {
max-width: 860px;
margin: 0 auto;
padding: 0 24px 80px;
}
.content h1 {
font-size: 32px;
font-weight: 700;
margin: 0 0 8px;
padding-bottom: 12px;
border-bottom: 1px solid var(--border);
color: var(--text);
}
.content h2 {
font-size: 22px;
font-weight: 600;
margin: 48px 0 16px;
padding-bottom: 8px;
border-bottom: 1px solid var(--border);
color: var(--text);
}
.content h3 {
font-size: 17px;
font-weight: 600;
margin: 32px 0 12px;
color: var(--text);
}
.content p {
margin: 0 0 16px;
color: var(--text-secondary);
font-size: 15px;
}
.content a {
color: var(--accent);
text-decoration: none;
}
.content a:hover {
text-decoration: underline;
}
.content strong {
color: var(--text);
font-weight: 600;
}
.content em {
color: var(--text-secondary);
}
.content ul, .content ol {
margin: 0 0 16px 24px;
color: var(--text-secondary);
font-size: 15px;
}
.content li {
margin-bottom: 6px;
}
.content hr {
border: none;
border-top: 1px solid var(--border);
margin: 40px 0;
}
.content pre {
background: var(--bg-code);
border: 1px solid var(--border);
border-radius: 8px;
padding: 16px;
margin: 0 0 16px;
overflow-x: auto;
font-size: 13px;
line-height: 1.6;
}
.content pre code {
font-family: 'IBM Plex Mono', monospace;
background: none;
padding: 0;
border: none;
font-size: 13px;
color: var(--text);
}
.content code {
font-family: 'IBM Plex Mono', monospace;
background: var(--bg-code);
border: 1px solid var(--border);
border-radius: 4px;
padding: 2px 6px;
font-size: 13px;
color: var(--accent);
}
.content blockquote {
border-left: 3px solid var(--accent);
margin: 0 0 16px;
padding: 8px 16px;
color: var(--text-secondary);
background: rgba(88, 166, 255, 0.05);
border-radius: 0 6px 6px 0;
}
.content table {
width: 100%;
border-collapse: collapse;
margin: 0 0 16px;
font-size: 14px;
}
.content th {
background: var(--bg-code);
border: 1px solid var(--border);
padding: 10px 14px;
text-align: left;
font-weight: 600;
color: var(--text);
}
.content td {
border: 1px solid var(--border);
padding: 10px 14px;
color: var(--text-secondary);
}
.content p strong:first-child {
color: var(--accent-orange);
}
.footer {
text-align: center;
padding: 40px 24px;
border-top: 1px solid var(--border);
margin-top: 60px;
font-size: 13px;
color: var(--text-muted);
}
.footer a {
color: var(--accent);
text-decoration: none;
}
@media (max-width: 640px) {
.logo-bar { gap: 24px; }
.logo-bar img { height: 36px; }
.content h1 { font-size: 24px; }
.content h2 { font-size: 18px; }
}
</style>
</head>
<body>
<div class="logo-bar">
<img src="images/Z by HP NVIDIA WHT-01.png" alt="Z by HP and NVIDIA logos">
</div>
<div class="logo-disclaimer">
Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated.
</div>
<div class="container">
<div class="content" id="content"></div>
</div>
<script>
const markdown = `# Agentic AI with NemoClaw: Developer Guide
**Building Multi-Agent Pipelines on the HP ZGX Nano AI Station**
*Written and prepared by Curtis Burkhalter, Ph.D. | Technical Product Marketing Manager, AI Solutions at HP*
*Connect with Curtis on LinkedIn: https://www.linkedin.com/in/curtburk/*
---
## What This Guide Covers
This unofficial developer guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It covers the real-world setup, the problems you could encounter, and the solutions that actually work - not the theory, the practice.
The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama.
**Pipeline:** Architect → Coder → Reviewer → Analyst
**Hardware:** HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory)
**Models:** Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst)
**Source:** [https://github.com/curtburk/NemoClaw_Autonomous_AI_Factory](https://github.com/curtburk/NemoClaw_Autonomous_AI_Factory)
---
## Prerequisites
- HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory)
- NemoClaw installed (\`nemoclaw --help\` responds)
- OpenShell CLI installed (\`openshell --help\` responds)
- Ollama installed with your models pulled
- Python 3.10+ on the host
---
## 1. NemoClaw Setup
### 1.1 Onboarding a Sandbox
\`\`\`bash
nemoclaw onboard
\`\`\`
The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted. You'll hit a **sandbox policy parsing error** on the current version (v0.1.0):
\`\`\`
Error: failed to parse sandbox policy YAML
network_policies.name: invalid type: string "github", expected struct
\`\`\`
**Workaround:** The image builds successfully; the error occurs only at sandbox creation. Grab the image tag from the build output and create the sandbox manually:
\`\`\`bash
openshell sandbox create --name architect --from openshell/sandbox-from:<IMAGE_TAG>
\`\`\`
### 1.2 Creating Multiple Sandboxes
NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run \`nemoclaw onboard\` once per distinct model, then reuse the resulting image for the other agents that share that model:
\`\`\`bash
# Build image once via onboard (select qwen3-coder:latest)
nemoclaw onboard  # name it "coder", note the image tag
# Reuse that image for reviewer and analyst
openshell sandbox create --name reviewer --from openshell/sandbox-from:<SAME_TAG>
openshell sandbox create --name analyst --from openshell/sandbox-from:<SAME_TAG>
\`\`\`
### 1.3 Verify Sandbox Connectivity
\`\`\`bash
ssh -o BatchMode=yes openshell-architect "echo OK"
ssh -o BatchMode=yes openshell-coder "echo OK"
ssh -o BatchMode=yes openshell-reviewer "echo OK"
ssh -o BatchMode=yes openshell-analyst "echo OK"
\`\`\`
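
The same check can be scripted so the orchestrator refuses to start when a sandbox is unreachable. This is a minimal sketch, assuming the four \`openshell-<name>\` SSH aliases shown above; the \`check_sandboxes\` helper and its injectable \`run\` parameter are illustrative and not part of NemoClaw or OpenShell:

\`\`\`python
import subprocess

AGENTS = ["architect", "coder", "reviewer", "analyst"]


def check_sandboxes(agents=AGENTS, run=subprocess.run, timeout=15):
    """Return {agent: True/False} by running 'echo OK' in each sandbox over SSH."""
    status = {}
    for name in agents:
        cmd = ["ssh", "-o", "BatchMode=yes", f"openshell-{name}", "echo OK"]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=timeout)
            status[name] = result.returncode == 0 and result.stdout.strip() == "OK"
        except (subprocess.TimeoutExpired, OSError):
            # Hung connection, or the ssh binary could not be started
            status[name] = False
    return status
\`\`\`

Calling \`check_sandboxes()\` before a run and aborting if any value is \`False\` gives a clearer failure than a pipeline that dies halfway through.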
---
## 2. Model Configuration
### 2.1 Using Different Models Per Sandbox
The gateway inference route is global: one model for all sandboxes. But each sandbox's \`openclaw.json\` specifies which model to request, and Ollama serves whichever model the request asks for. So you can run different models per agent even with a single provider.
To set this up, run \`nemoclaw onboard\` separately for the sandbox that needs a different model (e.g., Architect with \`qwen3:32b\`), then use \`qwen3-coder:latest\` for the rest.
### 2.2 The maxTokens Problem
**Critical:** NemoClaw hardcodes \`maxTokens: 4096\` in the Dockerfile that generates \`openclaw.json\`. This limits model output to ~3,000 characters, which is too short for code generation.
**Fix:** Patch the Dockerfile before building sandbox images:
\`\`\`bash
sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \\
~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
\`\`\`
Verify:
\`\`\`bash
grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
\`\`\`
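Then rebuild your sandboxes. The path depends on your Node.js version: \`which nemoclaw\` shows which Node.js version directory the CLI lives under, and \`npm root -g\` prints the global \`node_modules\` directory that contains the \`nemoclaw\` package.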
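
If you would rather script the patch so it fails loudly when the pattern is missing (for example after a NemoClaw update), something like the following could work. It is a sketch that assumes the Dockerfile contains the literal text \`'maxTokens': 4096\`, as the \`sed\` command above implies; \`patch_max_tokens\` and \`patch_file\` are hypothetical helper names:

\`\`\`python
from pathlib import Path


def patch_max_tokens(text, old=4096, new=16384):
    """Return Dockerfile text with the hardcoded maxTokens value raised."""
    needle = f"'maxTokens': {old}"
    if needle not in text:
        raise ValueError(f"{needle!r} not found; the Dockerfile layout may have changed")
    return text.replace(needle, f"'maxTokens': {new}")


def patch_file(dockerfile):
    """Patch the file in place, keeping a .bak copy of the original."""
    path = Path(dockerfile).expanduser()
    original = path.read_text()
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patch_max_tokens(original))
\`\`\`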
### 2.3 Model Output Limits
Even with \`maxTokens: 16384\`, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is a model behavior, not a configuration issue. Design your prompts accordingly; see Section 4.
### 2.4 Thinking Tokens
Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass \`--thinking off\`:
\`\`\`bash
openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt"
\`\`\`
---
## 3. Orchestrator Architecture
### 3.1 Sending Prompts to Sandboxes
\`openclaw agent\` requires \`-m <text>\`; it does not read from stdin. Long prompts with special characters break shell escaping. The solution: pipe the prompt into a file inside the sandbox, then use command substitution.
\`\`\`python
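import subprocess

# ssh_host is the sandbox alias (e.g. "openshell-coder"); prompt is the full prompt text
# Step 1: Upload prompt via stdin → file (no shell escaping needed)
upload_cmd = ["ssh", ssh_host, "cat > /tmp/prompt.txt"]
subprocess.run(upload_cmd, input=prompt, text=True, check=True)
# Step 2: Run agent; the sandbox shell expands the file into the -m argument
agent_cmd = ["ssh", ssh_host,
             'openclaw agent --agent main --local --thinking off '
             '--session-id my-session -m "$(cat /tmp/prompt.txt)"']
result = subprocess.run(agent_cmd, capture_output=True, text=True)
output = result.stdout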
\`\`\`
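
Section 4.4 calls a \`run_agent\` helper that this guide never defines. One way it could look, wrapping the two steps above, is sketched here. The function signature, the per-call prompt file name, the 600-second timeout, and the injectable \`run\` parameter are all assumptions for illustration; only the \`ssh\` and \`openclaw agent\` invocations come from the steps above. It also assumes \`agent_name\` contains only shell-safe characters:

\`\`\`python
import subprocess
import uuid


def run_agent(agent_name, prompt, run=subprocess.run, timeout=600):
    """Upload the prompt to the sandbox, run the agent, and return its stdout."""
    ssh_host = f"openshell-{agent_name}"
    session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
    # A per-call file name keeps retried calls from reading a stale prompt
    remote_path = f"/tmp/prompt-{session}.txt"

    # Step 1: stream the prompt over stdin so no shell escaping is needed
    run(["ssh", ssh_host, f"cat > {remote_path}"],
        input=prompt, text=True, check=True, timeout=timeout)

    # Step 2: let the sandbox shell expand the file into the -m argument
    remote_cmd = (
        "openclaw agent --agent main --local --thinking off "
        f'--session-id {session} -m "$(cat {remote_path})"'
    )
    result = run(["ssh", ssh_host, remote_cmd],
                 capture_output=True, text=True, check=True, timeout=timeout)
    return result.stdout
\`\`\`

With \`check=True\`, a failed upload or a non-zero agent exit raises \`subprocess.CalledProcessError\` instead of silently returning empty output.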
### 3.2 Session ID Management
**Use unique session IDs per call.** OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens.
\`\`\`python
import uuid
call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
\`\`\`
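
A retry loop built on this idea might look like the sketch below. \`call_with_retries\`, \`call_agent\`, and \`is_valid\` are hypothetical names: the caller supplies a function that invokes the sandbox and a function that decides whether the output is usable.

\`\`\`python
import uuid


def new_session_id(agent_name):
    """Build a session ID that is unique per call."""
    return f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"


def call_with_retries(agent_name, prompt, call_agent, is_valid, max_attempts=3):
    """Retry with a fresh session each time so failed attempts never share context.

    call_agent(prompt, session_id) and is_valid(output) are supplied by the caller.
    """
    for attempt in range(1, max_attempts + 1):
        output = call_agent(prompt, new_session_id(agent_name))
        if is_valid(output):
            return output, attempt
    raise RuntimeError(f"{agent_name} produced no valid output in {max_attempts} attempts")
\`\`\`

Because each attempt starts from an empty history, any error feedback for the model has to be included in the retry prompt itself.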
### 3.3 Cleaning Model Output
The model frequently emits OpenClaw tool-call artifacts in its output:
\`\`\`
</parameter>
<parameter=file_path>
/sandbox/.openclaw/workspace/file.js
</parameter>
</function>
</tool_call>
\`\`\`
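Strip these before processing. Each pattern below removes everything from the first matching tag to the end of the output, so this assumes the artifacts trail the useful content: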
\`\`\`python
import re
output = re.sub(r'</parameter>.*', '', output, flags=re.DOTALL)
output = re.sub(r'<parameter[^>]*>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</function>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</tool_call>.*', '', output, flags=re.DOTALL)
\`\`\`
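
The four substitutions can also be collapsed into a single pass. The sketch below is slightly broader than the patterns above, since it also cuts at opening \`<function...>\` and \`<tool_call>\` tags; \`strip_tool_artifacts\` is an illustrative name:

\`\`\`python
import re

# Matches the start of any tool-call artifact of the kind shown above
ARTIFACT = re.compile(r"</?(?:parameter|function|tool_call)[^>]*>")


def strip_tool_artifacts(output):
    """Cut the output at the first tool-call tag and trim trailing whitespace."""
    match = ARTIFACT.search(output)
    if match:
        output = output[:match.start()]
    return output.rstrip()
\`\`\`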
### 3.4 Streaming Output to a Live UI
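When the orchestrator runs as a subprocess, its stdout is a pipe rather than a terminal, so Python block-buffers it and the UI sees nothing until the buffer fills or the process exits. Force immediate flushing: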
\`\`\`python
# In the orchestrator: every print() call
print(line, flush=True)
# In the UI server: launch with the unbuffered flag
import os
import subprocess
import sys
env = dict(os.environ, PYTHONUNBUFFERED="1")
subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env)
\`\`\`
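
On the UI server side, the child's output still has to be read as it arrives. A minimal sketch of that reading loop follows; \`stream_lines\` is a hypothetical helper, and how each line reaches the browser is left out:

\`\`\`python
import os
import subprocess
import sys


def stream_lines(cmd):
    """Yield the child's stdout line by line as soon as each line is written."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
        bufsize=1,  # line-buffered on the reading side
        env=env,
    )
    with proc:
        for line in proc.stdout:
            yield line.rstrip()
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, cmd)
\`\`\`

Iterating over \`stream_lines([sys.executable, "-u", "orchestrator.py"])\` then hands each line to the UI as soon as the orchestrator prints it.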
---
## 4. Prompt Engineering for Small Models
The most important lesson from this project: **a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.**
### 4.1 Be Prescriptive, Not Descriptive
Don't ask the model to design. Tell it exactly what to produce.
**Bad:** "Generate sensor data with realistic values for a city infrastructure system"
**Good:** "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage."
The model will substitute its training priors for your specifications unless you're explicit, and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone.
### 4.2 Provide Skeletons, Not Specs
Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget.
\`\`\`\`
Complete this skeleton. Fill in the TODO sections.
Do NOT modify the existing code. Do NOT add Pydantic models.
\`\`\`python
def generate_sensors():
    data = {}
    for d in DISTRICTS:
        power = {"voltage": round(random.uniform(115,125),1), ...}
        # TODO: Override anomaly districts
        data[d] = {"power": power, "water": water, "traffic": traffic}
    return data
@app.get("/api/health")
def health():
    # TODO: compute overall status
    pass
\`\`\`
\`\`\`\`
### 4.3 Binary Reviews, Not Open-Ended Critique
A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks:
1. Are all three endpoints present?
2. Do they return data (not \`pass\`)?
3. Is CORS enabled?
4. Does uvicorn bind to the right port?
5. Any syntax errors?
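
Binary answers are also easy to parse on the host. The sketch below assumes a reply format of one \`N: YES\` or \`N: NO\` line per check, which is a convention invented here rather than anything OpenClaw defines. Check 5 is rephrased so that YES always means "pass". \`REVIEW_PROMPT\` and \`parse_verdict\` are illustrative names:

\`\`\`python
REVIEW_PROMPT = """Output ONLY five lines in the form N: YES or N: NO. No explanation.
1. Are all three endpoints present?
2. Do they return data (not pass)?
3. Is CORS enabled?
4. Does uvicorn bind to the right port?
5. Is the code free of syntax errors?

CODE:
"""


def parse_verdict(reply, num_checks=5):
    """Return (passed, failed_check_numbers); a missing answer counts as a failure."""
    answers = {}
    for line in reply.splitlines():
        number, sep, answer = line.partition(":")
        if sep and number.strip().isdigit():
            answers[int(number.strip())] = answer.strip().upper().startswith("YES")
    failed = [n for n in range(1, num_checks + 1) if not answers.get(n, False)]
    return (not failed, failed)
\`\`\`

The orchestrator would send \`REVIEW_PROMPT + code\` to the Reviewer and feed the failed check numbers into the Coder's retry prompt. Treating missing answers as failures keeps a rambling reply from passing by accident.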
### 4.4 Template Injection for Large Outputs
The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit; work with it.
Have the model generate only the dynamic part (a \`generateData()\` function), then inject it into a pre-built template:
\`\`\`python
template = open("dashboard_template.html").read()
gen_func = run_agent("analyst", "Output ONLY a generateData() function...")
dashboard = template.replace("%%GENERATE_DATA%%", gen_func)
\`\`\`
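
Since the injected text comes from a model, it is worth guarding the substitution. The following sketch assumes \`generateData()\` is declared in JavaScript as \`function generateData\` and that a known-good version is available as a fallback (the project keeps known-good outputs under \`fallback/\`); the \`inject\` helper itself is hypothetical:

\`\`\`python
PLACEHOLDER = "%%GENERATE_DATA%%"


def inject(template, gen_func, fallback_func):
    """Insert the generated function into the template, or the fallback if it looks wrong."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError(f"template must contain {PLACEHOLDER} exactly once")
    # Cheap sanity check: the model was asked for a generateData() function only
    usable = "function generateData" in gen_func
    return template.replace(PLACEHOLDER, gen_func if usable else fallback_func)
\`\`\`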
### 4.5 Always Start with "Output ONLY..."
Every prompt to \`openclaw agent\` should begin with a forceful instruction:
\`\`\`
Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>.
\`\`\`
\`\`\`
Output ONLY a Python code block. Start with \`\`\`python end with \`\`\`. No explanation.
\`\`\`
Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML.
---
## 5. Lessons Learned
| Problem | Root Cause | Solution |
|---------|-----------|----------|
| Model ignores explicit values | Training priors override prompt | Include "NOT X" alongside "use Y" |
| Output truncated mid-code | maxTokens: 4096 hardcoded | Patch NemoClaw Dockerfile |
| Output still truncated | Thinking tokens consume budget | \`--thinking off\` on every call |
| Output still short (~2K chars) | Model's natural stop behavior | Skeleton prompts, template injection |
| Retries produce worse output | Session context accumulates | Unique session ID per call |
| Tool-call XML in output | OpenClaw agent framework | Regex cleanup post-processing |
| Reviewer rejects valid code | Reviewing against drifted plan | Binary checks, not schema comparison |
| Sandbox creation fails | Policy YAML parsing bug | Create manually via \`openshell sandbox create\` |
| Live UI shows no output | Python stdout buffering | \`flush=True\` + \`PYTHONUNBUFFERED=1\` |
| Port 8080 in use | NemoClaw gateway occupies it | Use different port for UI server |
---
## 6. Project Structure
\`\`\`
nemoclaw-demo/
├── orchestrator.py              # Main pipeline
├── ui_server.py                 # Live browser UI (port 8888)
├── prompts/
│   ├── architect.txt            # Spec → plan
│   ├── coder_initial.txt        # Plan → code (skeleton)
│   ├── coder_retry.txt          # Error → fixed code
│   ├── reviewer.txt             # Code → binary verdict
│   ├── analyst.txt              # → generateData() function
│   └── dashboard_template.html  # Pre-built dashboard UI
├── fallback/
│   ├── generated_app.py         # Known-good backend
│   └── dashboard.html           # Known-good dashboard
├── output/                      # Pipeline output
└── logs/                        # Per-run logs
\`\`\`
---
## 7. Running the Demo
\`\`\`bash
# Terminal 1: Start the live UI
python3 ui_server.py
# Open in browser (use the Network URL it prints)
# Click RUN PIPELINE
\`\`\`
Or run the pipeline directly:
\`\`\`bash
python3 orchestrator.py
# Open output/dashboard.html in a browser
\`\`\`
---
## Security Architecture
Each NemoClaw sandbox enforces:
- **Landlock** filesystem restrictions: agents see only \`/sandbox\` and \`/tmp\`
- **seccomp** system call filtering: blocked syscalls fail silently
- **Network namespace isolation**: no direct egress, only \`inference.local\`
- **Inference routing**: agents hit \`inference.local\`, the gateway routes to Ollama; the agent never sees host IPs or ports
The orchestrator runs on the host as a trusted operator. Agents never communicate directly β€” every artifact passes through the host-side orchestrator. This is least-privilege at the agent level.
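
The host-mediated hand-off can be pictured with a deliberately simplified sketch. It treats the pipeline as a straight chain in which each agent receives only the previous artifact. The actual orchestrator in the repository may differ, for example in retries and stage-specific prompts. \`run_pipeline\` and the \`run_agent(name, text)\` callable are illustrative names:

\`\`\`python
STAGES = ["architect", "coder", "reviewer", "analyst"]


def run_pipeline(spec, run_agent, stages=STAGES):
    """Run the stages in order, handing each agent only the previous artifact."""
    artifacts = {}
    current = spec
    for name in stages:
        current = run_agent(name, current)
        artifacts[name] = current  # the host records every hand-off
    return artifacts
\`\`\`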
---
*Last updated: April 2026*
*NemoClaw v0.1.0 | OpenShell v0.0.16 | OpenClaw 2026.3.11*
`;
const content = document.getElementById('content');
content.innerHTML = marked.parse(markdown);
document.querySelectorAll('pre code').forEach((block) => {
hljs.highlightElement(block);
});
</script>
</body>
</html>