title: NemoClaw Developer Guide
emoji: 🦞
colorFrom: blue
colorTo: green
sdk: static
pinned: false
Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated.
Agentic AI with NemoClaw: Developer Guide
Building Multi-Agent Pipelines on the HP ZGX Nano AI Station
Written and prepared by Curtis Burkhalter, Ph.D. | Technical Product Marketing Manager, AI Solutions at HP
What This Guide Covers
This guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It covers the real-world setup, the problems you'll hit, and the solutions that actually work: practice, not theory.
The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama.
Pipeline: Architect → Coder → Reviewer → Analyst
Hardware: HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory)
Models: Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst)
Source: github.com/curtburk/nemoclaw-demo
Prerequisites
- HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory)
- NemoClaw installed (`nemoclaw --help` responds)
- OpenShell CLI installed (`openshell --help` responds)
- Ollama installed with your models pulled
- Python 3.10+ on the host
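A small host-side script can confirm the command-line prerequisites before you start onboarding. This is a minimal sketch with limited scope. It only checks that the `nemoclaw`, `openshell`, and `ollama` executables are on `PATH` and that Python is 3.10 or newer. It does not run `--help` or confirm that your models are pulled. `missing_prereqs` is a hypothetical helper, not part of the demo repo.

```python
import shutil
import sys

# Executables named in the prerequisites list
REQUIRED_CLIS = ("nemoclaw", "openshell", "ollama")


def missing_prereqs(which=shutil.which, version=sys.version_info):
    """Return a list of problems; an empty list means the host looks ready."""
    problems = [f"{name} not found on PATH"
                for name in REQUIRED_CLIS if which(name) is None]
    if tuple(version[:2]) < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    return problems
```

Call `missing_prereqs()` with no arguments on the host. An empty list means all three CLIs were found.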
1. NemoClaw Setup
1.1 Onboarding a Sandbox
nemoclaw onboard
The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted. You'll hit a sandbox policy parsing error on the current version (v0.1.0):
Error: failed to parse sandbox policy YAML
network_policies.name: invalid type: string "github", expected struct
Workaround: the image builds successfully; the error occurs only at sandbox creation. Grab the image tag from the build output and create the sandbox manually:
openshell sandbox create --name architect --from openshell/sandbox-from:<IMAGE_TAG>
1.2 Creating Multiple Sandboxes
NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run nemoclaw onboard once for each distinct model. Then reuse the resulting image for the other agents that share that model:
# Build image once via onboard (select qwen3-coder:latest)
nemoclaw onboard # → name it "coder", note the image tag
# Reuse that image for reviewer and analyst
openshell sandbox create --name reviewer --from openshell/sandbox-from:<SAME_TAG>
openshell sandbox create --name analyst --from openshell/sandbox-from:<SAME_TAG>
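If you script the image-reuse step, the same command can be generated for each agent that shares the coder image. This is a sketch that assumes you paste in the tag you noted during `nemoclaw onboard`. `sandbox_create_cmd` and `create_shared_sandboxes` are hypothetical helpers. The `run` parameter exists only so the commands can be inspected without OpenShell installed.

```python
import subprocess


def sandbox_create_cmd(name: str, image_tag: str) -> list:
    # Same command as the manual example: reuse an image built by nemoclaw onboard
    return ["openshell", "sandbox", "create", "--name", name,
            "--from", f"openshell/sandbox-from:{image_tag}"]


def create_shared_sandboxes(names, image_tag, run=subprocess.run):
    """Create one sandbox per name from a single existing image."""
    for name in names:
        # check=True stops at the first failure instead of silently continuing
        run(sandbox_create_cmd(name, image_tag), check=True)
```

Usage: `create_shared_sandboxes(["reviewer", "analyst"], tag)`, where `tag` is the real image tag from the build output.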
1.3 Verify Sandbox Connectivity
ssh -o BatchMode=yes openshell-architect "echo OK"
ssh -o BatchMode=yes openshell-coder "echo OK"
ssh -o BatchMode=yes openshell-reviewer "echo OK"
ssh -o BatchMode=yes openshell-analyst "echo OK"
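The same check can be looped from Python, which is handy before every pipeline run. This sketch assumes the `openshell-<name>` SSH host aliases shown above. `check_sandboxes` is a hypothetical helper, and the 30-second timeout is an arbitrary choice.

```python
import subprocess

AGENTS = ("architect", "coder", "reviewer", "analyst")


def check_sandboxes(agents=AGENTS, run=subprocess.run, timeout=30):
    """Return {agent: True/False} depending on whether `echo OK` succeeds over SSH."""
    status = {}
    for agent in agents:
        # BatchMode=yes makes ssh fail instead of prompting for a password
        cmd = ["ssh", "-o", "BatchMode=yes", f"openshell-{agent}", "echo OK"]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            status[agent] = False
            continue
        status[agent] = result.returncode == 0 and result.stdout.strip() == "OK"
    return status
```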
2. Model Configuration
2.1 Using Different Models Per Sandbox
The gateway inference route is global: one model for all sandboxes. However, each sandbox's openclaw.json specifies which model to request, and Ollama serves whichever model the request asks for. So you can run different models per agent even with a single provider.
To set this up, run nemoclaw onboard separately for the sandbox that needs a different model (e.g., Architect with qwen3:32b), then use qwen3-coder:latest for the rest.
2.2 The maxTokens Problem
Critical: NemoClaw hardcodes maxTokens: 4096 in the Dockerfile that generates openclaw.json. This limits model output to ~3,000 characters, which is too short for code generation.
Fix: Patch the Dockerfile before building sandbox images:
sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \
~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
Verify:
grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
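Then rebuild your sandboxes. The path depends on your Node.js version. `which nemoclaw` shows the `bin/` directory of the active Node.js install, and the Dockerfile sits under `lib/node_modules/nemoclaw/` next to it. `npm root -g` prints that global `node_modules` directory directly.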
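The same patch can be applied from Python, which makes it easy to rerun after a NemoClaw upgrade. This sketch rests on two assumptions:

- NemoClaw is a global npm package (as the nvm path above suggests), so `npm root -g` points at the directory containing `nemoclaw/Dockerfile`.
- The Dockerfile still contains the literal `'maxTokens': 4096`.

All three helper names are hypothetical.

```python
import re
import subprocess
from pathlib import Path


def patch_max_tokens(text: str, new_limit: int = 16384):
    """Same substitution as the sed one-liner; returns (patched_text, replacements)."""
    # \b keeps a value such as 40960 from matching
    return re.subn(r"'maxTokens': 4096\b", f"'maxTokens': {new_limit}", text)


def nemoclaw_dockerfile() -> Path:
    # Assumes a global npm install; `npm root -g` prints the global node_modules path
    root = subprocess.run(["npm", "root", "-g"], capture_output=True,
                          text=True, check=True).stdout.strip()
    return Path(root) / "nemoclaw" / "Dockerfile"


def apply_patch(path: Path, new_limit: int = 16384) -> int:
    patched, count = patch_max_tokens(path.read_text(), new_limit)
    if count:
        path.write_text(patched)
    return count  # 0 means already patched, or the Dockerfile changed upstream
```

Usage: `apply_patch(nemoclaw_dockerfile())`, then rebuild the sandboxes as above.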
2.3 Model Output Limits
Even with maxTokens: 16384, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is model behavior, not a configuration issue. Design your prompts accordingly (see Section 4).
2.4 Thinking Tokens
Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass --thinking off:
openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt"
3. Orchestrator Architecture
3.1 Sending Prompts to Sandboxes
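openclaw agent requires -m <text>; it does not read the prompt from stdin. Passing a long prompt directly breaks easily. ssh hands the command to the remote shell as a single string, so every quote, `$`, and backtick in the prompt would need escaping. The solution: pipe the prompt into a file inside the sandbox, then use command substitution.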
import subprocess
# Step 1: Upload the prompt via stdin into a file inside the sandbox
upload_cmd = ["ssh", ssh_host, "cat > /tmp/prompt.txt"]
subprocess.run(upload_cmd, input=prompt, text=True, check=True)
# Step 2: Run the agent; the remote shell expands $(cat ...) into the -m argument
agent_cmd = ["ssh", ssh_host,
             'openclaw agent --agent main --local --thinking off '
             '--session-id my-session -m "$(cat /tmp/prompt.txt)"']
result = subprocess.run(agent_cmd, capture_output=True, text=True)
output = result.stdout
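The two steps above can be wrapped in one function. This is a sketch, not the demo's actual helper:

- `run_agent_via_file` is a hypothetical name.
- The `run` parameter is there only so the function can be tested without a sandbox.
- The session ID is quoted with `shlex.quote` as a precaution.

Inside double quotes, the remote shell does not word-split or re-expand the text produced by `$(cat ...)`. Quotes and `$` characters in the prompt therefore reach `openclaw` unchanged.

```python
import shlex
import subprocess

PROMPT_PATH = "/tmp/prompt.txt"  # scratch file inside the sandbox


def run_agent_via_file(ssh_host, prompt, session_id, run=subprocess.run):
    """Upload the prompt over stdin, then run openclaw agent on the file contents."""
    # Step 1: stdin carries the prompt verbatim, so no shell quoting is involved
    run(["ssh", ssh_host, f"cat > {PROMPT_PATH}"],
        input=prompt, text=True, check=True)
    # Step 2: the remote shell expands "$(cat ...)" into a single -m argument
    remote_cmd = ("openclaw agent --agent main --local --thinking off "
                  f"--session-id {shlex.quote(session_id)} "
                  f'-m "$(cat {PROMPT_PATH})"')
    result = run(["ssh", ssh_host, remote_cmd],
                 capture_output=True, text=True, check=True)
    return result.stdout
```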
3.2 Session ID Management
Use unique session IDs per call. OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens.
import uuid
call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
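Combining fresh session IDs with a retry loop keeps each attempt's context small. Only the latest error is carried forward, explicitly, in the prompt. This is a sketch:

- `run_with_retries`, the `call_agent(prompt, session_id)` callable, and the `check(output)` validator are hypothetical stand-ins for the orchestrator's own functions.
- The retry wording is illustrative, not the demo's `coder_retry.txt`.

```python
import uuid


def new_session_id(agent_name: str) -> str:
    # A new ID per call, so OpenClaw starts with no accumulated history
    return f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"


def run_with_retries(agent_name, prompt, call_agent, check, max_attempts=3):
    """check(output) returns None if the output is acceptable, else an error message."""
    current = prompt
    for _ in range(max_attempts):
        output = call_agent(current, new_session_id(agent_name))
        error = check(output)
        if error is None:
            return output
        # Feed back only the most recent error instead of the full failed history
        current = f"{prompt}\n\nThe previous attempt failed with:\n{error}\nFix it."
    return None  # caller can switch to a known-good fallback here
```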
3.3 Cleaning Model Output
The model frequently emits OpenClaw tool-call artifacts in its output:
</parameter>
<parameter=file_path>
/sandbox/.openclaw/workspace/file.js
</parameter>
</function>
</tool_call>
Strip these before processing:
import re
output = re.sub(r'</parameter>.*', '', output, flags=re.DOTALL)
output = re.sub(r'<parameter[^>]*>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</function>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</tool_call>.*', '', output, flags=re.DOTALL)
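The four substitutions above can be folded into a single helper. This is a sketch of a variant, not the demo's code. `clean_output` is a hypothetical name, and it departs from the original substitutions in two ways:

- One pattern also catches opening `<function ...>` and `<tool_call>` tags.
- If the model wrapped its answer in a fenced code block, only the body of that block is kept.

```python
import re

# Cut from the first tool-call tag (opening or closing) to the end of the text
ARTIFACT = re.compile(r"</?(?:parameter|function|tool_call)\b[^>]*>.*", re.DOTALL)
# First fenced code block, with an optional language tag after the opening backticks
FENCED = re.compile(r"`{3}\w*\n(.*?)`{3}", re.DOTALL)


def clean_output(raw: str) -> str:
    text = ARTIFACT.sub("", raw)
    match = FENCED.search(text)
    # Prefer the body of the first fenced block if the model used one
    return (match.group(1) if match else text).strip()
```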
3.4 Streaming Output to a Live UI
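When the UI server launches the orchestrator as a subprocess, the child's stdout is a pipe rather than a terminal. Python block-buffers it, so lines reach the UI in bursts or only when the process exits. Force immediate flushing: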
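# In the orchestrator: flush on every print() call
print(line, flush=True)
# In the UI server: launch the child unbuffered and capture its output
# (-u and PYTHONUNBUFFERED each suffice on their own; using both is belt-and-braces)
import os
import subprocess
import sys
env = dict(os.environ, PYTHONUNBUFFERED="1")
proc = subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env,
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True)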
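On the UI-server side, the child's output has to be read line by line as it arrives. This sketch uses only the standard library. `stream_orchestrator` and its `on_line` callback are hypothetical, and pushing each line to the browser is left out.

```python
import os
import subprocess
import sys


def stream_orchestrator(script="orchestrator.py", on_line=print):
    """Run the orchestrator unbuffered and hand each output line to on_line."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.Popen([sys.executable, "-u", script], env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                            text=True, bufsize=1)  # bufsize=1: line-buffered in text mode
    with proc.stdout:
        for line in proc.stdout:  # yields each line as soon as the child flushes it
            on_line(line.rstrip("\n"))
    return proc.wait()
```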
4. Prompt Engineering for Small Models
The most important lesson from this project: a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.
4.1 Be Prescriptive, Not Descriptive
Don't ask the model to design. Tell it exactly what to produce.
Bad: "Generate sensor data with realistic values for a city infrastructure system"
Good: "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage."
The model will substitute its training priors for your specifications unless you're explicit, and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone.
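When a prompt pins down many fields, the "use Y, NOT X" pattern can be generated from a table of constraints so that no guard is forgotten. This is a minimal sketch. `prescriptive_spec` is a hypothetical helper, and the voltage entry reproduces the Good example above.

```python
def prescriptive_spec(fields: dict) -> str:
    """fields maps a name to (what to use, list of things the model must NOT do)."""
    lines = []
    for name, (use, avoid) in fields.items():
        # "use Y" first, then each explicit "NOT X" guard
        lines.append(f"{name}: {use}." + "".join(f" NOT {item}." for item in avoid))
    return "\n".join(lines)
```

For example, `prescriptive_spec({"voltage": ("random.uniform(115, 125)", ["220-240", "European voltage"])})` yields the Good line verbatim.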
4.2 Provide Skeletons, Not Specs
Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget.
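Both halves of this approach can be handled on the host. The host builds the prompt around the skeleton, then checks afterwards that the model left the skeleton's fixed lines alone. This is a sketch:

- `skeleton_prompt` and `modified_lines` are hypothetical helpers.
- The wording mirrors the example prompt that follows.
- Treating a bare `pass` as a placeholder the model may replace is an assumption.

```python
FENCE = "`" * 3  # built at runtime so the prompt's own fence can't end this example


def skeleton_prompt(skeleton: str, extra_rules=("Do NOT add Pydantic models.",)) -> str:
    rules = " ".join(("Do NOT modify the existing code.",) + tuple(extra_rules))
    return (f"Complete this skeleton. Fill in the TODO sections.\n{rules}\n"
            f"{FENCE}python\n{skeleton.strip()}\n{FENCE}")


def modified_lines(skeleton: str, completed: str) -> list:
    """Fixed skeleton lines that no longer appear anywhere in the model's output."""
    kept = {line.strip() for line in completed.splitlines()}
    fixed = [line.strip() for line in skeleton.splitlines()
             if line.strip() and "TODO" not in line and line.strip() != "pass"]
    return [line for line in fixed if line not in kept]
```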
Complete this skeleton. Fill in the TODO sections.
Do NOT modify the existing code. Do NOT add Pydantic models.
```python
def generate_sensors():
    data = {}
    for d in DISTRICTS:
        power = {"voltage": round(random.uniform(115,125),1), ...}
        # TODO: Override anomaly districts
        data[d] = {"power": power, "water": water, "traffic": traffic}
    return data

@app.get("/api/health")
def health():
    # TODO: compute overall status
    pass
```
4.3 Binary Reviews, Not Open-Ended Critique
A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks:
1. Are all three endpoints present?
2. Do they return data (not `pass`)?
3. Is CORS enabled?
4. Does uvicorn bind to the right port?
5. Any syntax errors?
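The same five questions can also be answered deterministically on the host with Python's `ast` module. This can serve as a cheap pre-filter before the Reviewer runs, or as a cross-check on its verdict. It is a swapped-in technique, not how the demo's Reviewer works. `binary_review` is hypothetical and makes these assumptions:

- The app is FastAPI-style, with `@app.get(...)` decorators, `CORSMiddleware`, and `uvicorn.run(..., port=...)`.
- The endpoint list and port are passed in as parameters, because the guide names only `/api/health`.

```python
import ast

HTTP_METHODS = {"get", "post", "put", "delete", "patch"}
CHECKS = ("endpoints_present", "endpoints_return_data", "cors_enabled",
          "port_ok", "no_syntax_errors")


def _routes(tree):
    """Map route path -> function node for decorators like @app.get("/api/health")."""
    routes = {}
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            for dec in node.decorator_list:
                if (isinstance(dec, ast.Call) and isinstance(dec.func, ast.Attribute)
                        and dec.func.attr in HTTP_METHODS and dec.args
                        and isinstance(dec.args[0], ast.Constant)):
                    routes[dec.args[0].value] = node
    return routes


def _uvicorn_port(tree):
    """Return the literal port= passed to a uvicorn.run(...) call, if any."""
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
                and node.func.attr == "run" and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "uvicorn"):
            for kw in node.keywords:
                if kw.arg == "port" and isinstance(kw.value, ast.Constant):
                    return kw.value.value
    return None


def binary_review(source, endpoints, port):
    """Return {check: bool} for the five Reviewer questions."""
    try:
        tree = ast.parse(source)  # parses only; nothing is imported or executed
    except SyntaxError:
        return dict.fromkeys(CHECKS, False)
    routes = _routes(tree)
    present = [e for e in endpoints if e in routes]
    return {
        "endpoints_present": len(present) == len(endpoints),
        # "returns data" = has a return statement with a value, i.e. not just `pass`
        "endpoints_return_data": all(
            any(isinstance(n, ast.Return) and n.value is not None
                for n in ast.walk(routes[e])) for e in present),
        "cors_enabled": "CORSMiddleware" in source,  # crude text check
        "port_ok": _uvicorn_port(tree) == port,
        "no_syntax_errors": True,
    }
```

`all(binary_review(code, endpoints, port).values())` then gives a single pass/fail verdict.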
4.4 Template Injection for Large Outputs
The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit; work with it.
Have the model generate only the dynamic part (a `generateData()` function), then inject it into a pre-built template:
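```python
# run_agent() stands for the orchestrator's send-prompt-and-read-reply helper (not shown)
with open("dashboard_template.html") as f:
    template = f.read()
gen_func = run_agent("analyst", "Output ONLY a generateData() function...")
dashboard = template.replace("%%GENERATE_DATA%%", gen_func)
```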
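Because the model's reply is pasted into working HTML, it is worth checking before injection, with a known-good page to fall back on. This is a sketch. `build_dashboard` is hypothetical, and using a fallback page such as `fallback/dashboard.html` this way is one plausible reading of the project layout, not confirmed behavior.

```python
import re

PLACEHOLDER = "%%GENERATE_DATA%%"
# Accept `function generateData(` or `generateData = ...` style definitions
DEFINES_GENERATE_DATA = re.compile(r"\bgenerateData\s*[=(]")


def build_dashboard(template: str, gen_func: str, fallback_html: str) -> str:
    if template.count(PLACEHOLDER) != 1:
        raise ValueError(f"template must contain {PLACEHOLDER} exactly once")
    if not DEFINES_GENERATE_DATA.search(gen_func):
        return fallback_html  # model output unusable; serve the known-good page
    # str.replace is literal, so $ or \ in the generated JavaScript is safe
    return template.replace(PLACEHOLDER, gen_func.strip())
```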
4.5 Always Start with "Output ONLY..."
Every prompt to openclaw agent should begin with a forceful instruction:
Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>.
Output ONLY a Python code block. Start with ```python end with ```. No explanation.
Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML.
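Keeping these instructions in one place makes them harder to forget on a new prompt. The same rule also gives you something to check the reply against. This is a sketch. `OUTPUT_RULES`, `with_output_rule`, and `follows_html_rule` are hypothetical names, and the rule strings are the two examples above.

```python
FENCE = "`" * 3  # avoids a literal triple backtick inside this example

# The two example instructions from this section, keyed by output type
OUTPUT_RULES = {
    "html": "Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>.",
    "python": f"Output ONLY a Python code block. Start with {FENCE}python end with {FENCE}. No explanation.",
}


def with_output_rule(kind: str, prompt: str) -> str:
    # The forceful instruction always comes first
    return f"{OUTPUT_RULES[kind]}\n\n{prompt}"


def follows_html_rule(output: str) -> bool:
    return output.lstrip().lower().startswith("<!doctype html>")
```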
5. Lessons Learned
| Problem | Root Cause | Solution |
|---|---|---|
| Model ignores explicit values | Training priors override prompt | Include "NOT X" alongside "use Y" |
| Output truncated mid-code | maxTokens: 4096 hardcoded | Patch NemoClaw Dockerfile |
| Output still truncated | Thinking tokens consume budget | --thinking off on every call |
| Output still short (~2K chars) | Model's natural stop behavior | Skeleton prompts, template injection |
| Retries produce worse output | Session context accumulates | Unique session ID per call |
| Tool-call XML in output | OpenClaw agent framework | Regex cleanup post-processing |
| Reviewer rejects valid code | Reviewing against drifted plan | Binary checks, not schema comparison |
| Sandbox creation fails | Policy YAML parsing bug | Create manually via openshell sandbox create |
| Live UI shows no output | Python stdout buffering | flush=True + PYTHONUNBUFFERED=1 |
| Port 8080 in use | NemoClaw gateway occupies it | Use different port for UI server |
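Before starting the UI server, you can check whether a port is already taken, for example by the gateway on 8080. This is a sketch. `port_is_free` is a hypothetical helper that tries to bind the port and releases it immediately.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """True if nothing is currently bound to host:port (checked by binding briefly)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            return False
    return True
```

The check is only a snapshot: another process can still grab the port between this call and the UI server's own bind.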
6. Project Structure
nemoclaw-demo/
├── orchestrator.py # Main pipeline
├── ui_server.py # Live browser UI (port 8888)
├── prompts/
│   ├── architect.txt # Spec → plan
│   ├── coder_initial.txt # Plan → code (skeleton)
│   ├── coder_retry.txt # Error → fixed code
│   ├── reviewer.txt # Code → binary verdict
│   ├── analyst.txt # → generateData() function
│   └── dashboard_template.html # Pre-built dashboard UI
├── fallback/
│   ├── generated_app.py # Known-good backend
│   └── dashboard.html # Known-good dashboard
├── output/ # Pipeline output
└── logs/ # Per-run logs
7. Running the Demo
# Terminal 1: Start the live UI
python3 ui_server.py
# Open in browser (use the Network URL it prints)
# Click RUN PIPELINE
Or run the pipeline directly:
python3 orchestrator.py
# Open output/dashboard.html in a browser
Security Architecture
Each NemoClaw sandbox enforces:
- Landlock filesystem restrictions: agents see only `/sandbox` and `/tmp`
- seccomp system call filtering: blocked syscalls fail silently
- Network namespace isolation: no direct egress, only `inference.local`
- Inference routing: agents hit `inference.local` and the gateway routes to Ollama; the agent never sees host IPs or ports
The orchestrator runs on the host as a trusted operator. Agents never communicate directly; every artifact passes through the host-side orchestrator. This is least privilege at the agent level.
Last updated: April 2026 | NemoClaw v0.1.0 | OpenShell v0.0.16 | OpenClaw 2026.3.11

