metadata
title: NemoClaw Developer Guide
emoji: 🦞
colorFrom: blue
colorTo: green
sdk: static
pinned: false

HP | NVIDIA

Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated.

Agentic AI with NemoClaw: Developer Guide

Building Multi-Agent Pipelines on the HP ZGX Nano AI Station

Written and prepared by Curtis Burkhalter, Ph.D. | Technical Product Marketing Manager, AI Solutions at HP


What This Guide Covers

This guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It covers the real-world setup, the problems you'll hit, and the solutions that actually work: the practice, not the theory.

The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama.

Pipeline: Architect → Coder → Reviewer → Analyst

Hardware: HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory)

Models: Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst)

Source: github.com/curtburk/nemoclaw-demo


Prerequisites

  • HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory)
  • NemoClaw installed (nemoclaw --help responds)
  • OpenShell CLI installed (openshell --help responds)
  • Ollama installed with your models pulled
  • Python 3.10+ on the host

1. NemoClaw Setup

1.1 Onboarding a Sandbox

nemoclaw onboard

The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted. You'll hit a sandbox policy parsing error on the current version (v0.1.0):

Error: failed to parse sandbox policy YAML
  network_policies.name: invalid type: string "github", expected struct

Workaround: The image builds successfully; the error occurs only at sandbox creation. Grab the image tag from the build output and create the sandbox manually:

openshell sandbox create --name architect --from openshell/sandbox-from:<IMAGE_TAG>

1.2 Creating Multiple Sandboxes

NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run nemoclaw onboard once per agent, then use the image reuse trick for agents that share the same model:

# Build image once via onboard (select qwen3-coder:latest)
nemoclaw onboard  # → name it "coder", note the image tag

# Reuse that image for reviewer and analyst
openshell sandbox create --name reviewer --from openshell/sandbox-from:<SAME_TAG>
openshell sandbox create --name analyst --from openshell/sandbox-from:<SAME_TAG>

1.3 Verify Sandbox Connectivity

ssh -o BatchMode=yes openshell-architect "echo OK"
ssh -o BatchMode=yes openshell-coder "echo OK"
ssh -o BatchMode=yes openshell-reviewer "echo OK"
ssh -o BatchMode=yes openshell-analyst "echo OK"
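
The same checks can be scripted from the host. The following is a minimal sketch, assuming the openshell-<name> SSH aliases shown above; the function name check_sandboxes and its injectable runner parameter are illustrative and not part of NemoClaw or OpenShell.

```python
import subprocess

AGENTS = ["architect", "coder", "reviewer", "analyst"]


def check_sandboxes(agents=AGENTS, runner=subprocess.run, timeout=15):
    """Return {agent: bool}, True when `echo OK` succeeds over SSH."""
    results = {}
    for agent in agents:
        cmd = ["ssh", "-o", "BatchMode=yes", f"openshell-{agent}", "echo OK"]
        try:
            proc = runner(cmd, capture_output=True, text=True, timeout=timeout)
            results[agent] = proc.returncode == 0 and proc.stdout.strip() == "OK"
        except (subprocess.TimeoutExpired, OSError):
            # A hung connection or a missing ssh binary counts as unreachable.
            results[agent] = False
    return results
```

Calling check_sandboxes() before a pipeline run lets the orchestrator stop early if any sandbox is unreachable instead of failing partway through.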

2. Model Configuration

2.1 Using Different Models Per Sandbox

The gateway inference route is global: one model for all sandboxes. But each sandbox's openclaw.json specifies which model to request, and Ollama serves whichever model the request asks for. So you can run different models per agent even with a single provider.

To set this up, run nemoclaw onboard separately for the sandbox that needs a different model (e.g., Architect with qwen3:32b), then use qwen3-coder:latest for the rest.

2.2 The maxTokens Problem

Critical: NemoClaw hardcodes maxTokens: 4096 in the Dockerfile that generates openclaw.json. This limits model output to ~3,000 characters, which is too short for code generation.

Fix: Patch the Dockerfile before building sandbox images:

sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \
  ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile

Verify:

grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile

Then rebuild your sandboxes. The path may differ based on your Node.js version; use which nemoclaw to find it.
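
If you would rather apply the patch from Python, the sketch below does the same substitution as the sed command and fails loudly when the expected string is missing. The helper names patch_max_tokens and patch_dockerfile are hypothetical, and unlike sed without the g flag, str.replace rewrites every occurrence in the file.

```python
from pathlib import Path

OLD = "'maxTokens': 4096"
NEW = "'maxTokens': 16384"


def patch_max_tokens(text: str) -> str:
    """Return Dockerfile text with maxTokens raised; raise if nothing matched."""
    if OLD not in text:
        raise ValueError(f"{OLD!r} not found; already patched or file layout changed")
    return text.replace(OLD, NEW)


def patch_dockerfile(path: Path) -> None:
    """Patch the file in place, keeping a .bak copy of the original beside it."""
    original = path.read_text()
    patched = patch_max_tokens(original)
    path.with_name(path.name + ".bak").write_text(original)
    path.write_text(patched)
```

Pass patch_dockerfile the Dockerfile path from your own NemoClaw install; the backup makes it easy to revert if a later NemoClaw release changes the file.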

2.3 Model Output Limits

Even with maxTokens: 16384, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is a model behavior, not a configuration issue. Design your prompts accordingly (see Section 4).

2.4 Thinking Tokens

Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass --thinking off:

openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt"

3. Orchestrator Architecture

3.1 Sending Prompts to Sandboxes

openclaw agent requires -m <text>; it does not read from stdin. Long prompts with special characters break shell escaping. The solution: pipe the prompt into a file inside the sandbox, then use command substitution.

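import subprocess

# ssh_host is the sandbox's SSH alias (e.g. "openshell-coder"); prompt is the full prompt text.

# Step 1: Upload the prompt via stdin → file inside the sandbox
upload_cmd = ["ssh", ssh_host, "cat > /tmp/prompt.txt"]
subprocess.run(upload_cmd, input=prompt, text=True, check=True)

# Step 2: Run the agent, reading the prompt back with command substitution
agent_cmd = ["ssh", ssh_host,
    'openclaw agent --agent main --local --thinking off '
    '--session-id my-session -m "$(cat /tmp/prompt.txt)"']
result = subprocess.run(agent_cmd, capture_output=True, text=True)
output = result.stdout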

3.2 Session ID Management

Use unique session IDs per call. OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens.

import uuid
call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
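
To show how a fresh session ID fits into a retry loop, here is a minimal sketch. The call_agent and is_valid callables are placeholders the caller supplies; they are assumptions for illustration, not OpenClaw APIs.

```python
import uuid


def new_session_id(agent_name: str) -> str:
    """Build a fresh session ID so each call starts with an empty history."""
    return f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"


def run_with_retries(agent_name, prompt, call_agent, is_valid, max_attempts=3):
    """Call the agent up to max_attempts times, with a new session ID each time.

    call_agent(session_id, prompt) and is_valid(output) come from the caller.
    Returns the first valid output, or None if every attempt fails.
    """
    for _ in range(max_attempts):
        output = call_agent(new_session_id(agent_name), prompt)
        if is_valid(output):
            return output
    return None
```

Because the ID is generated inside the loop, a failed attempt never leaks its history into the next one.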

3.3 Cleaning Model Output

The model frequently emits OpenClaw tool-call artifacts in its output:

</parameter>
<parameter=file_path>
/sandbox/.openclaw/workspace/file.js
</parameter>
</function>
</tool_call>

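Strip these before processing. Each pattern below removes everything from the first match to the end of the output, which assumes the artifacts only trail the useful content: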

import re
output = re.sub(r'</parameter>.*', '', output, flags=re.DOTALL)
output = re.sub(r'<parameter[^>]*>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</function>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</tool_call>.*', '', output, flags=re.DOTALL)
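
The four substitutions together truncate the output at whichever artifact tag appears first. A compact equivalent, sketched here under the name clean_output (a hypothetical helper), makes that explicit and also trims surrounding whitespace, which the original lines do not do.

```python
import re

# Any one of the tool-call artifact tags; output is cut at the first match.
_ARTIFACT = re.compile(r"</parameter>|<parameter[^>]*>|</function>|</tool_call>")


def clean_output(output: str) -> str:
    """Truncate model output at the first tool-call artifact and trim whitespace."""
    match = _ARTIFACT.search(output)
    if match:
        output = output[: match.start()]
    return output.strip()
```

If the model ever emits useful content after an artifact, this approach (like the original) would discard it, so it is worth logging the raw output as well.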

3.4 Streaming Output to a Live UI

When launching the orchestrator as a subprocess, Python buffers stdout. Force immediate flushing:

# In the orchestrator: every print() call
print(line, flush=True)

# In the UI server: launch with unbuffered flag
import os
import subprocess
import sys

env = dict(os.environ, PYTHONUNBUFFERED="1")
subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env)
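
On the reading side, the UI server also has to consume the child's stdout line by line rather than waiting for it to exit. The generator below is one way to do that, as a sketch; stream_lines is a hypothetical name and the actual ui_server.py may be structured differently.

```python
import os
import subprocess
import sys


def stream_lines(cmd):
    """Yield each stdout line from cmd as soon as the child process emits it."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
        bufsize=1,  # line-buffered on the reading side
        env=env,
    )
    with proc.stdout:
        for line in proc.stdout:
            yield line.rstrip("\n")
    proc.wait()
```

A UI server could iterate over stream_lines([sys.executable, "-u", "orchestrator.py"]) and forward each line to the browser as it arrives.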

4. Prompt Engineering for Small Models

The most important lesson from this project: a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.

4.1 Be Prescriptive, Not Descriptive

Don't ask the model to design. Tell it exactly what to produce.

Bad: "Generate sensor data with realistic values for a city infrastructure system"

Good: "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage."

The model will substitute its training priors for your specifications unless you're explicit, and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone.
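
When a prompt carries many such values, generating the lines from data keeps the "use Y, NOT X" pairing consistent. This is a small illustrative helper; constraint_line is a hypothetical name, not something from the reference repository.

```python
def constraint_line(field: str, spec: str, forbidden: list[str]) -> str:
    """Format one prescriptive instruction: the exact spec plus explicit exclusions."""
    parts = [f"{field}: {spec}."]
    parts.extend(f"NOT {item}." for item in forbidden)
    return " ".join(parts)
```

For example, constraint_line("voltage", "random.uniform(115, 125)", ["220-240", "European voltage"]) reproduces the "Good" prompt line above.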

4.2 Provide Skeletons, Not Specs

Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget.

Complete this skeleton. Fill in the TODO sections.
Do NOT modify the existing code. Do NOT add Pydantic models.

```python
def generate_sensors():
    data = {}
    for d in DISTRICTS:
        power = {"voltage": round(random.uniform(115,125),1), ...}
        # TODO: Override anomaly districts
        data[d] = {"power": power, "water": water, "traffic": traffic}
    return data

@app.get("/api/health")
def health():
    # TODO: compute overall status
    pass
```

4.3 Binary Reviews, Not Open-Ended Critique

A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks:

1. Are all three endpoints present?
2. Do they return data (not pass)?
3. Is CORS enabled?
4. Does uvicorn bind to the right port?
5. Any syntax errors?
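
Binary checks are also easy to parse on the host. The sketch below builds a reviewer prompt from the five questions and turns the reply into a single pass/fail value. The "N: YES / N: NO" reply format is an assumption made for this example, as are the names build_review_prompt and parse_verdict; the actual reviewer.txt may use a different format.

```python
import re

CHECKS = [
    "Are all three endpoints present?",
    "Do they return data (not pass)?",
    "Is CORS enabled?",
    "Does uvicorn bind to the right port?",
    "Any syntax errors?",
]

# Checks 1-4 pass on YES; check 5 (syntax errors) passes on NO.
EXPECTED = {1: "YES", 2: "YES", 3: "YES", 4: "YES", 5: "NO"}


def build_review_prompt(code: str) -> str:
    """Ask for one 'N: YES' or 'N: NO' line per check and nothing else."""
    questions = "\n".join(f"{i}. {q}" for i, q in enumerate(CHECKS, start=1))
    return (
        "Output ONLY one line per question in the form 'N: YES' or 'N: NO'. "
        "No explanation.\n\n"
        f"{questions}\n\nCode:\n{code}"
    )


def parse_verdict(reply: str) -> bool:
    """Return True only if every check was answered and every answer passes."""
    found = re.findall(r"^\s*(\d+)\s*:\s*(YES|NO)\b", reply, flags=re.I | re.M)
    answers = {int(num): ans.upper() for num, ans in found}
    # Missing or extra answers make the dicts differ, which counts as a failure.
    return answers == EXPECTED
```

Treating an unparseable or incomplete reply as a failure keeps a confused reviewer from silently approving code.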

4.4 Template Injection for Large Outputs

The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit; work with it.

Have the model generate only the dynamic part (a generateData() function), then inject it into a pre-built template:

```python
template = open("dashboard_template.html").read()
gen_func = run_agent("analyst", "Output ONLY a generateData() function...")
dashboard = template.replace("%%GENERATE_DATA%%", gen_func)
```
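
A plain str.replace succeeds silently even when the placeholder is missing or the model returned nothing. The sketch below adds those two guards; inject is a hypothetical helper name, and the placeholder string is the one used in the snippet above.

```python
PLACEHOLDER = "%%GENERATE_DATA%%"


def inject(template: str, generated: str, placeholder: str = PLACEHOLDER) -> str:
    """Insert generated code into the template, requiring exactly one placeholder."""
    count = template.count(placeholder)
    if count != 1:
        raise ValueError(f"expected exactly one {placeholder} in template, found {count}")
    if not generated.strip():
        raise ValueError("generated content is empty")
    return template.replace(placeholder, generated)
```

Catching the ValueError gives the orchestrator a clear point at which to retry the agent or switch to a known-good file.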

4.5 Always Start with "Output ONLY..."

Every prompt to openclaw agent should begin with a forceful instruction:

Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>.
Output ONLY a Python code block. Start with ```python end with ```. No explanation.

Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML.
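
Even with a forceful instruction, it helps to be defensive when reading the reply. The following sketch pulls the body out of the first fenced block and falls back to the whole reply when no fence is present. extract_code is a hypothetical helper, and the fence marker is built with string multiplication only to avoid writing three literal backticks inside the example.

```python
import re

FENCE = "`" * 3  # the three-backtick fence marker

_BLOCK = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_code(reply: str) -> str:
    """Return the body of the first fenced block, or the whole reply if none is found."""
    match = _BLOCK.search(reply)
    return (match.group(1) if match else reply).strip()
```

This tolerates a stray sentence before or after the code block without changing the prompt.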


5. Lessons Learned

| Problem | Root Cause | Solution |
|---|---|---|
| Model ignores explicit values | Training priors override prompt | Include "NOT X" alongside "use Y" |
| Output truncated mid-code | maxTokens: 4096 hardcoded | Patch NemoClaw Dockerfile |
| Output still truncated | Thinking tokens consume budget | --thinking off on every call |
| Output still short (~2K chars) | Model's natural stop behavior | Skeleton prompts, template injection |
| Retries produce worse output | Session context accumulates | Unique session ID per call |
| Tool-call XML in output | OpenClaw agent framework | Regex cleanup post-processing |
| Reviewer rejects valid code | Reviewing against drifted plan | Binary checks, not schema comparison |
| Sandbox creation fails | Policy YAML parsing bug | Create manually via openshell sandbox create |
| Live UI shows no output | Python stdout buffering | flush=True + PYTHONUNBUFFERED=1 |
| Port 8080 in use | NemoClaw gateway occupies it | Use different port for UI server |

6. Project Structure

nemoclaw-demo/
├── orchestrator.py              # Main pipeline
├── ui_server.py                 # Live browser UI (port 8888)
├── prompts/
│   ├── architect.txt            # Spec → plan
│   ├── coder_initial.txt        # Plan → code (skeleton)
│   ├── coder_retry.txt          # Error → fixed code
│   ├── reviewer.txt             # Code → binary verdict
│   ├── analyst.txt              # → generateData() function
│   └── dashboard_template.html  # Pre-built dashboard UI
├── fallback/
│   ├── generated_app.py         # Known-good backend
│   └── dashboard.html           # Known-good dashboard
├── output/                      # Pipeline output
└── logs/                        # Per-run logs

7. Running the Demo

# Terminal 1: Start the live UI
python3 ui_server.py

# Open in browser (use the Network URL it prints)
# Click RUN PIPELINE

Or run the pipeline directly:

python3 orchestrator.py
# Open output/dashboard.html in a browser

Security Architecture

Each NemoClaw sandbox enforces:

  • Landlock filesystem restrictions: agents see only /sandbox and /tmp
  • seccomp system call filtering: blocked syscalls fail silently
  • Network namespace isolation: no direct egress, only inference.local
  • Inference routing: agents hit inference.local, the gateway routes to Ollama; the agent never sees host IPs or ports

The orchestrator runs on the host as a trusted operator. Agents never communicate directly; every artifact passes through the host-side orchestrator. This is least-privilege at the agent level.
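
The host-mediated handoff can be pictured as a simple loop. This is a deliberately simplified sketch that treats the pipeline as a strictly linear chain; the real orchestrator.py may route artifacts differently (for example, with retries between Coder and Reviewer). The run_agent callable is supplied by the caller and is an assumption of this example.

```python
from typing import Callable

STAGES = ["architect", "coder", "reviewer", "analyst"]


def run_pipeline(spec: str, run_agent: Callable[[str, str], str], stages=STAGES) -> dict:
    """Run the stages in order, passing each artifact through the host.

    run_agent(agent_name, input_text) is the only path by which an agent
    receives input. Returns every artifact keyed by agent name.
    """
    artifacts = {}
    current = spec
    for agent in stages:
        current = run_agent(agent, current)
        artifacts[agent] = current  # the host keeps a copy of every handoff
    return artifacts
```

Because every handoff goes through one function on the host, that function is also the natural place to log, clean, or validate artifacts before the next agent sees them.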


Last updated: April 2026 | NemoClaw v0.1.0 | OpenShell v0.0.16 | OpenClaw 2026.3.11