---
title: NemoClaw Developer Guide
emoji: 🦞
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---


Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated.

# Agentic AI with NemoClaw: Developer Guide

**Building Multi-Agent Pipelines on the HP ZGX Nano AI Station**

*Written and prepared by Curtis Burkhalter, Ph.D. | Technical Product Marketing Manager, AI Solutions at HP*

---

## What This Guide Covers

This guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It covers the real-world setup, the problems you'll hit, and the solutions that actually work: the practice, not the theory.

The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama.

**Pipeline:** Architect → Coder → Reviewer → Analyst

**Hardware:** HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory)

**Models:** Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst)

**Source:** [github.com/curtburk/nemoclaw-demo](https://github.com/curtburk/nemoclaw-demo)

---

## Prerequisites

- HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory)
- NemoClaw installed (`nemoclaw --help` responds)
- OpenShell CLI installed (`openshell --help` responds)
- Ollama installed with your models pulled
- Python 3.10+ on the host

---

## 1. NemoClaw Setup

### 1.1 Onboarding a Sandbox

```bash
nemoclaw onboard
```

The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted.

You'll hit a **sandbox policy parsing error** on the current version (v0.1.0):

```
Error: failed to parse sandbox policy YAML
network_policies.name: invalid type: string "github", expected struct
```

**Workaround:** The image builds successfully; the error occurs only at sandbox creation.
Grab the image tag from the build output and create the sandbox manually:

```bash
openshell sandbox create --name architect --from openshell/sandbox-from:
```

### 1.2 Creating Multiple Sandboxes

NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run `nemoclaw onboard` once per distinct model, then reuse the resulting image for the other agents that share that model:

```bash
# Build image once via onboard (select qwen3-coder:latest)
nemoclaw onboard
# → name it "coder", note the image tag

# Reuse that image for reviewer and analyst
openshell sandbox create --name reviewer --from openshell/sandbox-from:
openshell sandbox create --name analyst --from openshell/sandbox-from:
```

### 1.3 Verify Sandbox Connectivity

```bash
ssh -o BatchMode=yes openshell-architect "echo OK"
ssh -o BatchMode=yes openshell-coder "echo OK"
ssh -o BatchMode=yes openshell-reviewer "echo OK"
ssh -o BatchMode=yes openshell-analyst "echo OK"
```

---

## 2. Model Configuration

### 2.1 Using Different Models Per Sandbox

The gateway inference route is global: one model for all sandboxes. But each sandbox's `openclaw.json` specifies which model to request, and Ollama serves whichever model the request asks for. So you can run different models per agent even with a single provider.

To set this up, run `nemoclaw onboard` separately for the sandbox that needs a different model (e.g., Architect with `qwen3:32b`), then use `qwen3-coder:latest` for the rest.

### 2.2 The maxTokens Problem

**Critical:** NemoClaw hardcodes `maxTokens: 4096` in the Dockerfile that generates `openclaw.json`. This limits model output to ~3,000 characters, which is too short for code generation.

**Fix:** Patch the Dockerfile before building sandbox images:

```bash
sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \
  ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
```

Verify:

```bash
grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
```

Then rebuild your sandboxes.
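For scripted setups, the same substitution as the `sed` command above can be expressed in Python. This is a minimal sketch, not part of NemoClaw: `patch_max_tokens` and `patch_dockerfile` are hypothetical helper names, and the code assumes the Dockerfile contains the literal text `'maxTokens': 4096` that the `sed` command targets. Unlike `sed`, it tolerates extra whitespace after the colon and reports how many entries it changed, so a second run is a visible no-op.

```python
import re
from pathlib import Path

# Assumption: the Dockerfile embeds the limit as the literal  'maxTokens': 4096
_MAX_TOKENS = re.compile(r"('maxTokens':\s*)4096\b")


def patch_max_tokens(dockerfile_text: str, new_limit: int = 16384) -> tuple[str, int]:
    """Return (patched_text, number_of_replacements)."""
    return _MAX_TOKENS.subn(lambda m: f"{m.group(1)}{new_limit}", dockerfile_text)


def patch_dockerfile(path: Path, new_limit: int = 16384) -> int:
    """Patch the file in place and return how many entries were changed."""
    patched, count = patch_max_tokens(path.read_text(), new_limit)
    if count:
        path.write_text(patched)
    return count
```

Calling `patch_dockerfile(Path.home() / ".nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile")` would mirror the `sed` invocation; a return value of `0` means the file was already patched or the expected text was not found.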
The path may differ based on your Node.js version. Use `which nemoclaw` to find it.

### 2.3 Model Output Limits

Even with `maxTokens: 16384`, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is a model behavior, not a configuration issue. Design your prompts accordingly (see Section 4).

### 2.4 Thinking Tokens

Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass `--thinking off`:

```bash
openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt"
```

---

## 3. Orchestrator Architecture

### 3.1 Sending Prompts to Sandboxes

`openclaw agent` requires `-m `; it does not read from stdin. Long prompts with special characters break shell escaping. The solution: pipe the prompt into a file inside the sandbox, then use command substitution.

```python
import subprocess

# ssh_host (e.g. "openshell-coder") and prompt are supplied by the caller

# Step 1: Upload prompt via stdin → file
upload_cmd = ["ssh", ssh_host, "cat > /tmp/prompt.txt"]
subprocess.run(upload_cmd, input=prompt, text=True)

# Step 2: Run agent with file content
agent_cmd = ["ssh", ssh_host,
             'openclaw agent --agent main --local --thinking off '
             '--session-id my-session -m "$(cat /tmp/prompt.txt)"']
subprocess.run(agent_cmd, capture_output=True, text=True)
```

### 3.2 Session ID Management

**Use unique session IDs per call.** OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens.
```python
import uuid

call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
```

### 3.3 Cleaning Model Output

The model frequently emits OpenClaw tool-call artifacts in its output:

```
/sandbox/.openclaw/workspace/file.js
```

Strip these before processing:

```python
import re

output = re.sub(r'.*', '', output, flags=re.DOTALL)
output = re.sub(r']*>.*', '', output, flags=re.DOTALL)
output = re.sub(r'.*', '', output, flags=re.DOTALL)
output = re.sub(r'.*', '', output, flags=re.DOTALL)
```

### 3.4 Streaming Output to a Live UI

When the orchestrator runs as a subprocess, its stdout is a pipe rather than a terminal, so Python block-buffers it and the UI sees nothing until the buffer fills. Force immediate flushing:

```python
# In the orchestrator: every print() call
print(line, flush=True)

# In the UI server: launch with the unbuffered flag
import os
import subprocess
import sys

env = dict(os.environ, PYTHONUNBUFFERED="1")
subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env)
```

---

## 4. Prompt Engineering for Small Models

The most important lesson from this project: **a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.**

### 4.1 Be Prescriptive, Not Descriptive

Don't ask the model to design. Tell it exactly what to produce.

**Bad:** "Generate sensor data with realistic values for a city infrastructure system"

**Good:** "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage."

The model will substitute its training priors for your specifications unless you're explicit, and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone.

### 4.2 Provide Skeletons, Not Specs

Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget.

````
Complete this skeleton. Fill in the TODO sections.
Do NOT modify the existing code. Do NOT add Pydantic models.

```python
def generate_sensors():
    data = {}
    for d in DISTRICTS:
        power = {"voltage": round(random.uniform(115,125),1), ...}
        # TODO: Override anomaly districts
        data[d] = {"power": power, "water": water, "traffic": traffic}
    return data

@app.get("/api/health")
def health():
    # TODO: compute overall status
    pass
```
````

### 4.3 Binary Reviews, Not Open-Ended Critique

A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks:

1. Are all three endpoints present?
2. Do they return data (not `pass`)?
3. Is CORS enabled?
4. Does uvicorn bind to the right port?
5. Any syntax errors?

### 4.4 Template Injection for Large Outputs

The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit; work with it.

Have the model generate only the dynamic part (a `generateData()` function), then inject it into a pre-built template:

```python
template = open("dashboard_template.html").read()
gen_func = run_agent("analyst", "Output ONLY a generateData() function...")
dashboard = template.replace("%%GENERATE_DATA%%", gen_func)
```

### 4.5 Always Start with "Output ONLY..."

Every prompt to `openclaw agent` should begin with a forceful instruction:

```
Output ONLY raw HTML. No explanation. No markdown. Start with .
```

```
Output ONLY a Python code block. Start with ```python end with ```. No explanation.
```

Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML.

---

## 5. Lessons Learned

| Problem | Root Cause | Solution |
|---------|-----------|----------|
| Model ignores explicit values | Training priors override prompt | Include "NOT X" alongside "use Y" |
| Output truncated mid-code | maxTokens: 4096 hardcoded | Patch NemoClaw Dockerfile |
| Output still truncated | Thinking tokens consume budget | `--thinking off` on every call |
| Output still short (~2K chars) | Model's natural stop behavior | Skeleton prompts, template injection |
| Retries produce worse output | Session context accumulates | Unique session ID per call |
| Tool-call XML in output | OpenClaw agent framework | Regex cleanup post-processing |
| Reviewer rejects valid code | Reviewing against drifted plan | Binary checks, not schema comparison |
| Sandbox creation fails | Policy YAML parsing bug | Create manually via `openshell sandbox create` |
| Live UI shows no output | Python stdout buffering | `flush=True` + `PYTHONUNBUFFERED=1` |
| Port 8080 in use | NemoClaw gateway occupies it | Use different port for UI server |

---

## 6. Project Structure

```
nemoclaw-demo/
├── orchestrator.py              # Main pipeline
├── ui_server.py                 # Live browser UI (port 8888)
├── prompts/
│   ├── architect.txt            # Spec → plan
│   ├── coder_initial.txt        # Plan → code (skeleton)
│   ├── coder_retry.txt          # Error → fixed code
│   ├── reviewer.txt             # Code → binary verdict
│   ├── analyst.txt              # → generateData() function
│   └── dashboard_template.html  # Pre-built dashboard UI
├── fallback/
│   ├── generated_app.py         # Known-good backend
│   └── dashboard.html           # Known-good dashboard
├── output/                      # Pipeline output
└── logs/                        # Per-run logs
```

---

## 7. Running the Demo

```bash
# Terminal 1: Start the live UI
python3 ui_server.py
# Open in browser (use the Network URL it prints)
# Click RUN PIPELINE
```

Or run the pipeline directly:

```bash
python3 orchestrator.py
# Open output/dashboard.html in a browser
```

---

## Security Architecture

Each NemoClaw sandbox enforces:

- **Landlock** filesystem restrictions: agents see only `/sandbox` and `/tmp`
- **seccomp** system call filtering: blocked syscalls fail silently
- **Network namespace isolation**: no direct egress, only `inference.local`
- **Inference routing**: agents hit `inference.local`, the gateway routes to Ollama; the agent never sees host IPs or ports

The orchestrator runs on the host as a trusted operator. Agents never communicate directly; every artifact passes through the host-side orchestrator. This is least-privilege at the agent level.

---

*Last updated: April 2026*

*NemoClaw v0.1.0 | OpenShell v0.0.16 | OpenClaw 2026.3.11*