---
title: NemoClaw Developer Guide
emoji: 🦞
colorFrom: blue
colorTo: green
sdk: static
pinned: false
---
<p align="center"><table align="center"><tr><td align="center"><img src="images/logo_HP_Electric_Blue_keyline.png" alt="HP" width="80"></td><td width="40"></td><td align="center"><img src="images/nvidia-logo-vert.png" alt="NVIDIA" width="80"></td></tr></table></p>

<p align="center"><em>Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated.</em></p>

# Agentic AI with NemoClaw: Developer Guide

**Building Multi-Agent Pipelines on the HP ZGX Nano AI Station**

*Written and prepared by Curtis Burkhalter, Ph.D. | Technical Product Marketing Manager, AI Solutions at HP*

---

## What This Guide Covers

This guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It covers the real-world setup, the problems you'll hit, and the solutions that actually work: not the theory, the practice.

The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama.

**Pipeline:** Architect → Coder → Reviewer → Analyst

**Hardware:** HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory)

**Models:** Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst)

**Source:** [github.com/curtburk/nemoclaw-demo](https://github.com/curtburk/nemoclaw-demo)

---

## Prerequisites

- HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory)
- NemoClaw installed (`nemoclaw --help` responds)
- OpenShell CLI installed (`openshell --help` responds)
- Ollama installed with your models pulled
- Python 3.10+ on the host

---

## 1. NemoClaw Setup

### 1.1 Onboarding a Sandbox

```bash
nemoclaw onboard
```

The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted. On the current version (v0.1.0), sandbox creation then fails with a **sandbox policy parsing error**:

```
Error: failed to parse sandbox policy YAML
network_policies.name: invalid type: string "github", expected struct
```

**Workaround:** The image builds successfully; only sandbox creation fails. Copy the image tag from the build output and create the sandbox manually:

```bash
openshell sandbox create --name architect --from openshell/sandbox-from:<IMAGE_TAG>
```

### 1.2 Creating Multiple Sandboxes

NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run `nemoclaw onboard` once per distinct model, then reuse the resulting image for the other agents that share that model:

```bash
# Build the image once via onboard (select qwen3-coder:latest)
nemoclaw onboard   # name the sandbox "coder" and note the image tag

# Reuse that image for reviewer and analyst
openshell sandbox create --name reviewer --from openshell/sandbox-from:<SAME_TAG>
openshell sandbox create --name analyst --from openshell/sandbox-from:<SAME_TAG>
```

### 1.3 Verify Sandbox Connectivity

Each command should print `OK`. `BatchMode=yes` makes `ssh` fail instead of prompting if key-based login is not working.

```bash
ssh -o BatchMode=yes openshell-architect "echo OK"
ssh -o BatchMode=yes openshell-coder "echo OK"
ssh -o BatchMode=yes openshell-reviewer "echo OK"
ssh -o BatchMode=yes openshell-analyst "echo OK"
```
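
The same four checks can be run from Python before a pipeline starts. The following is a minimal sketch, assuming the `openshell-<name>` SSH aliases used in the commands above; `check_sandboxes`, `SANDBOX_HOSTS`, and the injectable `run` parameter are illustrative names, not part of NemoClaw or OpenShell. Calling `check_sandboxes(SANDBOX_HOSTS)` returns a dict that maps each host to `True` or `False`.

```python
import subprocess

# Host aliases taken from the ssh commands in this section.
SANDBOX_HOSTS = [
    "openshell-architect",
    "openshell-coder",
    "openshell-reviewer",
    "openshell-analyst",
]


def check_sandboxes(hosts, run=subprocess.run, timeout=15):
    """Return {host: bool}, True when `echo OK` round-trips over SSH."""
    status = {}
    for host in hosts:
        cmd = ["ssh", "-o", "BatchMode=yes", host, "echo OK"]
        try:
            result = run(cmd, capture_output=True, text=True, timeout=timeout)
            status[host] = result.returncode == 0 and result.stdout.strip() == "OK"
        except (subprocess.TimeoutExpired, OSError):
            # Timed out, or the ssh binary itself could not be started.
            status[host] = False
    return status
```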

---

## 2. Model Configuration

### 2.1 Using Different Models Per Sandbox

The gateway inference route is global: one model for all sandboxes. However, each sandbox's `openclaw.json` specifies which model to request, and Ollama serves whichever model the request asks for, so you can run different models per agent even with a single provider.

To set this up, run `nemoclaw onboard` separately for the sandbox that needs a different model (e.g., Architect with `qwen3:32b`), then use `qwen3-coder:latest` for the rest.

### 2.2 The maxTokens Problem

**Critical:** NemoClaw hardcodes `maxTokens: 4096` in the Dockerfile that generates `openclaw.json`. This limits model output to ~3,000 characters, which is too short for code generation.

**Fix:** Patch the Dockerfile before building sandbox images:

```bash
sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \
  ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
```

Verify:

```bash
grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile
```

Then rebuild your sandboxes. The path depends on your Node.js version: `which nemoclaw` shows which installation is active, and the Dockerfile lives under that installation's `lib/node_modules/nemoclaw/` directory.
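
For repeatable setups, the same patch can be scripted. This is a sketch under the assumption that the Dockerfile contains the literal `'maxTokens': <number>` text targeted by the `sed` command; `patch_max_tokens` and `patch_dockerfile` are hypothetical helper names. Unlike the `sed` one-liner, it matches whatever number is currently there, so re-running it is harmless, and it raises an error if nothing matched.

```python
import re
from pathlib import Path

# Assumption: the setting appears as 'maxTokens': <digits>, as in the sed command.
MAX_TOKENS_PATTERN = re.compile(r"('maxTokens':\s*)\d+")


def patch_max_tokens(dockerfile_text, new_limit=16384):
    """Return (patched_text, replacement_count) without touching any file."""
    return MAX_TOKENS_PATTERN.subn(
        lambda m: f"{m.group(1)}{new_limit}", dockerfile_text
    )


def patch_dockerfile(path, new_limit=16384):
    """Patch the Dockerfile in place and return how many entries changed."""
    path = Path(path).expanduser()
    patched, count = patch_max_tokens(path.read_text(), new_limit)
    if count == 0:
        raise ValueError(f"no 'maxTokens' entry found in {path}")
    path.write_text(patched)
    return count
```

Call `patch_dockerfile` with the Dockerfile path from the `sed` command, then rebuild the sandboxes as described.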

### 2.3 Model Output Limits

Even with `maxTokens: 16384`, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is a model behavior, not a configuration issue. Design your prompts accordingly (see Section 4).

### 2.4 Thinking Tokens

Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass `--thinking off`:

```bash
openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt"
```

---

## 3. Orchestrator Architecture

### 3.1 Sending Prompts to Sandboxes

`openclaw agent` requires `-m <text>`; it does not read from stdin. Long prompts with special characters break shell escaping. The solution: write the prompt to a file inside the sandbox through SSH stdin, then pass the file content to `-m` with command substitution.

```python
import subprocess

ssh_host = "openshell-coder"  # SSH alias of the target sandbox
prompt = "Output ONLY a Python code block. ..."  # placeholder prompt text

# Step 1: Upload the prompt via stdin → file inside the sandbox
upload_cmd = ["ssh", ssh_host, "cat > /tmp/prompt.txt"]
subprocess.run(upload_cmd, input=prompt, text=True, check=True)

# Step 2: Run the agent; $(cat ...) is expanded by the shell inside the sandbox
agent_cmd = ["ssh", ssh_host,
             'openclaw agent --agent main --local --thinking off '
             '--session-id my-session -m "$(cat /tmp/prompt.txt)"']
result = subprocess.run(agent_cmd, capture_output=True, text=True)
output = result.stdout  # raw agent output, still needs cleaning
```

### 3.2 Session ID Management

**Use unique session IDs per call.** OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens.

```python
import uuid

agent_name = "coder"  # whichever agent is being called
call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"
```
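
To show how a fresh session ID fits into a retry loop, here is a minimal sketch. It assumes the orchestrator supplies two callables, `call_agent(agent_name, prompt, session_id)` and `is_valid(output)`; both, along with `run_with_retries` and `new_session_id`, are hypothetical names used only for illustration.

```python
import uuid


def new_session_id(agent_name):
    """Build a fresh session ID so no earlier history is carried over."""
    return f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}"


def run_with_retries(agent_name, prompt, call_agent, is_valid, max_attempts=3):
    """Call the agent up to max_attempts times, one new session per attempt.

    Returns (output, attempts_used), or (None, max_attempts) if every
    attempt failed validation.
    """
    for attempt in range(1, max_attempts + 1):
        session_id = new_session_id(agent_name)  # never reused across attempts
        output = call_agent(agent_name, prompt, session_id)
        if is_valid(output):
            return output, attempt
    return None, max_attempts
```

Because every attempt starts from an empty history, a retry sees only the prompt you send it. If the retry should know about the previous failure, include the error text in the retry prompt itself.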

### 3.3 Cleaning Model Output

The model frequently emits OpenClaw tool-call artifacts in its output:

```
</parameter>
<parameter=file_path>
/sandbox/.openclaw/workspace/file.js
</parameter>
</function>
</tool_call>
```

Strip these before processing:

```python
import re

# Each pattern deletes from its first match to the end of the output,
# so everything after the first artifact tag is discarded.
output = re.sub(r'</parameter>.*', '', output, flags=re.DOTALL)
output = re.sub(r'<parameter[^>]*>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</function>.*', '', output, flags=re.DOTALL)
output = re.sub(r'</tool_call>.*', '', output, flags=re.DOTALL)
```
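
The four substitutions can be folded into one reusable function. This is a sketch, not the project's code: `strip_tool_artifacts` is a hypothetical name, and the pattern is slightly broader than the original because it also cuts at opening `<function ...>` and `<tool_call>` tags, on the assumption that those never appear in wanted output.

```python
import re

# Any opening or closing parameter/function/tool_call tag marks the start of debris.
ARTIFACT_START = re.compile(r"</?(?:parameter|function|tool_call)\b[^>]*>")


def strip_tool_artifacts(output):
    """Cut the output at the first tool-call tag and trim trailing whitespace."""
    match = ARTIFACT_START.search(output)
    if match:
        output = output[:match.start()]
    return output.rstrip()
```

Unrelated angle-bracket content such as HTML tags is left alone, which matters when the agent is asked for raw HTML.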

### 3.4 Streaming Output to a Live UI

When the orchestrator is launched as a subprocess and its stdout is not a terminal, Python block-buffers the output, so the UI receives nothing until the buffer fills or the process exits. Force immediate flushing:

```python
# In the orchestrator: every print() call
print(line, flush=True)

# In the UI server: launch with the unbuffered flag
# (needs `import os, subprocess, sys` at the top of the file)
env = dict(os.environ, PYTHONUNBUFFERED="1")
subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env)
```
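
On the UI-server side, the child's output still has to be read line by line as it arrives. The following is a minimal sketch, assuming the server consumes stdout through a pipe; `stream_lines` is a hypothetical helper, and how each line reaches the browser is left out. For example, iterating over `stream_lines([sys.executable, "-u", "orchestrator.py"])` yields each line as soon as the orchestrator prints it.

```python
import os
import subprocess
import sys


def stream_lines(cmd):
    """Yield the child's stdout one line at a time as it is produced."""
    env = dict(os.environ, PYTHONUNBUFFERED="1")
    proc = subprocess.Popen(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # interleave errors with normal output
        text=True,
        bufsize=1,                 # line-buffered on the reading side
        env=env,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()
```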

---

## 4. Prompt Engineering for Small Models

The most important lesson from this project: **a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.**

### 4.1 Be Prescriptive, Not Descriptive

Don't ask the model to design. Tell it exactly what to produce.

**Bad:** "Generate sensor data with realistic values for a city infrastructure system"

**Good:** "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage."

The model will substitute its training priors for your specifications unless you're explicit, and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone.

### 4.2 Provide Skeletons, Not Specs

Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget.

````
Complete this skeleton. Fill in the TODO sections.
Do NOT modify the existing code. Do NOT add Pydantic models.

```python
def generate_sensors():
    data = {}
    for d in DISTRICTS:
        power = {"voltage": round(random.uniform(115,125),1), ...}
        # TODO: Override anomaly districts
        data[d] = {"power": power, "water": water, "traffic": traffic}
    return data

@app.get("/api/health")
def health():
    # TODO: compute overall status
    pass
```
````
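
Since the prompt tells the model not to modify the skeleton, the orchestrator can check that instruction mechanically before accepting the result. The sketch below is one possible check, not the project's implementation: `check_skeleton_filled` is a hypothetical helper, and it uses plain substring matching, which is deliberately crude.

```python
def check_skeleton_filled(skeleton, completed):
    """Return a list of problems; an empty list means the completion looks usable."""
    problems = []
    for line in skeleton.splitlines():
        stripped = line.strip()
        # Skip blanks, TODO markers, and placeholder `pass` lines the model should replace.
        if not stripped or "TODO" in stripped or stripped == "pass":
            continue
        if stripped not in completed:
            problems.append(f"skeleton line missing or modified: {stripped}")
    if "TODO" in completed:
        problems.append("unfilled TODO remains")
    return problems
```

A non-empty result is a natural trigger for a retry, with the listed problems included in the retry prompt.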

### 4.3 Binary Reviews, Not Open-Ended Critique

A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks:

1. Are all three endpoints present?
2. Do they return data (not `pass`)?
3. Is CORS enabled?
4. Does uvicorn bind to the right port?
5. Any syntax errors?

### 4.4 Template Injection for Large Outputs

The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit; work with it.

Have the model generate only the dynamic part (a `generateData()` function), then inject it into a pre-built template:

```python
# run_agent() stands for the orchestrator's helper that sends a prompt to a sandbox
template = open("dashboard_template.html").read()
gen_func = run_agent("analyst", "Output ONLY a generateData() function...")
dashboard = template.replace("%%GENERATE_DATA%%", gen_func)
```
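
A slightly more defensive version of the injection step might look like the sketch below. It assumes the generated JavaScript uses the `function generateData` declaration form and that a known-good fallback function is available; `inject_generated_function` and its parameters are hypothetical names.

```python
PLACEHOLDER = "%%GENERATE_DATA%%"


def inject_generated_function(template, gen_func, fallback_func):
    """Insert gen_func at the placeholder, or fallback_func if gen_func looks wrong."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError(f"template must contain {PLACEHOLDER} exactly once")
    # Assumption: a usable result declares generateData with the `function` keyword.
    looks_ok = "function generateData" in gen_func and PLACEHOLDER not in gen_func
    return template.replace(PLACEHOLDER, gen_func if looks_ok else fallback_func)
```

Falling back to a known-good function keeps the dashboard renderable even when the model returns prose instead of code.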

### 4.5 Always Start with "Output ONLY..."

Every prompt to `openclaw agent` should begin with a forceful instruction:

```
Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>.
```

```
Output ONLY a Python code block. Start with ```python end with ```. No explanation.
```

Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML.

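
Even with such an instruction, it is prudent to extract the code block rather than trust the whole response. The following sketch assumes the response contains at most one fenced block of interest; `extract_python_block` is a hypothetical helper. The fence marker is built with string multiplication only so the example itself contains no literal triple backticks.

```python
import re

TICKS = "`" * 3  # the three-backtick fence marker
FENCE = re.compile(TICKS + r"(?:python)?[ \t]*\n(.*?)" + TICKS, re.DOTALL)


def extract_python_block(response):
    """Return the body of the first fenced block, or None if there is none."""
    match = FENCE.search(response)
    return match.group(1).strip() if match else None
```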

---

## 5. Lessons Learned

| Problem | Root Cause | Solution |
|---------|-----------|----------|
| Model ignores explicit values | Training priors override prompt | Include "NOT X" alongside "use Y" |
| Output truncated mid-code | `maxTokens: 4096` hardcoded | Patch NemoClaw Dockerfile |
| Output still truncated | Thinking tokens consume budget | `--thinking off` on every call |
| Output still short (~2K chars) | Model's natural stop behavior | Skeleton prompts, template injection |
| Retries produce worse output | Session context accumulates | Unique session ID per call |
| Tool-call XML in output | OpenClaw agent framework | Regex cleanup post-processing |
| Reviewer rejects valid code | Reviewing against drifted plan | Binary checks, not schema comparison |
| Sandbox creation fails | Policy YAML parsing bug | Create manually via `openshell sandbox create` |
| Live UI shows no output | Python stdout buffering | `flush=True` + `PYTHONUNBUFFERED=1` |
| Port 8080 in use | NemoClaw gateway occupies it | Use different port for UI server |
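
For the last row, a quick way to confirm that a port is free before starting the UI server is to try connecting to it. This is a generic sketch using the standard `socket` module; `port_in_use` is a hypothetical helper, and checking only `127.0.0.1` is an assumption about where the conflicting service listens.

```python
import socket


def port_in_use(port, host="127.0.0.1"):
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```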

---

## 6. Project Structure

```
nemoclaw-demo/
├── orchestrator.py              # Main pipeline
├── ui_server.py                 # Live browser UI (port 8888)
├── prompts/
│   ├── architect.txt            # Spec → plan
│   ├── coder_initial.txt        # Plan → code (skeleton)
│   ├── coder_retry.txt          # Error → fixed code
│   ├── reviewer.txt             # Code → binary verdict
│   ├── analyst.txt              # → generateData() function
│   └── dashboard_template.html  # Pre-built dashboard UI
├── fallback/
│   ├── generated_app.py         # Known-good backend
│   └── dashboard.html           # Known-good dashboard
├── output/                      # Pipeline output
└── logs/                        # Per-run logs
```

---

## 7. Running the Demo

```bash
# Terminal 1: Start the live UI
python3 ui_server.py

# Open in browser (use the Network URL it prints)
# Click RUN PIPELINE
```

Or run the pipeline directly:

```bash
python3 orchestrator.py
# Open output/dashboard.html in a browser
```

---

## Security Architecture

Each NemoClaw sandbox enforces:

- **Landlock** filesystem restrictions: agents see only `/sandbox` and `/tmp`
- **seccomp** system call filtering: blocked syscalls fail silently
- **Network namespace isolation**: no direct egress, only `inference.local`
- **Inference routing**: agents hit `inference.local`, the gateway routes to Ollama; the agent never sees host IPs or ports

The orchestrator runs on the host as a trusted operator. Agents never communicate directly; every artifact passes through the host-side orchestrator. This is least-privilege at the agent level.

---

*Last updated: April 2026*
*NemoClaw v0.1.0 | OpenShell v0.0.16 | OpenClaw 2026.3.11*