| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <title>NemoClaw Developer Guide</title> |
| <link rel="preconnect" href="https://fonts.googleapis.com"> |
| <link href="https://fonts.googleapis.com/css2?family=IBM+Plex+Sans:wght@300;400;500;600;700&family=IBM+Plex+Mono:wght@400;500&display=swap" rel="stylesheet"> |
| <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/styles/github-dark.min.css"> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/highlight.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/bash.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/python.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/highlight.js/11.9.0/languages/yaml.min.js"></script> |
| <script src="https://cdnjs.cloudflare.com/ajax/libs/marked/12.0.0/marked.min.js"></script> |
| <style> |
| :root { |
| --bg: #0f1117; |
| --bg-card: #161b22; |
| --bg-code: #1c2129; |
| --border: #30363d; |
| --text: #e6edf3; |
| --text-secondary: #8b949e; |
| --text-muted: #6e7681; |
| --accent: #58a6ff; |
| --accent-green: #3fb950; |
| --accent-orange: #d29922; |
| --accent-red: #f85149; |
| --accent-purple: #bc8cff; |
| } |
| * { margin: 0; padding: 0; box-sizing: border-box; } |
| body { |
| font-family: 'IBM Plex Sans', -apple-system, sans-serif; |
| background: var(--bg); |
| color: var(--text); |
| line-height: 1.7; |
| -webkit-font-smoothing: antialiased; |
| } |
| .logo-bar { |
| display: flex; |
| align-items: center; |
| justify-content: center; |
| gap: 40px; |
| padding: 32px 24px 16px; |
| flex-wrap: wrap; |
| } |
| .logo-bar img { |
| height: 50px; |
| object-fit: contain; |
| filter: brightness(0.95); |
| transition: filter 0.2s; |
| } |
| .logo-bar img:hover { |
| filter: brightness(1.1); |
| } |
| .logo-disclaimer { |
| text-align: center; |
| font-size: 11px; |
| color: var(--text-muted); |
| padding: 0 24px 24px; |
| font-style: italic; |
| } |
| .container { |
| max-width: 860px; |
| margin: 0 auto; |
| padding: 0 24px 80px; |
| } |
| .content h1 { |
| font-size: 32px; |
| font-weight: 700; |
| margin: 0 0 8px; |
| padding-bottom: 12px; |
| border-bottom: 1px solid var(--border); |
| color: var(--text); |
| } |
| .content h2 { |
| font-size: 22px; |
| font-weight: 600; |
| margin: 48px 0 16px; |
| padding-bottom: 8px; |
| border-bottom: 1px solid var(--border); |
| color: var(--text); |
| } |
| .content h3 { |
| font-size: 17px; |
| font-weight: 600; |
| margin: 32px 0 12px; |
| color: var(--text); |
| } |
| .content p { |
| margin: 0 0 16px; |
| color: var(--text-secondary); |
| font-size: 15px; |
| } |
| .content a { |
| color: var(--accent); |
| text-decoration: none; |
| } |
| .content a:hover { |
| text-decoration: underline; |
| } |
| .content strong { |
| color: var(--text); |
| font-weight: 600; |
| } |
| .content em { |
| color: var(--text-secondary); |
| } |
| .content ul, .content ol { |
| margin: 0 0 16px 24px; |
| color: var(--text-secondary); |
| font-size: 15px; |
| } |
| .content li { |
| margin-bottom: 6px; |
| } |
| .content hr { |
| border: none; |
| border-top: 1px solid var(--border); |
| margin: 40px 0; |
| } |
| .content pre { |
| background: var(--bg-code); |
| border: 1px solid var(--border); |
| border-radius: 8px; |
| padding: 16px; |
| margin: 0 0 16px; |
| overflow-x: auto; |
| font-size: 13px; |
| line-height: 1.6; |
| } |
| .content pre code { |
| font-family: 'IBM Plex Mono', monospace; |
| background: none; |
| padding: 0; |
| border: none; |
| font-size: 13px; |
| color: var(--text); |
| } |
| .content code { |
| font-family: 'IBM Plex Mono', monospace; |
| background: var(--bg-code); |
| border: 1px solid var(--border); |
| border-radius: 4px; |
| padding: 2px 6px; |
| font-size: 13px; |
| color: var(--accent); |
| } |
| .content blockquote { |
| border-left: 3px solid var(--accent); |
| margin: 0 0 16px; |
| padding: 8px 16px; |
| color: var(--text-secondary); |
| background: rgba(88, 166, 255, 0.05); |
| border-radius: 0 6px 6px 0; |
| } |
| .content table { |
| width: 100%; |
| border-collapse: collapse; |
| margin: 0 0 16px; |
| font-size: 14px; |
| } |
| .content th { |
| background: var(--bg-code); |
| border: 1px solid var(--border); |
| padding: 10px 14px; |
| text-align: left; |
| font-weight: 600; |
| color: var(--text); |
| } |
| .content td { |
| border: 1px solid var(--border); |
| padding: 10px 14px; |
| color: var(--text-secondary); |
| } |
| .content p strong:first-child { |
| color: var(--accent-orange); |
| } |
| .footer { |
| text-align: center; |
| padding: 40px 24px; |
| border-top: 1px solid var(--border); |
| margin-top: 60px; |
| font-size: 13px; |
| color: var(--text-muted); |
| } |
| .footer a { |
| color: var(--accent); |
| text-decoration: none; |
| } |
| @media (max-width: 640px) { |
| .logo-bar { gap: 24px; } |
| .logo-bar img { height: 36px; } |
| .content h1 { font-size: 24px; } |
| .content h2 { font-size: 18px; } |
| } |
| </style> |
| </head> |
| <body> |
|
|
| <div class="logo-bar"> |
| <img src="images/Z by HP NVIDIA WHT-01.png" alt="NVIDIA"> |
| </div> |
| <div class="logo-disclaimer"> |
| Company logos are used for identification purposes only and do not imply endorsement or official partnership unless otherwise stated. |
| </div> |
|
|
| <div class="container"> |
| <div class="content" id="content"></div> |
| </div> |
|
|
| <script> |
| const markdown = `# Agentic AI with NemoClaw: Developer Guide |
| |
| **Building Multi-Agent Pipelines on the HP ZGX Nano AI Station** |
| |
| *Written and prepared by Curtis Burkhalter, Ph.D. | Technical Product Marketing Manager, AI Solutions at HP* |
| |
| *To connect with Curtis Burkhalter, Ph.D., find him at https://www.linkedin.com/in/curtburk/* |
| |
| --- |
| |
| ## What This Guide Covers |
| |
| This unofficial developer guide documents how to build a multi-agent AI pipeline using NVIDIA NemoClaw and OpenShell on the HP ZGX Nano AI Station. It covers the real-world setup, the problems you are likely to encounter, and the workarounds that worked in practice. |
| |
| The reference implementation is a four-agent "Autonomous Software Factory" that takes a plain-English specification and produces a working city infrastructure health dashboard. Each agent runs in an isolated NemoClaw sandbox backed by local LLM inference via Ollama. |
| |
| **Pipeline:** Architect → Coder → Reviewer → Analyst |
| |
| **Hardware:** HP ZGX Nano AI Station (NVIDIA GB10 Grace Blackwell, ARM64, 128GB unified memory) |
| |
| **Models:** Qwen3-32B (Architect), Qwen3-Coder-30B-A3B (Coder, Reviewer, Analyst) |
| |
| **Source:** [https://github.com/curtburk/NemoClaw_Autonomous_AI_Factory](https://github.com/curtburk/NemoClaw_Autonomous_AI_Factory) |
| |
| --- |
| |
| ## Prerequisites |
| |
| - HP ZGX Nano AI Station (or any NVIDIA GPU system with 64GB+ memory) |
| - NemoClaw installed (\`nemoclaw --help\` responds) |
| - OpenShell CLI installed (\`openshell --help\` responds) |
| - Ollama installed with your models pulled |
| - Python 3.10+ on the host |
| |
| --- |
| |
| ## 1. NemoClaw Setup |
| |
| ### 1.1 Onboarding a Sandbox |
| |
| \`\`\`bash |
| nemoclaw onboard |
| \`\`\` |
| |
| The wizard walks you through provider configuration and sandbox creation. Choose "Local Ollama" when prompted. You'll hit a **sandbox policy parsing error** on the current version (v0.1.0): |
| |
| \`\`\` |
| Error: failed to parse sandbox policy YAML |
| network_policies.name: invalid type: string "github", expected struct |
| \`\`\` |
| |
| **Workaround:** The image builds successfully; the error occurs only at sandbox creation. Grab the image tag from the build output and create the sandbox manually: |
| |
| \`\`\`bash |
| openshell sandbox create --name architect --from openshell/sandbox-from:<IMAGE_TAG> |
| \`\`\` |
| |
| ### 1.2 Creating Multiple Sandboxes |
| |
| NemoClaw's onboarding creates one sandbox at a time. For a multi-agent pipeline, run \`nemoclaw onboard\` once per agent, then use the image reuse trick for agents that share the same model: |
| |
| \`\`\`bash |
| # Build image once via onboard (select qwen3-coder:latest) |
| nemoclaw onboard # name it "coder" and note the image tag |
| |
| # Reuse that image for reviewer and analyst |
| openshell sandbox create --name reviewer --from openshell/sandbox-from:<SAME_TAG> |
| openshell sandbox create --name analyst --from openshell/sandbox-from:<SAME_TAG> |
| \`\`\` |
| |
| ### 1.3 Verify Sandbox Connectivity |
| |
| \`\`\`bash |
| ssh -o BatchMode=yes openshell-architect "echo OK" |
| ssh -o BatchMode=yes openshell-coder "echo OK" |
| ssh -o BatchMode=yes openshell-reviewer "echo OK" |
| ssh -o BatchMode=yes openshell-analyst "echo OK" |
| \`\`\` |
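| |
| The same check can be scripted on the host so the orchestrator fails fast when a sandbox is unreachable. The following is a minimal sketch, assuming the four \`openshell-<name>\` SSH aliases shown above; \`check_sandboxes\` and its \`run\` parameter are hypothetical names introduced here so the SSH call can be swapped out in tests. |
| |
| \`\`\`python |
| import subprocess |
| |
| SANDBOXES = ["architect", "coder", "reviewer", "analyst"] |
| |
| def check_sandboxes(names=SANDBOXES, run=subprocess.run, timeout=15): |
|     """Return {sandbox_name: True/False} based on an 'echo OK' round trip.""" |
|     status = {} |
|     for name in names: |
|         cmd = ["ssh", "-o", "BatchMode=yes", f"openshell-{name}", "echo OK"] |
|         try: |
|             result = run(cmd, capture_output=True, text=True, timeout=timeout) |
|             status[name] = result.returncode == 0 and result.stdout.strip() == "OK" |
|         except (OSError, subprocess.TimeoutExpired): |
|             # ssh is missing, the host is unreachable, or the sandbox hung |
|             status[name] = False |
|     return status |
| \`\`\` |
| |
| Calling \`check_sandboxes()\` before a pipeline run returns a dictionary you can inspect to decide whether to proceed or to recreate a sandbox first. |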
| |
| --- |
| |
| ## 2. Model Configuration |
| |
| ### 2.1 Using Different Models Per Sandbox |
| |
| The gateway inference route is global: one model for all sandboxes. But each sandbox's \`openclaw.json\` specifies which model to request, and Ollama serves whichever model the request asks for. So you can run different models per agent even with a single provider. |
| |
| To set this up, run \`nemoclaw onboard\` separately for the sandbox that needs a different model (e.g., Architect with \`qwen3:32b\`), then use \`qwen3-coder:latest\` for the rest. |
| |
| ### 2.2 The maxTokens Problem |
| |
| **Critical:** NemoClaw hardcodes \`maxTokens: 4096\` in the Dockerfile that generates \`openclaw.json\`. This limits model output to ~3,000 characters β too short for code generation. |
| |
| **Fix:** Patch the Dockerfile before building sandbox images: |
| |
| \`\`\`bash |
| sed -i "s/'maxTokens': 4096/'maxTokens': 16384/" \\ |
| ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile |
| \`\`\` |
| |
| Verify: |
| |
| \`\`\`bash |
| grep "maxTokens" ~/.nvm/versions/node/v22.22.2/lib/node_modules/nemoclaw/Dockerfile |
| \`\`\` |
| |
| Then rebuild your sandboxes. The path may differ based on your Node.js version; use \`which nemoclaw\` to locate your installation and adjust the \`Dockerfile\` path to match. |
| |
| ### 2.3 Model Output Limits |
| |
| Even with \`maxTokens: 16384\`, Qwen3-Coder-30B-A3B naturally stops generating at ~2,000 characters. This is a model behavior, not a configuration issue. Design your prompts accordingly (see Section 4). |
| |
| ### 2.4 Thinking Tokens |
| |
| Qwen3-Coder uses internal reasoning tokens by default. These consume output budget without producing visible output. Always pass \`--thinking off\`: |
| |
| \`\`\`bash |
| openclaw agent --agent main --local --thinking off --session-id my-session -m "your prompt" |
| \`\`\` |
| |
| --- |
| |
| ## 3. Orchestrator Architecture |
| |
| ### 3.1 Sending Prompts to Sandboxes |
| |
| \`openclaw agent\` requires \`-m <text>\`; it does not read from stdin. Long prompts with special characters break shell escaping. The solution: pipe the prompt into a file inside the sandbox, then use command substitution. |
| |
| \`\`\`python |
| import subprocess |
| |
| # ssh_host (e.g. "openshell-coder") and prompt are supplied by the caller |
| # Step 1: Upload the prompt via stdin → file inside the sandbox |
| upload_cmd = ["ssh", ssh_host, "cat > /tmp/prompt.txt"] |
| subprocess.run(upload_cmd, input=prompt, text=True, check=True) |
| |
| # Step 2: Run the agent; the sandbox shell expands the file content into -m |
| agent_cmd = ["ssh", ssh_host, |
|     'openclaw agent --agent main --local --thinking off ' |
|     '--session-id my-session -m "$(cat /tmp/prompt.txt)"'] |
| result = subprocess.run(agent_cmd, capture_output=True, text=True) |
| \`\`\` |
| |
| ### 3.2 Session ID Management |
| |
| **Use unique session IDs per call.** OpenClaw accumulates conversation history within a session. If the Coder retries three times with the same session ID, the context window fills with prior failed attempts, squeezing out output tokens. |
| |
| \`\`\`python |
| import uuid |
| call_session = f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}" |
| \`\`\` |
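| |
| To show how this fits into a retry loop, here is a minimal sketch in which every attempt gets a fresh session ID. \`run_with_retries\`, \`call_agent\`, and \`is_valid\` are hypothetical names; the last two stand in for your own sandbox call and validation logic. |
| |
| \`\`\`python |
| import uuid |
| |
| def new_session_id(agent_name): |
|     """Fresh session ID so no prior conversation history is reused.""" |
|     return f"pipeline-{agent_name}-{uuid.uuid4().hex[:8]}" |
| |
| def run_with_retries(agent_name, call_agent, is_valid, max_attempts=3): |
|     """Call the agent up to max_attempts times, each in a new session. |
| |
|     call_agent(session_id) -> str and is_valid(output) -> bool are |
|     placeholders for your own orchestrator logic. |
|     """ |
|     for attempt in range(1, max_attempts + 1): |
|         session_id = new_session_id(agent_name) |
|         output = call_agent(session_id) |
|         if is_valid(output): |
|             return output, attempt |
|     return None, max_attempts |
| \`\`\` |
| |
| Because the ID is generated inside the loop, a failed attempt never leaks into the context of the next one. |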
| |
| ### 3.3 Cleaning Model Output |
| |
| The model frequently emits OpenClaw tool-call artifacts in its output: |
| |
| \`\`\` |
| </parameter> |
| <parameter=file_path> |
| /sandbox/.openclaw/workspace/file.js |
| </parameter> |
| </function> |
| </tool_call> |
| \`\`\` |
| |
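| Strip these before processing. Each pattern removes everything from its first match to the end of the output, which assumes the artifacts trail the useful content: |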
| |
| \`\`\`python |
| import re |
| output = re.sub(r'</parameter>.*', '', output, flags=re.DOTALL) |
| output = re.sub(r'<parameter[^>]*>.*', '', output, flags=re.DOTALL) |
| output = re.sub(r'</function>.*', '', output, flags=re.DOTALL) |
| output = re.sub(r'</tool_call>.*', '', output, flags=re.DOTALL) |
| \`\`\` |
| |
| ### 3.4 Streaming Output to a Live UI |
| |
| When the orchestrator is launched as a subprocess, its stdout is a pipe rather than a terminal, so Python block-buffers the output. Force immediate flushing: |
| |
| \`\`\`python |
| import os |
| import subprocess |
| import sys |
| |
| # In the orchestrator: every print() call |
| print(line, flush=True) |
| |
| # In the UI server: launch with the unbuffered flag |
| env = dict(os.environ, PYTHONUNBUFFERED="1") |
| subprocess.Popen([sys.executable, "-u", "orchestrator.py"], env=env) |
| \`\`\` |
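| |
| On the UI server side, the unbuffered output still has to be read as it arrives. The following is a minimal sketch of one way to do that with a generator; \`stream_lines\` is a hypothetical helper name, not part of the reference implementation. |
| |
| \`\`\`python |
| import subprocess |
| |
| def stream_lines(cmd, env=None): |
|     """Yield the child's output line by line as it is produced.""" |
|     proc = subprocess.Popen( |
|         cmd, |
|         stdout=subprocess.PIPE, |
|         stderr=subprocess.STDOUT,  # merge stderr so errors reach the UI too |
|         text=True, |
|         bufsize=1,  # line-buffered on the parent's side of the pipe |
|         env=env, |
|     ) |
|     with proc.stdout: |
|         for line in proc.stdout: |
|             yield line.rstrip()  # drops the newline and any trailing whitespace |
|     proc.wait() |
| \`\`\` |
| |
| A caller could iterate over \`stream_lines([sys.executable, "-u", "orchestrator.py"])\` and forward each line to the browser as soon as it is yielded. |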
| |
| --- |
| |
| ## 4. Prompt Engineering for Small Models |
| |
| The most important lesson from this project: **a 3B-active-parameter MoE (Qwen3-Coder-30B-A3B) requires fundamentally different prompting than a frontier model.** |
| |
| ### 4.1 Be Prescriptive, Not Descriptive |
| |
| Don't ask the model to design. Tell it exactly what to produce. |
| |
| **Bad:** "Generate sensor data with realistic values for a city infrastructure system" |
| |
| **Good:** "voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage." |
| |
| The model will substitute its training priors for your specifications unless you're explicit, and sometimes even then. Including "NOT X" alongside "use Y" is more reliable than "use Y" alone. |
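| |
| If you have many such constraints, the "use Y, NOT X" pattern can be generated rather than hand-written. A minimal sketch, where \`constraint_line\` is a hypothetical helper and the values are the ones from the example above: |
| |
| \`\`\`python |
| def constraint_line(field, expression, forbidden): |
|     """Build one prescriptive line: the exact value plus explicit negations.""" |
|     negations = " ".join(f"NOT {item}." for item in forbidden) |
|     return f"{field}: {expression}. {negations}".strip() |
| |
| print(constraint_line("voltage", "random.uniform(115, 125)", |
|                       ["220-240", "European voltage"])) |
| # voltage: random.uniform(115, 125). NOT 220-240. NOT European voltage. |
| \`\`\` |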
| |
| ### 4.2 Provide Skeletons, Not Specs |
| |
| Don't ask the model to write 120 lines from scratch. Give it 80 lines of skeleton with TODOs to fill in. The model contributes 15-30 lines of logic instead of generating boilerplate that blows the token budget. |
| |
| \`\`\`\` |
| Complete this skeleton. Fill in the TODO sections. |
| Do NOT modify the existing code. Do NOT add Pydantic models. |
| |
| \`\`\`python |
| def generate_sensors(): |
|     data = {} |
|     for d in DISTRICTS: |
|         power = {"voltage": round(random.uniform(115,125),1), ...} |
|         # TODO: Override anomaly districts |
|         data[d] = {"power": power, "water": water, "traffic": traffic} |
|     return data |
| |
| @app.get("/api/health") |
| def health(): |
|     # TODO: compute overall status |
|     pass |
| \`\`\` |
| \`\`\`\` |
| |
| ### 4.3 Binary Reviews, Not Open-Ended Critique |
| |
| A 3B-active model cannot reliably do nuanced code review. It will flag correct code as wrong if the review criteria are ambiguous. Reduce the Reviewer to binary checks: |
| |
| 1. Are all three endpoints present? |
| 2. Do they return data (not \`pass\`)? |
| 3. Is CORS enabled? |
| 4. Does uvicorn bind to the right port? |
| 5. Any syntax errors? |
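| |
| Checks this mechanical can also be cross-checked on the host without a model. The sketch below is a swapped-in static check, not the Reviewer agent itself. It assumes FastAPI-style \`@app.get(...)\` decorators as in the skeleton in Section 4.2, a plain text match for \`CORSMiddleware\`, and a placeholder port of 8000; \`binary_checks\` is a hypothetical name. |
| |
| \`\`\`python |
| import ast |
| |
| def binary_checks(source, expected_endpoints=3, expected_port=8000): |
|     """Return {check_name: bool} for generated backend code (static checks only).""" |
|     results = {"syntax_ok": False, "endpoints_present": False, |
|                "endpoints_return_data": False, "cors_enabled": False, |
|                "port_ok": False} |
|     try: |
|         tree = ast.parse(source) |
|     except SyntaxError: |
|         return results  # nothing else can be checked reliably |
|     results["syntax_ok"] = True |
| |
|     # Assumption: functions decorated with @<something>.get(...) are endpoints. |
|     endpoints = [ |
|         node for node in ast.walk(tree) |
|         if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)) |
|         and any( |
|             isinstance(dec, ast.Call) |
|             and isinstance(dec.func, ast.Attribute) |
|             and dec.func.attr == "get" |
|             for dec in node.decorator_list |
|         ) |
|     ] |
|     results["endpoints_present"] = len(endpoints) >= expected_endpoints |
| |
|     # An endpoint "returns data" if it contains at least one 'return <value>'. |
|     def returns_value(func): |
|         return any(isinstance(n, ast.Return) and n.value is not None |
|                    for n in ast.walk(func)) |
| |
|     results["endpoints_return_data"] = bool(endpoints) and all( |
|         returns_value(f) for f in endpoints) |
|     results["cors_enabled"] = "CORSMiddleware" in source |
|     results["port_ok"] = f"port={expected_port}" in source |
|     return results |
| \`\`\` |
| |
| Each result is a plain boolean, which keeps the verdict as unambiguous as the checklist above. |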
| |
| ### 4.4 Template Injection for Large Outputs |
| |
| The model caps output at ~2,000 characters. A full HTML dashboard with Chart.js is 5,000-8,000 characters. Don't fight the limit; work with it. |
| |
| Have the model generate only the dynamic part (a \`generateData()\` function), then inject it into a pre-built template: |
| |
| \`\`\`python |
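| from pathlib import Path |
| |
| template = Path("dashboard_template.html").read_text() |
| # run_agent() stands in for your orchestrator's helper that calls a sandbox |
| gen_func = run_agent("analyst", "Output ONLY a generateData() function...") |
| dashboard = template.replace("%%GENERATE_DATA%%", gen_func) |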
| \`\`\` |
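| |
| A plain \`str.replace\` silently does nothing when the placeholder is missing, so it can help to guard the injection. The following is a minimal sketch; \`inject\` is a hypothetical helper, and the \`%%GENERATE_DATA%%\` marker is the one used above. |
| |
| \`\`\`python |
| def inject(template, placeholder, fragment): |
|     """Replace exactly one placeholder; fail loudly on a missing or repeated marker.""" |
|     count = template.count(placeholder) |
|     if count != 1: |
|         raise ValueError(f"expected exactly one {placeholder!r}, found {count}") |
|     if not fragment.strip(): |
|         raise ValueError("model returned an empty fragment") |
|     return template.replace(placeholder, fragment) |
| \`\`\` |
| |
| A caller could catch the \`ValueError\` and fall back to a known-good file such as the \`fallback/dashboard.html\` listed in Section 6. |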
| |
| ### 4.5 Always Start with "Output ONLY..." |
| |
| Every prompt to \`openclaw agent\` should begin with a forceful instruction: |
| |
| \`\`\` |
| Output ONLY raw HTML. No explanation. No markdown. Start with <!DOCTYPE html>. |
| \`\`\` |
| |
| \`\`\` |
| Output ONLY a Python code block. Start with \`\`\`python end with \`\`\`. No explanation. |
| \`\`\` |
| |
| Without this, the agent wraps its output in conversational prose, markdown formatting, or tool-call XML. |
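| |
| Even with a forceful instruction, it is safer to extract the fenced block on the host than to trust the raw output. A minimal sketch, assuming the model was asked for a single Python code block; \`extract_code\` is a hypothetical helper, and the fence marker is built with \`chr()\` only to keep literal fences out of the example. |
| |
| \`\`\`python |
| FENCE = chr(96) * 3  # three backtick characters |
| NEWLINE = chr(10) |
| |
| def extract_code(output, lang="python"): |
|     """Return the body of the first fenced block, or the stripped output if none.""" |
|     start = output.find(FENCE + lang) |
|     if start == -1: |
|         start = output.find(FENCE) |
|         if start == -1: |
|             return output.strip()  # no fence at all: assume raw code |
|     body_start = output.find(NEWLINE, start)  # end of the opening fence line |
|     if body_start == -1: |
|         return "" |
|     end = output.find(FENCE, body_start) |
|     # A missing closing fence usually means truncated output; keep what exists. |
|     body = output[body_start + 1:] if end == -1 else output[body_start + 1:end] |
|     return body.strip() |
| \`\`\` |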
| |
| --- |
| |
| ## 5. Lessons Learned |
| |
| | Problem | Root Cause | Solution | |
| |---------|-----------|----------| |
| | Model ignores explicit values | Training priors override prompt | Include "NOT X" alongside "use Y" | |
| | Output truncated mid-code | maxTokens: 4096 hardcoded | Patch NemoClaw Dockerfile | |
| | Output still truncated | Thinking tokens consume budget | \`--thinking off\` on every call | |
| | Output still short (~2K chars) | Model's natural stop behavior | Skeleton prompts, template injection | |
| | Retries produce worse output | Session context accumulates | Unique session ID per call | |
| | Tool-call XML in output | OpenClaw agent framework | Regex cleanup post-processing | |
| | Reviewer rejects valid code | Reviewing against drifted plan | Binary checks, not schema comparison | |
| | Sandbox creation fails | Policy YAML parsing bug | Create manually via \`openshell sandbox create\` | |
| | Live UI shows no output | Python stdout buffering | \`flush=True\` + \`PYTHONUNBUFFERED=1\` | |
| | Port 8080 in use | NemoClaw gateway occupies it | Use different port for UI server | |
| |
| --- |
| |
| ## 6. Project Structure |
| |
| \`\`\` |
| nemoclaw-demo/ |
| ├── orchestrator.py              # Main pipeline |
| ├── ui_server.py                 # Live browser UI (port 8888) |
| ├── prompts/ |
| │   ├── architect.txt            # Spec → plan |
| │   ├── coder_initial.txt        # Plan → code (skeleton) |
| │   ├── coder_retry.txt          # Error → fixed code |
| │   ├── reviewer.txt             # Code → binary verdict |
| │   ├── analyst.txt              # → generateData() function |
| │   └── dashboard_template.html  # Pre-built dashboard UI |
| ├── fallback/ |
| │   ├── generated_app.py         # Known-good backend |
| │   └── dashboard.html           # Known-good dashboard |
| ├── output/                      # Pipeline output |
| └── logs/                        # Per-run logs |
| \`\`\` |
| |
| --- |
| |
| ## 7. Running the Demo |
| |
| \`\`\`bash |
| # Terminal 1: Start the live UI |
| python3 ui_server.py |
| |
| # Open in browser (use the Network URL it prints) |
| # Click RUN PIPELINE |
| \`\`\` |
| |
| Or run the pipeline directly: |
| |
| \`\`\`bash |
| python3 orchestrator.py |
| # Open output/dashboard.html in a browser |
| \`\`\` |
| |
| --- |
| |
| ## Security Architecture |
| |
| Each NemoClaw sandbox enforces: |
| |
| - **Landlock** filesystem restrictions: agents see only \`/sandbox\` and \`/tmp\` |
| - **seccomp** system call filtering: blocked syscalls fail silently |
| - **Network namespace isolation**: no direct egress, only \`inference.local\` |
| - **Inference routing**: agents hit \`inference.local\`, the gateway routes to Ollama; the agent never sees host IPs or ports |
| |
| The orchestrator runs on the host as a trusted operator. Agents never communicate directly; every artifact passes through the host-side orchestrator. This is least-privilege at the agent level. |
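| |
| The host-mediated handoff can be pictured as a simple loop. This is a deliberately simplified sketch: it chains the four agents linearly and leaves out the retries and the reviewer verdict. \`run_pipeline\` and \`call_agent\` are hypothetical names, with \`call_agent\` standing in for the SSH-based call into each sandbox. |
| |
| \`\`\`python |
| STAGES = ["architect", "coder", "reviewer", "analyst"] |
| |
| def run_pipeline(spec, call_agent, stages=STAGES): |
|     """Pass each stage's output to the next via the host; return all artifacts. |
| |
|     call_agent(agent_name, input_text) -> str is a placeholder for the |
|     call into that agent's sandbox. |
|     """ |
|     artifacts = {} |
|     current = spec |
|     for name in stages: |
|         current = call_agent(name, current)  # the only channel between agents |
|         artifacts[name] = current |
|     return artifacts |
| \`\`\` |
| |
| Because every artifact lands in \`artifacts\` on the host, it can be logged, cleaned, or validated before the next agent sees it. |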
| |
| --- |
| |
| *Last updated: April 2026* |
| *NemoClaw v0.1.0 | OpenShell v0.0.16 | OpenClaw 2026.3.11* |
| `; |
| const content = document.getElementById('content'); |
| content.innerHTML = marked.parse(markdown); |
| document.querySelectorAll('pre code').forEach((block) => { |
| hljs.highlightElement(block); |
| }); |
| </script> |
| </body> |
| </html> |