---
title: Pi Web Agent
sdk: static
app_file: dist/index.html
fullWidth: true
models:
  - onnx-community/Qwen2.5-Coder-0.5B-Instruct
  - onnx-community/Qwen3-0.6B-ONNX
  - Mike0021/MiniCPM5-1B-ONNX-Web
custom_headers:
  cross-origin-embedder-policy: credentialless
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---

# Pi Web Agent

This workspace ships a browser-only pi agent app backed by Transformers.js and WebContainers. The default planner is `onnx-community/Qwen2.5-Coder-0.5B-Instruct` because it produced the strongest browser result in the local task suite; Qwen3 0.6B and the converted MiniCPM5 model remain selectable for comparison.

Published artifact: https://huggingface.co/Mike0021/MiniCPM5-1B-ONNX-Web

The required runtime layout is:

- `config.json`, `generation_config.json`, tokenizer files, and `chat_template.jinja` at the repo root
- q4 ONNX weights at `onnx/model_q4.onnx`
- `config.json` includes `transformers.js_config.dtype = "q4"` so the default loader selects the web-sized artifact

The conversion uses an ONNX export with KV cache (`text-generation-with-past`) and then applies ONNX Runtime 4-bit MatMul quantization. A generic ONNX export without KV cache is not enough for normal Transformers.js autoregressive generation.

## Run the Web App

```bash
npm install
npm run dev
```

Open http://localhost:5173/.

The app uses:

- `@earendil-works/pi-agent-core` for the agent loop, transcript state, and tool execution.
- `@huggingface/transformers` with `onnx-community/Qwen2.5-Coder-0.5B-Instruct` as the default local browser planner.
- `@webcontainer/api` for the client-only sandbox with a virtual filesystem and browser-contained Node.js processes.

Vite serves the app with COOP/COEP headers and boots WebContainers with `coep: "credentialless"`. The deterministic test model is available at `http://localhost:5173/?mode=mock&device=wasm` for fast harness and sandbox smoke tests without downloading an ONNX model.
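
The deterministic mode is selected through the `mode` and `device` query parameters. How the app reads them is not shown in this README; the following is only a minimal sketch, assuming a hypothetical `parseRuntimeOptions` helper, an assumed `"local"` label for the real-model mode, and an assumed `"wasm"` default device.

```javascript
// Hypothetical helper: the app's real parameter handling is not documented here.
function parseRuntimeOptions(href) {
  const params = new URL(href).searchParams;
  return {
    // "mock" selects the deterministic test model. Treating every other
    // value as the real local model is an assumption of this sketch.
    mode: params.get("mode") === "mock" ? "mock" : "local",
    // Falling back to "wasm" when no device is given is also an assumption.
    device: params.get("device") ?? "wasm",
  };
}

const options = parseRuntimeOptions("http://localhost:5173/?mode=mock&device=wasm");
console.log(options.mode, options.device);
```

Keeping the parsing in one pure function makes it easy to exercise from a harness without opening a browser.
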
The local model defaults to a tested 256-token generation budget in WASM mode and the UI allows budgets up to 8192.

The Static Space uses the same isolation policy through `custom_headers` in this README frontmatter. The app is built with `npm run build` and the generated `dist/` directory is uploaded to the Space.

## Test the Agent App

Start the dev server, then run:

```bash
npm run smoke:web
```

The smoke test opens Chromium, confirms `crossOriginIsolated`, submits the chat prompt in deterministic mode, boots the WebContainer sandbox, writes `hello.js`, spawns `node hello.js`, and checks for `pi sandbox result: 42` in the chat transcript.

For the heavier end-to-end check with the real ONNX model in browser WASM mode:

```bash
npm run smoke:local-model
```

This downloads/loads the q4 ONNX artifact in Chrome, runs the same pi/WebContainer task, and checks that the model reaches `Model ready` before the sandbox result is accepted.

The complex smoke test covers simple code execution, installing and using an npm package, and a multi-file ES module task:

```bash
npm run smoke:complex
```

The sandbox can install and use Node packages through the same `run_command` tool, for example `npm install is-number@7.0.0` followed by `node check-package.mjs`.

To probe larger browser generation budgets:

```bash
TOKEN_BUDGETS=80,256,2048,8192 npm run probe:tokens
```

Measured local WASM results with Qwen2.5-Coder 0.5B:

- `npm run smoke:local-model` passed at 80 max new tokens.
- `MAX_NEW_TOKENS=256 npm run smoke:complex` passed simple, npm dependency, and multi-file module tasks.
- `TOKEN_BUDGETS=80,160,256,512,1024,2048,4096,8192 npm run probe:tokens` passed. Higher caps were accepted; for the probe task the model stopped naturally before using the full cap.

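
The probe takes its budgets as a comma-separated `TOKEN_BUDGETS` environment variable. The probe script itself is not reproduced here, so the following is only a sketch of how such a value might be parsed, assuming a hypothetical `parseTokenBudgets` helper that drops invalid entries, clamps to the 8192 UI limit, and falls back to the budgets from the example command.

```javascript
// Hypothetical helper: the actual probe script may parse its input differently.
const MAX_BUDGET = 8192; // upper limit the UI allows, per this README

function parseTokenBudgets(raw, fallback = [80, 256, 2048, 8192]) {
  if (!raw) return fallback;
  const parsed = raw
    .split(",")
    .map((part) => Number.parseInt(part.trim(), 10))
    // Drop entries that are not positive integers (for example "abc").
    .filter((n) => Number.isInteger(n) && n > 0)
    // Clamp oversized requests to the UI limit.
    .map((n) => Math.min(n, MAX_BUDGET));
  if (parsed.length === 0) return fallback;
  // Deduplicate and sort ascending so the cheapest probe runs first.
  return [...new Set(parsed)].sort((a, b) => a - b);
}

const budgets = parseTokenBudgets(process.env.TOKEN_BUDGETS);
console.log(budgets.join(","));
```

Running the cheapest budget first means a broken setup fails quickly instead of after a long 8192-token generation.
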
## Verify the Published Artifact

```bash
npm install
node scripts/verify_tjs_model.mjs Mike0021/MiniCPM5-1B-ONNX-Web
```

The verifier asks Transformers.js for the `text-generation` file plan, checks for `onnx/model_q4.onnx`, then loads the model and generates a short completion.

## Convert and Upload

The published repo was produced locally with a CPU fp16 export followed by q4 ONNX quantization:

```bash
uv run --python 3.12 \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  scripts/convert_minicpm5_tjs.py \
  --source-model openbmb/MiniCPM5-1B \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --output-dir output/MiniCPM5-1B-ONNX-Web \
  --work-dir output/minicpm5-work \
  --device cpu \
  --export-dtype fp16
```

For a clean remote conversion, the same script can be run on Hugging Face Jobs with a configured Hub token:

```bash
hf repos create Mike0021/MiniCPM5-1B-ONNX-Web --repo-type model --exist-ok
hf jobs uv run scripts/convert_minicpm5_tjs.py \
  --flavor l4x1 \
  --timeout 6h \
  --secrets HF_TOKEN \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  --python 3.12 \
  -- \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --export-dtype fp16
```
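
Before uploading a converted output directory, it can help to check it against the required runtime layout. The sketch below is not part of the repository's scripts; `REQUIRED_FILES` and `findLayoutProblems` are hypothetical names, and tokenizer files are left out because this README does not list them individually.

```javascript
// Hypothetical pre-upload check, not one of the repository's scripts.
// Only files that this README names explicitly are listed.
const REQUIRED_FILES = [
  "config.json",
  "generation_config.json",
  "chat_template.jinja",
  "onnx/model_q4.onnx",
];

// `files` is a list of repo-relative paths; `config` is the parsed config.json.
function findLayoutProblems(files, config) {
  const present = new Set(files);
  const problems = REQUIRED_FILES
    .filter((name) => !present.has(name))
    .map((name) => `missing file: ${name}`);
  // The default loader picks the q4 weights only when this dtype is set.
  const dtype = config?.["transformers.js_config"]?.dtype;
  if (dtype !== "q4") {
    problems.push(`transformers.js_config.dtype is ${JSON.stringify(dtype)}, expected "q4"`);
  }
  return problems;
}

// Example: a layout that lacks the quantized weights.
const layoutProblems = findLayoutProblems(
  ["config.json", "generation_config.json", "chat_template.jinja"],
  { "transformers.js_config": { dtype: "q4" } },
);
console.log(layoutProblems); // → [ 'missing file: onnx/model_q4.onnx' ]
```

An empty result means the named files and the dtype setting are in place; it does not replace the load-and-generate check done by `scripts/verify_tjs_model.mjs`.
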