metadata
title: Pi Web Agent
sdk: static
app_file: dist/index.html
fullWidth: true
models:
  - onnx-community/Qwen2.5-Coder-0.5B-Instruct
  - onnx-community/Qwen3-0.6B-ONNX
  - Mike0021/MiniCPM5-1B-ONNX-Web
custom_headers:
  cross-origin-embedder-policy: credentialless
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin

Pi Web Agent

This workspace ships a browser-only pi agent app backed by Transformers.js and WebContainers. The default planner is onnx-community/Qwen2.5-Coder-0.5B-Instruct because it produced the strongest browser result in the local task suite; Qwen3 0.6B and the converted MiniCPM5 model remain selectable for comparison.

Published artifact: https://huggingface.co/Mike0021/MiniCPM5-1B-ONNX-Web

The required runtime layout is:

  • config.json, generation_config.json, tokenizer files, and chat_template.jinja at the repo root
  • q4 ONNX weights at onnx/model_q4.onnx
  • transformers.js_config.dtype = "q4" set in config.json, so that the default loader selects the web-sized q4 artifact
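
The layout above can be checked mechanically before an upload. The following is a minimal sketch, not the project's own tooling: checkRuntimeLayout is a hypothetical helper, it assumes the repo contents are available as a flat list of relative paths, and it treats "tokenizer files" as any root-level file whose name starts with "tokenizer".

```javascript
// Hypothetical helper: returns a list of problems with the runtime layout.
// `files` is a list of repo-relative paths; `config` is the parsed config.json.
function checkRuntimeLayout(files, config) {
  const present = new Set(files);
  const problems = [];

  // Root-level files named in the layout list.
  for (const name of ["config.json", "generation_config.json", "chat_template.jinja"]) {
    if (!present.has(name)) problems.push(`missing ${name}`);
  }

  // Assumption: at least one root-level file starting with "tokenizer" must exist.
  if (!files.some((f) => !f.includes("/") && f.startsWith("tokenizer"))) {
    problems.push("missing tokenizer files");
  }

  if (!present.has("onnx/model_q4.onnx")) {
    problems.push("missing onnx/model_q4.onnx");
  }

  // The key is literally "transformers.js_config", so bracket access is required.
  const dtype = config?.["transformers.js_config"]?.dtype;
  if (dtype !== "q4") {
    problems.push(`transformers.js_config.dtype is ${JSON.stringify(dtype)}, expected "q4"`);
  }

  return problems;
}
```

An empty return value means every item in the list was found; each missing item adds one entry.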

The conversion uses an ONNX export with KV cache (text-generation-with-past) and then applies ONNX Runtime 4-bit MatMul quantization. A generic ONNX export without KV cache is not enough for normal Transformers.js autoregressive generation.

Run the Web App

npm install
npm run dev

Open http://localhost:5173/.

The app uses:

  • @earendil-works/pi-agent-core for the agent loop, transcript state, and tool execution.
  • @huggingface/transformers with onnx-community/Qwen2.5-Coder-0.5B-Instruct as the default local browser planner.
  • @webcontainer/api for the client-only sandbox with a virtual filesystem and browser-contained Node.js processes.

Vite serves the app with COOP/COEP headers, and the app boots WebContainers with coep: "credentialless". For fast harness and sandbox smoke tests that skip the ONNX model download, a deterministic test model is available at http://localhost:5173/?mode=mock&device=wasm. In WASM mode the local model defaults to a tested 256-token generation budget, and the UI allows budgets up to 8192 tokens.
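
As an illustration of the isolation headers and the mock URL described above, here is a minimal sketch. It assumes the dev server mirrors the COEP and COOP values from the Space frontmatter; resolveRuntimeOptions and its fallback values are hypothetical and may not match the app's real defaults.

```javascript
// Assumption: the dev server mirrors the COEP/COOP values from the Space frontmatter.
const isolationHeaders = {
  "Cross-Origin-Embedder-Policy": "credentialless",
  "Cross-Origin-Opener-Policy": "same-origin",
};

// Shape of a Vite config object; in vite.config.js this would be the default export.
const viteConfig = {
  server: { headers: isolationHeaders },
  preview: { headers: isolationHeaders },
};

// Hypothetical helper: read the mode and device switches from the page URL.
// The fallback values are assumptions, not documented defaults.
function resolveRuntimeOptions(href) {
  const params = new URL(href).searchParams;
  return {
    mode: params.get("mode") ?? "local",
    device: params.get("device") ?? "wasm",
  };
}
```

With the mock URL from the text, resolveRuntimeOptions returns mode "mock" and device "wasm". Without these headers the page is not cross-origin isolated, which is the condition the smoke test checks through crossOriginIsolated.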

The Static Space uses the same isolation policy through custom_headers in this README frontmatter. The app is built with npm run build and the generated dist/ directory is uploaded to the Space.

Test the Agent App

Start the dev server, then run:

npm run smoke:web

The smoke test opens Chromium, confirms crossOriginIsolated, submits the chat prompt in deterministic mode, boots the WebContainer sandbox, writes hello.js, spawns node hello.js, and checks for pi sandbox result: 42 in the chat transcript.
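
The sandbox part of that sequence can be sketched as follows. This is an illustration under stated assumptions, not the project's smoke script: runHelloTask is a hypothetical name, the contents of hello.js are invented for the example, and the code assumes the process output stream closes when the process exits. The container argument only needs the WebContainer-style methods fs.writeFile(path, contents) and spawn(command, args).

```javascript
// Hypothetical sketch: write hello.js, run it with node, and check the output.
// `container` is any object with WebContainer-style fs.writeFile and spawn.
async function runHelloTask(container) {
  // Assumption: the real hello.js may produce 42 in a different way.
  await container.fs.writeFile(
    "hello.js",
    'console.log("pi sandbox result: " + 6 * 7);\n'
  );

  const proc = await container.spawn("node", ["hello.js"]);

  // Collect every output chunk into one string.
  let output = "";
  await proc.output.pipeTo(
    new WritableStream({
      write(chunk) {
        output += chunk;
      },
    })
  );

  const exitCode = await proc.exit;
  return {
    exitCode,
    output,
    passed: exitCode === 0 && output.includes("pi sandbox result: 42"),
  };
}
```

In the browser, the argument would be the instance returned by WebContainer.boot({ coep: "credentialless" }) from @webcontainer/api. Passing the container in keeps the function testable with a stub outside the browser.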

For the heavier end-to-end check with the real ONNX model in browser WASM mode:

npm run smoke:local-model

This downloads and loads the q4 ONNX artifact in Chrome, runs the same pi/WebContainer task, and accepts the sandbox result only after the model has reached the Model ready state.

The complex smoke test covers simple code execution, installing and using an npm package, and a multi-file ES module task:

npm run smoke:complex

The sandbox can install and use Node packages through the same run_command tool, for example npm install is-number@7.0.0 followed by node check-package.mjs.

To probe larger browser generation budgets:

TOKEN_BUDGETS=80,256,2048,8192 npm run probe:tokens
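
A comma-separated budget list like the one above might be parsed as in this sketch. parseTokenBudgets is a hypothetical helper, and the 8192 upper bound is taken from the UI limit mentioned earlier; the real probe script may validate differently.

```javascript
// Hypothetical parser for a budget list such as "80,256,2048,8192".
function parseTokenBudgets(raw, max = 8192) {
  const budgets = String(raw ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part !== "")
    .map((part) => Number(part));

  // Reject anything that is not a positive integer within the allowed cap.
  for (const budget of budgets) {
    if (!Number.isInteger(budget) || budget < 1 || budget > max) {
      throw new RangeError(`invalid token budget: ${budget}`);
    }
  }

  // Drop duplicates and sort ascending so each cap is probed once, smallest first.
  return [...new Set(budgets)].sort((a, b) => a - b);
}
```

In a Node script this would typically be called as parseTokenBudgets(process.env.TOKEN_BUDGETS); an unset variable yields an empty list.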

Measured local WASM results with Qwen2.5-Coder 0.5B:

  • npm run smoke:local-model passed at 80 max new tokens.
  • MAX_NEW_TOKENS=256 npm run smoke:complex passed simple, npm dependency, and multi-file module tasks.
  • TOKEN_BUDGETS=80,160,256,512,1024,2048,4096,8192 npm run probe:tokens passed. Higher caps were accepted; for the probe task the model stopped naturally before using the full cap.

Verify the Published Artifact

npm install
node scripts/verify_tjs_model.mjs Mike0021/MiniCPM5-1B-ONNX-Web

The verifier asks Transformers.js for the text-generation file plan, checks for onnx/model_q4.onnx, then loads the model and generates a short completion.
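
The load-and-generate step can be sketched as below. This is not the content of scripts/verify_tjs_model.mjs: smokeGenerate is a hypothetical name, the file-plan check is left out, and the prompt and 16-token cap are arbitrary choices. The createPipeline argument is expected to behave like pipeline from @huggingface/transformers.

```javascript
// Hypothetical sketch: load a text-generation pipeline and produce a short completion.
// `createPipeline` should behave like `pipeline` from @huggingface/transformers.
async function smokeGenerate(createPipeline, modelId, prompt = "Hello") {
  // Request the q4 weights explicitly, matching the published artifact.
  const generator = await createPipeline("text-generation", modelId, { dtype: "q4" });

  // Arbitrary small cap: enough to prove that generation works at all.
  const outputs = await generator(prompt, { max_new_tokens: 16 });

  const text = outputs?.[0]?.generated_text;
  if (typeof text !== "string" || text.length === 0) {
    throw new Error(`no text generated by ${modelId}`);
  }
  return text;
}
```

With the real library, the call would be smokeGenerate(pipeline, "Mike0021/MiniCPM5-1B-ONNX-Web") after importing pipeline from @huggingface/transformers. The first run downloads the model weights.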

Convert and Upload

The published repo was produced locally with a CPU fp16 export followed by q4 ONNX quantization:

uv run --python 3.12 \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  scripts/convert_minicpm5_tjs.py \
  --source-model openbmb/MiniCPM5-1B \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --output-dir output/MiniCPM5-1B-ONNX-Web \
  --work-dir output/minicpm5-work \
  --device cpu \
  --export-dtype fp16

For a clean remote conversion, the same script can be run on Hugging Face Jobs with a configured Hub token:

hf repos create Mike0021/MiniCPM5-1B-ONNX-Web --repo-type model --exist-ok
hf jobs uv run scripts/convert_minicpm5_tjs.py \
  --flavor l4x1 \
  --timeout 6h \
  --secrets HF_TOKEN \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  --python 3.12 \
  -- \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --export-dtype fp16