title: MiniCPM5 Pi Web Agent
sdk: static
app_file: dist/index.html
fullWidth: true
models:
- Mike0021/MiniCPM5-1B-ONNX-Web
custom_headers:
  cross-origin-embedder-policy: credentialless
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
MiniCPM5-1B Pi Web Agent
This workspace converts openbmb/MiniCPM5-1B into a browser-loadable Transformers.js model and ships a browser-only pi agent app.
Published artifact: https://huggingface.co/Mike0021/MiniCPM5-1B-ONNX-Web
The required runtime layout is:
- config.json, generation_config.json, tokenizer files, and chat_template.jinja at the repo root
- q4 ONNX weights at onnx/model_q4.onnx
- config.json includes transformers.js_config.dtype = "q4" so the default loader selects the web-sized artifact
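The layout above can be checked mechanically before uploading. The following is a minimal sketch, not part of the repo's scripts: checkRuntimeLayout is a hypothetical helper, and the rule that tokenizer files start with "tokenizer" is an assumption.

```javascript
// Hypothetical helper: validates a list of repo-relative paths plus a parsed
// config.json against the runtime layout described above.
const REQUIRED_ROOT_FILES = ["config.json", "generation_config.json", "chat_template.jinja"];
const REQUIRED_WEIGHTS = "onnx/model_q4.onnx";

function checkRuntimeLayout(files, config) {
  const present = new Set(files);
  const missing = [...REQUIRED_ROOT_FILES, REQUIRED_WEIGHTS].filter((f) => !present.has(f));

  // Assumption: tokenizer files live at the root and start with "tokenizer".
  const hasTokenizer = files.some((f) => !f.includes("/") && f.startsWith("tokenizer"));
  if (!hasTokenizer) missing.push("tokenizer files");

  // The default loader picks its dtype from transformers.js_config.dtype.
  const dtype = config?.["transformers.js_config"]?.dtype;
  return { ok: missing.length === 0 && dtype === "q4", missing, dtype };
}
```

The function returns the missing entries rather than throwing, so a caller can print every problem in one pass.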
The conversion uses an ONNX export with KV cache (text-generation-with-past) and then applies ONNX Runtime 4-bit MatMul quantization. A generic ONNX export without KV cache is not enough for normal Transformers.js autoregressive generation.
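One way to tell a with-past export from a generic one is to look at the graph's input names. The sketch below assumes the Optimum-style naming past_key_values.<layer>.key and past_key_values.<layer>.value; describeKvCacheInputs is a hypothetical helper, and the names could come from, for example, an ONNX Runtime session's inputNames.

```javascript
// Assumed naming convention for cache inputs: past_key_values.<layer>.key|value
const PAST_INPUT = /^past_key_values\.(\d+)\.(key|value)$/;

// Hypothetical helper: reports whether the graph accepts cached keys/values
// and how many layers expose them.
function describeKvCacheInputs(inputNames) {
  const layers = new Set();
  for (const name of inputNames) {
    const match = PAST_INPUT.exec(name);
    if (match) layers.add(Number(match[1]));
  }
  return { hasKvCache: layers.size > 0, layerCount: layers.size };
}
```

An export without these inputs has to reprocess the whole sequence on every decoding step, which is why it does not suit the autoregressive loop.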
Run the Web App
npm install
npm run dev
Open http://localhost:5173/.
The app uses:
- @earendil-works/pi-agent-core for the agent loop, transcript state, and tool execution.
- @huggingface/transformers with Mike0021/MiniCPM5-1B-ONNX-Web for the local browser model.
- @webcontainer/api for the client-only sandbox with a virtual filesystem and browser-contained Node.js processes.
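As an illustration of the model piece only, the published artifact could be loaded through the Transformers.js pipeline API roughly as follows. This is a sketch, not the app's actual loader: pipelineOptions and loadGenerator are hypothetical names, and "webgpu" as an alternative to "wasm" is an assumption.

```javascript
const MODEL_ID = "Mike0021/MiniCPM5-1B-ONNX-Web";

// Hypothetical helper: builds pipeline options for the chosen backend.
function pipelineOptions(device = "wasm") {
  // Assumption: only these two backends are relevant in the browser.
  if (device !== "wasm" && device !== "webgpu") {
    throw new Error(`unsupported device: ${device}`);
  }
  // dtype "q4" corresponds to onnx/model_q4.onnx.
  return { dtype: "q4", device };
}

async function loadGenerator(device = "wasm") {
  // Dynamic import keeps this file loadable where the package is not installed.
  const { pipeline } = await import("@huggingface/transformers");
  return pipeline("text-generation", MODEL_ID, pipelineOptions(device));
}

// Usage (requires the package and a model download):
// const generator = await loadGenerator("wasm");
// const out = await generator([{ role: "user", content: "Say hi" }], { max_new_tokens: 16 });
```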
Vite serves the app with COOP/COEP headers and boots WebContainers with coep: "credentialless". The deterministic test model is available at http://localhost:5173/?mode=mock&device=wasm for fast harness and sandbox smoke tests without downloading the full ONNX model.
The Static Space uses the same isolation policy through custom_headers in this README frontmatter. The app is built with npm run build and the generated dist/ directory is uploaded to the Space.
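The dev-server headers and the mock-mode URL can be sketched as plain data plus a small parser. The object shape follows Vite's server.headers option, but viteConfigSketch, parseRunOptions, and the "model" fallback name are hypothetical; the app's real config and query handling may differ.

```javascript
// COOP/COEP values taken from the isolation policy described above.
const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "credentialless",
  "Cross-Origin-Opener-Policy": "same-origin",
};

// Hypothetical Vite config fragment: server.headers applies to `npm run dev`.
const viteConfigSketch = {
  server: { headers: ISOLATION_HEADERS },
};

// Hypothetical parser for the ?mode=...&device=... query string.
function parseRunOptions(href) {
  const params = new URL(href).searchParams;
  return {
    // Anything other than "mock" falls back to the real model (assumed name).
    mode: params.get("mode") === "mock" ? "mock" : "model",
    // null when the URL does not specify a device.
    device: params.get("device"),
  };
}
```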
Test the Agent App
Start the dev server, then run:
npm run smoke:web
The smoke test opens Chromium, confirms that crossOriginIsolated is true, boots the WebContainer sandbox, runs the pi agent in deterministic mode, writes hello.js, spawns node hello.js, and checks for the string "pi sandbox result: 42" in the transcript.
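The pass/fail decision at the end of that sequence can be sketched as a pure function. evaluateSmokeRun is a hypothetical name and the input shape is an assumption; the real smoke script drives a browser and is not reproduced here.

```javascript
const EXPECTED_MARKER = "pi sandbox result: 42";

// Hypothetical evaluator: takes values collected from the page and reports
// every failed check instead of stopping at the first one.
function evaluateSmokeRun({ crossOriginIsolated, transcript }) {
  const failures = [];
  if (crossOriginIsolated !== true) {
    failures.push("page is not cross-origin isolated");
  }
  if (!transcript.includes(EXPECTED_MARKER)) {
    failures.push(`transcript is missing "${EXPECTED_MARKER}"`);
  }
  return { passed: failures.length === 0, failures };
}
```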
For the heavier end-to-end check with the real MiniCPM5 ONNX model in browser WASM mode:
npm run smoke:local-model
This downloads and loads the q4 ONNX artifact in Chromium, runs the same pi/WebContainer task, and checks that the model reaches "Model ready" before the sandbox result is accepted.
Verify the Published Artifact
npm install
node scripts/verify_tjs_model.mjs Mike0021/MiniCPM5-1B-ONNX-Web
The verifier asks Transformers.js for the text-generation file plan, checks for onnx/model_q4.onnx, then loads the model and generates a short completion.
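The file-presence step can be approximated without Transformers.js by probing the Hub's resolve URL. This sketch is not the contents of scripts/verify_tjs_model.mjs: hubResolveUrl and hubFileExists are hypothetical helpers, and the "main" revision is an assumption.

```javascript
// Hypothetical helper: builds the Hub download URL for a file in a model repo.
function hubResolveUrl(repoId, filePath, revision = "main") {
  const encodedPath = filePath.split("/").map(encodeURIComponent).join("/");
  return `https://huggingface.co/${repoId}/resolve/${encodeURIComponent(revision)}/${encodedPath}`;
}

// Hypothetical helper: HEAD request so the weights are not downloaded.
// Needs network access and a runtime with a global fetch (for example Node 18+).
async function hubFileExists(repoId, filePath) {
  const response = await fetch(hubResolveUrl(repoId, filePath), { method: "HEAD" });
  return response.ok;
}

// Usage:
// hubFileExists("Mike0021/MiniCPM5-1B-ONNX-Web", "onnx/model_q4.onnx").then(console.log);
```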
Convert and Upload
The published repo was produced locally with a CPU fp16 export followed by q4 ONNX quantization:
uv run --python 3.12 \
--with "numpy<2" \
--with "transformers==4.57.6" \
--with "optimum[onnx]" \
--with "onnxruntime==1.20.1" \
--with onnxslim \
--with "huggingface_hub>=0.33" \
--with accelerate \
--with sentencepiece \
--with protobuf \
scripts/convert_minicpm5_tjs.py \
--source-model openbmb/MiniCPM5-1B \
--target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
--output-dir output/MiniCPM5-1B-ONNX-Web \
--work-dir output/minicpm5-work \
--device cpu \
--export-dtype fp16
For a clean remote conversion, the same script can be run on Hugging Face Jobs with a configured Hub token:
hf repos create Mike0021/MiniCPM5-1B-ONNX-Web --repo-type model --exist-ok
hf jobs uv run scripts/convert_minicpm5_tjs.py \
--flavor l4x1 \
--timeout 6h \
--secrets HF_TOKEN \
--with "numpy<2" \
--with "transformers==4.57.6" \
--with "optimum[onnx]" \
--with "onnxruntime==1.20.1" \
--with onnxslim \
--with "huggingface_hub>=0.33" \
--with accelerate \
--with sentencepiece \
--with protobuf \
--python 3.12 \
-- \
--target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
--export-dtype fp16