---
title: MiniCPM5 Pi Web Agent
sdk: static
app_file: dist/index.html
fullWidth: true
models:
  - Mike0021/MiniCPM5-1B-ONNX-Web
custom_headers:
  cross-origin-embedder-policy: credentialless
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
---

# MiniCPM5-1B Pi Web Agent

This workspace converts `openbmb/MiniCPM5-1B` into a browser-loadable Transformers.js model and ships a browser-only pi agent app.

Published artifact: https://huggingface.co/Mike0021/MiniCPM5-1B-ONNX-Web

The required runtime layout is:

- `config.json`, `generation_config.json`, tokenizer files, and `chat_template.jinja` at the repo root
- q4 ONNX weights at `onnx/model_q4.onnx`
- `config.json` includes `transformers.js_config.dtype = "q4"` so the default loader selects the web-sized artifact
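
The layout above can be checked mechanically before an upload. The following is a minimal sketch, not code from this repo: `checkRuntimeLayout` is a hypothetical helper, it assumes the dtype setting lives under the literal `"transformers.js_config"` key of `config.json`, and it skips the tokenizer files because their exact names are not listed here.

```javascript
// Hypothetical helper: checks a list of repo-relative file paths and the
// parsed contents of config.json against the layout described above.
// Returns an array of problem descriptions; an empty array means the layout is fine.
function checkRuntimeLayout(files, config) {
  const required = [
    "config.json",
    "generation_config.json",
    "chat_template.jinja",
    "onnx/model_q4.onnx",
  ];
  const present = new Set(files);
  const problems = required
    .filter((path) => !present.has(path))
    .map((path) => `missing file: ${path}`);

  // Assumption: the key is the literal string "transformers.js_config".
  const dtype = config?.["transformers.js_config"]?.dtype;
  if (dtype !== "q4") {
    problems.push(`transformers.js_config.dtype is ${JSON.stringify(dtype)}, expected "q4"`);
  }
  return problems;
}
```

Calling it with the file listing of the output directory and the parsed `config.json` gives a list that can be printed or used to fail a build step.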

The conversion first exports to ONNX with KV cache (the `text-generation-with-past` task), then applies ONNX Runtime 4-bit MatMul quantization. A generic ONNX export without KV cache is not enough for normal Transformers.js autoregressive generation.

## Run the Web App

```bash
npm install
npm run dev
```

Open http://localhost:5173/.

The app uses:

- `@earendil-works/pi-agent-core` for the agent loop, transcript state, and tool execution.
- `@huggingface/transformers` with `Mike0021/MiniCPM5-1B-ONNX-Web` for the local browser model.
- `@webcontainer/api` for the client-only sandbox with a virtual filesystem and browser-contained Node.js processes.

Vite serves the app with COOP/COEP headers, and the app boots WebContainers with `coep: "credentialless"`. A deterministic test model is available at `http://localhost:5173/?mode=mock&device=wasm` for fast harness and sandbox smoke tests that skip downloading the full ONNX model.
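
As a rough illustration of the two pieces above, the sketch below shows the isolation headers as they might appear under Vite's `server.headers` option, plus a parser for the `mode` and `device` query parameters. It assumes the dev server uses the same header values as the `custom_headers` frontmatter; `viteConfigSketch` and `parseRunOptions` are hypothetical names, and the app's real config and parsing may differ.

```javascript
// Header values mirror the `custom_headers` frontmatter of this README.
// Assumption: the Vite dev server is configured with the same values.
const isolationHeaders = {
  "Cross-Origin-Embedder-Policy": "credentialless",
  "Cross-Origin-Opener-Policy": "same-origin",
};

// Hypothetical shape of the relevant part of vite.config.js.
// `server.headers` adds these response headers to everything the dev server serves.
const viteConfigSketch = {
  server: { headers: isolationHeaders },
};

// Hypothetical parser for the query parameters used by the test URL.
// Returns null for a parameter that is absent.
function parseRunOptions(href) {
  const params = new URL(href).searchParams;
  return { mode: params.get("mode"), device: params.get("device") };
}
```

For the test URL above, `parseRunOptions` yields `mode: "mock"` and `device: "wasm"`; how the app treats missing parameters is not specified here.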

The Static Space applies the same isolation policy through `custom_headers` in this README's frontmatter. The app is built with `npm run build`, and the generated `dist/` directory is uploaded to the Space.

## Test the Agent App

Start the dev server, then run:

```bash
npm run smoke:web
```

The smoke test opens Chromium, confirms `crossOriginIsolated`, boots the WebContainer sandbox, runs the pi agent in deterministic mode, writes `hello.js`, spawns `node hello.js`, and checks for `pi sandbox result: 42` in the transcript.
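
The first and last checks in that sequence can be sketched as below. This is an illustration under assumptions, not the repo's smoke script: both function names are hypothetical, and `transcriptText` is assumed to be the transcript text that the browser automation reads from the page.

```javascript
// The marker string comes from the smoke test description above.
const EXPECTED_RESULT = "pi sandbox result: 42";

// Hypothetical pre-flight check. `globalObject` stands in for the page's
// `window`, whose `crossOriginIsolated` property is true only when the
// COOP/COEP headers took effect.
function assertCrossOriginIsolated(globalObject) {
  if (globalObject.crossOriginIsolated !== true) {
    throw new Error("page is not cross-origin isolated");
  }
}

// Hypothetical final check: the sandbox output must appear in the transcript.
function transcriptHasSandboxResult(transcriptText) {
  return transcriptText.includes(EXPECTED_RESULT);
}
```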

For the heavier end-to-end check with the real MiniCPM5 ONNX model in browser WASM mode:

```bash
npm run smoke:local-model
```

This downloads and loads the q4 ONNX artifact in Chromium, runs the same pi/WebContainer task, and checks that the model reaches `Model ready` before the sandbox result is accepted.

## Verify the Published Artifact

```bash
npm install
node scripts/verify_tjs_model.mjs Mike0021/MiniCPM5-1B-ONNX-Web
```

The verifier asks Transformers.js for the `text-generation` file plan, checks for `onnx/model_q4.onnx`, then loads the model and generates a short completion.
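
The load-and-generate step can be approximated with the Transformers.js `pipeline` API. The sketch below is not the contents of `scripts/verify_tjs_model.mjs` and leaves out the file-plan check. It assumes `@huggingface/transformers` is installed; `generateShortCompletion`, the default prompt, and the token limit are illustrative choices.

```javascript
// A sketch only. The dynamic import keeps the function definition loadable
// even where the package is missing; calling it requires the package.
async function generateShortCompletion(modelId, prompt = "Hello") {
  const { pipeline } = await import("@huggingface/transformers");

  // Passing dtype "q4" requests the q4 weights explicitly instead of
  // relying on the default recorded in config.json.
  const generator = await pipeline("text-generation", modelId, { dtype: "q4" });

  // Keep the completion short; this only needs to prove generation works.
  const output = await generator(prompt, { max_new_tokens: 16 });
  return output[0].generated_text;
}
```

Calling `generateShortCompletion("Mike0021/MiniCPM5-1B-ONNX-Web")` downloads the model on first use, so expect the first run to take a while.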

## Convert and Upload

The published repo was produced locally with a CPU fp16 export followed by q4 ONNX quantization:

```bash
uv run --python 3.12 \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  scripts/convert_minicpm5_tjs.py \
  --source-model openbmb/MiniCPM5-1B \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --output-dir output/MiniCPM5-1B-ONNX-Web \
  --work-dir output/minicpm5-work \
  --device cpu \
  --export-dtype fp16
```

For a clean remote conversion, the same script can be run on Hugging Face Jobs with a configured Hub token:

```bash
hf repos create Mike0021/MiniCPM5-1B-ONNX-Web --repo-type model --exist-ok
hf jobs uv run scripts/convert_minicpm5_tjs.py \
  --flavor l4x1 \
  --timeout 6h \
  --secrets HF_TOKEN \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  --python 3.12 \
  -- \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --export-dtype fp16
```