---
title: Pi CLI Web
sdk: docker
app_port: 7860
fullWidth: true
custom_headers:
  cross-origin-embedder-policy: credentialless
  cross-origin-opener-policy: same-origin
  cross-origin-resource-policy: cross-origin
models:
  - onnx-community/Qwen2.5-Coder-0.5B-Instruct
  - onnx-community/Qwen3-0.6B-ONNX
  - Mike0021/MiniCPM5-1B-ONNX-Web
---

# Pi CLI Web

This workspace ships a browser-only port of the `pi` CLI backed by Transformers.js, WebContainers, and a real terminal surface. The UI tries `ghostty-web` first and falls back to `@xterm/xterm`. It exposes Pi's built-in tool names: `read`, `bash`, `edit`, `write`, plus the read-only `grep`, `find`, and `ls`.

The default planner is `onnx-community/Qwen2.5-Coder-0.5B-Instruct` because it produced the strongest browser result in the local task suite; Qwen3 0.6B and the converted MiniCPM5 model remain selectable for comparison.

Published artifact: https://huggingface.co/Mike0021/MiniCPM5-1B-ONNX-Web

The published model repo must follow this runtime layout:

- `config.json`, `generation_config.json`, tokenizer files, and `chat_template.jinja` at the repo root
- q4 ONNX weights at `onnx/model_q4.onnx`
- `config.json` includes `transformers.js_config.dtype = "q4"` so the default loader selects the web-sized artifact
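
The layout above can be checked mechanically. The following is a minimal sketch, not part of the repo: `checkRuntimeLayout` is a hypothetical helper that takes a list of repo file paths and a parsed `config.json`, and returns a list of problems. Tokenizer file names differ between models, so they are deliberately not checked.

```javascript
// Hypothetical helper (not in the repo): validates the layout listed above.
function checkRuntimeLayout(files, config) {
  const problems = [];
  const present = new Set(files);

  // Root-level files named in the layout. Tokenizer files are skipped
  // because their exact names depend on the model.
  for (const name of ["config.json", "generation_config.json", "chat_template.jinja"]) {
    if (!present.has(name)) problems.push(`missing ${name}`);
  }

  // The q4 weights must sit under onnx/.
  if (!present.has("onnx/model_q4.onnx")) problems.push("missing onnx/model_q4.onnx");

  // The key is literally "transformers.js_config", so bracket access is needed.
  const dtype = config?.["transformers.js_config"]?.dtype;
  if (dtype !== "q4") problems.push(`expected dtype "q4", got ${JSON.stringify(dtype)}`);

  return problems;
}
```

An empty result means the file list and config match the layout; each string otherwise names one missing piece.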

The conversion uses an ONNX export with KV cache (`text-generation-with-past`) and then applies ONNX Runtime 4-bit MatMul quantization. A generic ONNX export without KV cache is not enough for normal Transformers.js autoregressive generation.
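
To give a feel for why cached decoding matters, here is a toy count of how many token positions pass through the model during generation. It is only an illustration of the cost side under a simplified model of decoding; it does not describe how Transformers.js or the exported graph is implemented, and `positionsProcessed` is a hypothetical name.

```javascript
// Counts token positions pushed through the model while generating
// `newTokens` tokens from a prompt of `promptTokens` tokens.
function positionsProcessed(promptTokens, newTokens, useKvCache) {
  let total = 0;
  for (let step = 0; step < newTokens; step++) {
    if (useKvCache) {
      // First pass reads the whole prompt; later passes read only the newest token.
      total += step === 0 ? promptTokens : 1;
    } else {
      // Every pass re-reads the prompt plus everything generated so far.
      total += promptTokens + step;
    }
  }
  return total;
}

console.log(positionsProcessed(100, 256, false)); // 58240
console.log(positionsProcessed(100, 256, true));  // 355
```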

## Run the Web App

```bash
npm install
npm run dev
```

Open http://localhost:5173/.

The app uses:

- `@earendil-works/pi-agent-core` for the agent loop, transcript state, and tool execution.
- `@earendil-works/pi-coding-agent` as the installed CLI contract for parity checks against `pi --help` and `pi --version`.
- `ghostty-web` as the terminal frontend, with `@xterm/xterm` fallback.
- `@huggingface/transformers` with `onnx-community/Qwen2.5-Coder-0.5B-Instruct` as the default local browser planner.
- `@webcontainer/api` for the client-only sandbox with a virtual filesystem and browser-contained Node.js processes.

Vite serves the app with COOP/COEP headers and boots WebContainers with `coep: "credentialless"`. The deterministic test model is available at `http://localhost:5173/?mode=mock&device=wasm` for fast harness and sandbox smoke tests without downloading an ONNX model. The local model defaults to a tested 256-token generation budget in WASM mode and supports budgets up to 8192 through the `tokens=` query parameter and `/settings tokens=<n>`.
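
As a rough illustration of the `tokens=` query parameter, the sketch below resolves a budget from a URL. The default of 256 and the cap of 8192 come from the paragraph above; the function name and the choice to fall back to the default on invalid input and clamp oversized values are assumptions, not the app's confirmed behavior.

```javascript
const DEFAULT_TOKENS = 256; // default WASM budget stated above
const MAX_TOKENS = 8192;    // largest supported budget stated above

// Hypothetical helper: reads `tokens=` from a URL string.
function resolveTokenBudget(url) {
  const raw = new URL(url).searchParams.get("tokens");
  if (raw === null) return DEFAULT_TOKENS;

  const parsed = Number.parseInt(raw, 10);
  // Assumption: invalid or non-positive values fall back to the default.
  if (!Number.isFinite(parsed) || parsed < 1) return DEFAULT_TOKENS;

  // Assumption: values above the cap are clamped rather than rejected.
  return Math.min(parsed, MAX_TOKENS);
}
```

For example, `resolveTokenBudget("http://localhost:5173/?mode=mock&device=wasm")` yields the default because no `tokens=` value is present.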

The Hugging Face Space builds the Vite app in Docker and serves `dist/index.html` through a tiny Node static server. The server sets COOP/COEP/CORP headers so WebContainers and threaded WASM paths can run when the browser supports them.
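
A minimal sketch of the header handling follows. It assumes the server sends the same values as the `custom_headers` block in the Space front matter; `withIsolationHeaders` is a hypothetical wrapper and not the actual server code.

```javascript
// Assumed values, mirroring `custom_headers` in the Space front matter.
const ISOLATION_HEADERS = {
  "Cross-Origin-Embedder-Policy": "credentialless",
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Resource-Policy": "cross-origin",
};

// Hypothetical wrapper around a Node-style (req, res) handler: sets the
// isolation headers on every response, then delegates to the handler.
function withIsolationHeaders(handler) {
  return (req, res) => {
    for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
      res.setHeader(name, value);
    }
    return handler(req, res);
  };
}
```

With Node's built-in `http` module, a wrapped handler could be passed directly to `http.createServer(...)`, so static file responses and error responses carry the same headers.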

## Test the CLI Web App

Start the dev server, then run:

```bash
npm run smoke:web
```

The smoke test opens Chromium, confirms `crossOriginIsolated`, verifies the terminal startup, runs `/help`, executes a direct `!!node ...` command, then submits a deterministic Pi task that writes `hello.js`, runs `bash`/Node in WebContainer, and checks for `pi sandbox result: 42`.

To compare the web terminal contract against the installed real CLI:

```bash
npm run parity:cli
```

This checks `pi --version`, the `pi --help` contract, slash commands, and the built-in tool names exposed by the browser terminal.

For the heavier end-to-end check with the real ONNX model in browser WASM mode:

```bash
npm run smoke:local-model
```

This downloads and loads the q4 ONNX artifact in Chrome, runs the same Pi/WebContainer task, and accepts the sandbox result only after the model reaches `Model ready`.

The complex smoke test covers simple code execution, installing and using an npm package, and a multi-file ES module task:

```bash
npm run smoke:complex
```

The sandbox can install and use Node packages through the same Pi `bash` tool, for example `npm install is-number@7.0.0` followed by `node check-package.mjs`.

To probe larger browser generation budgets:

```bash
TOKEN_BUDGETS=80,256,2048,8192 npm run probe:tokens
```
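
For illustration, a comma-separated `TOKEN_BUDGETS` value could be turned into a list of budgets as sketched below. The function name and the `[256]` fallback are assumptions; the actual probe script may parse the variable differently.

```javascript
// Hypothetical parser for a value such as "80,256,2048,8192".
function parseTokenBudgets(value, fallback = [256]) {
  if (!value) return fallback;

  const budgets = value
    .split(",")
    .map((part) => Number.parseInt(part.trim(), 10))
    // Drop entries that are empty, non-numeric, or not positive.
    .filter((n) => Number.isInteger(n) && n > 0);

  return budgets.length > 0 ? budgets : fallback;
}
```

In a Node script this would typically be called as `parseTokenBudgets(process.env.TOKEN_BUDGETS)`.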

Measured local WASM results with Qwen2.5-Coder 0.5B:

- `npm run smoke:web` passed in deterministic mode using the `ghostty-web` terminal.
- `npm run parity:cli` passed against `@earendil-works/pi-coding-agent@0.77.0`.
- `MAX_NEW_TOKENS=80 npm run smoke:local-model` passed with the real browser model.
- `MAX_NEW_TOKENS=256 npm run smoke:complex` passed simple, npm dependency, and multi-file module tasks with the real browser model.
- `TOKEN_BUDGETS=80,160,256,512,1024,2048,4096,8192 npm run probe:tokens` passed. Higher caps were accepted; for the probe task the model stopped naturally before using the full cap.

## Verify the Published Artifact

```bash
npm install
node scripts/verify_tjs_model.mjs Mike0021/MiniCPM5-1B-ONNX-Web
```

The verifier asks Transformers.js for the `text-generation` file plan, checks for `onnx/model_q4.onnx`, then loads the model and generates a short completion.

## Convert and Upload

The published repo was produced locally with a CPU fp16 export followed by q4 ONNX quantization:

```bash
uv run --python 3.12 \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  scripts/convert_minicpm5_tjs.py \
  --source-model openbmb/MiniCPM5-1B \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --output-dir output/MiniCPM5-1B-ONNX-Web \
  --work-dir output/minicpm5-work \
  --device cpu \
  --export-dtype fp16
```

For a clean remote conversion, the same script can be run on Hugging Face Jobs with a configured Hub token:

```bash
hf repos create Mike0021/MiniCPM5-1B-ONNX-Web --repo-type model --exist-ok
hf jobs uv run scripts/convert_minicpm5_tjs.py \
  --flavor l4x1 \
  --timeout 6h \
  --secrets HF_TOKEN \
  --with "numpy<2" \
  --with "transformers==4.57.6" \
  --with "optimum[onnx]" \
  --with "onnxruntime==1.20.1" \
  --with onnxslim \
  --with "huggingface_hub>=0.33" \
  --with accelerate \
  --with sentencepiece \
  --with protobuf \
  --python 3.12 \
  -- \
  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
  --export-dtype fp16
```