	---
	title: Pi CLI Web
	sdk: docker
	app_port: 7860
	fullWidth: true
	custom_headers:
	  cross-origin-embedder-policy: credentialless
	  cross-origin-opener-policy: same-origin
	  cross-origin-resource-policy: cross-origin
	models:
	- onnx-community/Qwen2.5-Coder-0.5B-Instruct
	- onnx-community/Qwen3-0.6B-ONNX
	- Mike0021/MiniCPM5-1B-ONNX-Web
	---

	# Pi CLI Web

	This workspace ships a browser-only port of the `pi` CLI backed by Transformers.js, WebContainers, and a real terminal surface. The UI uses `ghostty-web` first, with `@xterm/xterm` as a fallback, and exposes Pi's built-in tools under their CLI names: `read`, `bash`, `edit`, and `write`, plus the read-only `grep`, `find`, and `ls`.

	The default planner is `onnx-community/Qwen2.5-Coder-0.5B-Instruct` because it produced the strongest browser result in the local task suite; Qwen3 0.6B and the converted MiniCPM5 model remain selectable for comparison.

	Published MiniCPM5 ONNX artifact: https://huggingface.co/Mike0021/MiniCPM5-1B-ONNX-Web

	The required runtime layout is:

	- `config.json`, `generation_config.json`, tokenizer files, and `chat_template.jinja` at the repo root
	- q4 ONNX weights at `onnx/model_q4.onnx`
	- `config.json` includes `transformers.js_config.dtype = "q4"` so the default loader selects the web-sized artifact
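As a rough illustration, the layout above can be checked against a repo's file list with a small helper. This is a sketch, not the project's verifier: `checkRuntimeLayout` is a hypothetical name, and the two tokenizer file names (`tokenizer.json`, `tokenizer_config.json`) are an assumption about what "tokenizer files" covers.

```typescript
// Root-level files this README lists as required. The tokenizer file names are
// an assumption; the README only says "tokenizer files".
const REQUIRED_ROOT_FILES = [
  "config.json",
  "generation_config.json",
  "tokenizer.json",
  "tokenizer_config.json",
  "chat_template.jinja",
];
const Q4_WEIGHTS = "onnx/model_q4.onnx";

interface ModelConfig {
  "transformers.js_config"?: { dtype?: unknown };
}

// Hypothetical helper: returns a list of problems (empty when the layout looks right).
export function checkRuntimeLayout(files: string[], config: ModelConfig): string[] {
  const present = new Set(files);
  const problems: string[] = [];
  for (const name of [...REQUIRED_ROOT_FILES, Q4_WEIGHTS]) {
    if (!present.has(name)) problems.push(`missing ${name}`);
  }
  // Without dtype "q4", the default loader would not pick the web-sized weights.
  const dtype = config["transformers.js_config"]?.dtype;
  if (dtype !== "q4") {
    problems.push(`transformers.js_config.dtype is ${String(dtype)}, expected "q4"`);
  }
  return problems;
}
```

One way to obtain the file list is the Hub model API (`https://huggingface.co/api/models/<repo_id>`), whose `siblings` entries carry an `rfilename` per file; `config.json` itself still has to be fetched and parsed separately.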

	The conversion uses an ONNX export with KV cache (`text-generation-with-past`) and then applies ONNX Runtime 4-bit MatMul quantization. A plain ONNX export without KV-cache inputs and outputs is not enough for normal autoregressive generation in Transformers.js.
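One way to tell the two exports apart is to inspect the graph's input and output names. The sketch below assumes Optimum's naming convention for decoder exports with past (`past_key_values.<layer>.key|value` inputs, `present.<layer>.key|value` outputs); `hasKvCache` is a hypothetical helper, not part of Transformers.js.

```typescript
// Layer indices for which both a key and a value tensor name match `pattern`.
function completeLayers(names: string[], pattern: RegExp): Set<number> {
  const kindsByLayer = new Map<number, Set<string>>();
  for (const name of names) {
    const match = pattern.exec(name);
    if (!match) continue;
    const layer = Number(match[1]);
    const kinds = kindsByLayer.get(layer) ?? new Set<string>();
    kinds.add(match[2]);
    kindsByLayer.set(layer, kinds);
  }
  const layers = new Set<number>();
  for (const [layer, kinds] of kindsByLayer) {
    if (kinds.size === 2) layers.add(layer);
  }
  return layers;
}

// True when the graph takes complete past_key_values inputs and returns
// present outputs for exactly the same layers (Optimum-style names assumed).
export function hasKvCache(inputNames: string[], outputNames: string[]): boolean {
  const past = completeLayers(inputNames, /^past_key_values\.(\d+)\.(key|value)$/);
  const present = completeLayers(outputNames, /^present\.(\d+)\.(key|value)$/);
  if (past.size === 0 || past.size !== present.size) return false;
  return [...past].every((layer) => present.has(layer));
}
```

With `onnxruntime-node`, the names would come from a session created by `InferenceSession.create(path)`, through its `inputNames` and `outputNames` properties.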

	## Run the Web App

	```bash
	npm install
	npm run dev
	```

	Open http://localhost:5173/.

	The app uses:

	- `@earendil-works/pi-agent-core` for the agent loop, transcript state, and tool execution.
	- `@earendil-works/pi-coding-agent` as the installed CLI contract for parity checks against `pi --help` and `pi --version`.
	- `ghostty-web` as the terminal frontend, with `@xterm/xterm` fallback.
	- `@huggingface/transformers` with `onnx-community/Qwen2.5-Coder-0.5B-Instruct` as the default local browser planner.
	- `@webcontainer/api` for the client-only sandbox with a virtual filesystem and browser-contained Node.js processes.

	Vite serves the app with COOP/COEP headers and boots WebContainers with `coep: "credentialless"`. A deterministic mock model is available at `http://localhost:5173/?mode=mock&device=wasm` for fast harness and sandbox smoke tests that skip the ONNX model download. In WASM mode the local model defaults to a tested 256-token generation budget and accepts budgets up to 8192, set through the `tokens=` query parameter or the `/settings tokens=<n>` command.
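A minimal sketch of how the `tokens=` query parameter and the `/settings tokens=<n>` command could share one parser. The function names are hypothetical, and the handling of out-of-range values (fall back to 256 when missing or invalid, clamp at 8192) is an assumption; the README only states the default and the upper limit.

```typescript
const DEFAULT_TOKEN_BUDGET = 256; // tested WASM default
const MAX_TOKEN_BUDGET = 8192; // documented upper limit

// Assumed policy: missing/invalid -> default, too large -> clamp to the limit.
export function resolveTokenBudget(raw: string | null | undefined): number {
  const text = raw?.trim() ?? "";
  if (!/^\d+$/.test(text)) return DEFAULT_TOKEN_BUDGET;
  const n = Number(text);
  if (n < 1) return DEFAULT_TOKEN_BUDGET;
  return Math.min(n, MAX_TOKEN_BUDGET);
}

// Budget from a page URL such as http://localhost:5173/?tokens=2048
export function budgetFromUrl(url: string): number {
  return resolveTokenBudget(new URL(url).searchParams.get("tokens"));
}

// Budget from a terminal line such as "/settings tokens=512"; null for other input.
export function budgetFromSettings(line: string): number | null {
  const match = /^\/settings\s+tokens=(\S*)$/.exec(line.trim());
  return match ? resolveTokenBudget(match[1]) : null;
}
```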

	The Hugging Face Space builds the Vite app in Docker and serves `dist/index.html` through a tiny Node static server. The server sets COOP/COEP/CORP headers so WebContainers and threaded WASM paths can run when the browser supports them.
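A tiny static server of this kind can be sketched with `node:http` alone. This is an illustration, not the Space's actual server: the header values are assumed to mirror the `custom_headers` in the front matter, and the fallback to `index.html` and the content-type table are assumptions.

```typescript
import { createServer, type Server } from "node:http";
import { readFile } from "node:fs/promises";
import * as path from "node:path";

// Assumed to mirror the Space front matter. COOP + COEP make the page
// crossOriginIsolated, which WebContainers and threaded WASM need.
export const ISOLATION_HEADERS: Record<string, string> = {
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Embedder-Policy": "credentialless",
  "Cross-Origin-Resource-Policy": "cross-origin",
};

const CONTENT_TYPES: Record<string, string> = {
  ".html": "text/html; charset=utf-8",
  ".js": "text/javascript; charset=utf-8",
  ".css": "text/css; charset=utf-8",
  ".json": "application/json",
  ".wasm": "application/wasm",
  ".svg": "image/svg+xml",
};

export function contentTypeFor(filePath: string): string {
  return CONTENT_TYPES[path.extname(filePath).toLowerCase()] ?? "application/octet-stream";
}

// Map a request URL to a path under `root`; null for malformed URLs or
// paths that would escape the root (e.g. "/../secret").
export function resolveStaticPath(root: string, url: string): string | null {
  const base = path.resolve(root);
  let pathname: string;
  try {
    pathname = decodeURIComponent(url.split("?")[0]);
  } catch {
    return null;
  }
  const resolved = path.resolve(base, "." + pathname);
  const inside = resolved === base || resolved.startsWith(base + path.sep);
  return inside ? resolved : null;
}

export function startServer(root: string, port = 7860): Server {
  const indexHtml = path.join(path.resolve(root), "index.html");
  const server = createServer(async (req, res) => {
    const target = resolveStaticPath(root, req.url ?? "/");
    if (target === null) {
      res.writeHead(403, ISOLATION_HEADERS).end("Forbidden");
      return;
    }
    // Try the requested file first, then fall back to index.html.
    for (const candidate of [target, indexHtml]) {
      try {
        const body = await readFile(candidate);
        res.writeHead(200, { ...ISOLATION_HEADERS, "Content-Type": contentTypeFor(candidate) });
        res.end(body);
        return;
      } catch {
        // Missing file or a directory: try the next candidate.
      }
    }
    res.writeHead(404, ISOLATION_HEADERS).end("Not found");
  });
  server.listen(port);
  return server;
}
```

After building the Vite app into `dist/`, `startServer("dist", 7860)` would serve it on the Space's `app_port`.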

	## Test the CLI Web App

	Start the dev server, then run:

	```bash
	npm run smoke:web
	```

	The smoke test opens Chromium, confirms `crossOriginIsolated`, verifies that the terminal starts, runs `/help`, and executes a direct `!!node ...` command. It then submits a deterministic Pi task that writes `hello.js`, runs it with Node through the `bash` tool in the WebContainer, and checks for `pi sandbox result: 42`.

	To compare the web terminal contract against the installed real CLI:

	```bash
	npm run parity:cli
	```

	This checks `pi --version`, the `pi --help` contract, slash commands, and the built-in tool names exposed by the browser terminal.
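At its core, the tool-name part of this check is a set comparison. A minimal sketch: `diffNames` is a hypothetical helper, the expected list is the set of tool names this README gives, and how the real script extracts names from `pi --help` is not shown here.

```typescript
// Built-in tool names listed in this README.
export const EXPECTED_TOOLS = ["read", "bash", "edit", "write", "grep", "find", "ls"];

export interface NameDiff {
  missing: string[]; // expected but not exposed by the browser terminal
  extra: string[]; // exposed but not expected
}

// Compare two name lists, ignoring order and duplicates.
export function diffNames(expected: string[], actual: string[]): NameDiff {
  const want = new Set(expected);
  const have = new Set(actual);
  return {
    missing: [...want].filter((name) => !have.has(name)).sort(),
    extra: [...have].filter((name) => !want.has(name)).sort(),
  };
}
```

The same comparison applies to slash commands once both lists are available.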

	For the heavier end-to-end check with the real ONNX model in browser WASM mode:

	```bash
	npm run smoke:local-model
	```

	This downloads and loads the q4 ONNX artifact in Chrome, runs the same pi/WebContainer task, and checks that the model reaches `Model ready` before the sandbox result is accepted.
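The ordering condition described here (`Model ready` before the sandbox result) can be expressed as a check over captured terminal text. A minimal sketch; `appearsInOrder` is a hypothetical helper and assumes the terminal output has already been collected into a single string.

```typescript
// True when every marker occurs in `text`, each one after the previous marker.
export function appearsInOrder(text: string, markers: string[]): boolean {
  let from = 0;
  for (const marker of markers) {
    const at = text.indexOf(marker, from);
    if (at === -1) return false;
    from = at + marker.length;
  }
  return true;
}

// The two markers this README names for the local-model smoke test.
export const LOCAL_MODEL_MARKERS = ["Model ready", "pi sandbox result: 42"];
```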

	The complex smoke test covers simple code execution, installing and using an npm package, and a multi-file ES module task:

	```bash
	npm run smoke:complex
	```

	The sandbox can install and use Node packages through the same Pi `bash` tool, for example `npm install is-number@7.0.0` followed by `node check-package.mjs`.
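The install-then-run flow can be sketched against a narrow interface that a `@webcontainer/api` instance should satisfy structurally (its `spawn` resolves to a process with an `exit` promise and a string `output` stream). `Sandbox`, `runSequence`, and `PACKAGE_STEPS` are hypothetical names; in the app these commands go through Pi's `bash` tool rather than a direct call like this.

```typescript
// Narrow, assumed subset of the WebContainer process API.
export interface SandboxProcess {
  exit: Promise<number>;
  output: ReadableStream<string>;
}
export interface Sandbox {
  spawn(command: string, args: string[]): Promise<SandboxProcess>;
}

export interface RunResult {
  ok: boolean;
  exitCode: number;
  transcript: string[];
}

// Read a text stream to completion.
async function readAll(stream: ReadableStream<string>): Promise<string> {
  const reader = stream.getReader();
  let text = "";
  for (;;) {
    const chunk = await reader.read();
    if (chunk.done) return text;
    text += chunk.value;
  }
}

// Run commands in order and stop at the first non-zero exit code.
export async function runSequence(
  sandbox: Sandbox,
  steps: Array<[string, string[]]>,
): Promise<RunResult> {
  const transcript: string[] = [];
  for (const [command, args] of steps) {
    const proc = await sandbox.spawn(command, args);
    const [output, exitCode] = await Promise.all([readAll(proc.output), proc.exit]);
    transcript.push(`$ ${[command, ...args].join(" ")}\n${output}`);
    if (exitCode !== 0) return { ok: false, exitCode, transcript };
  }
  return { ok: true, exitCode: 0, transcript };
}

// The package example from this README.
export const PACKAGE_STEPS: Array<[string, string[]]> = [
  ["npm", ["install", "is-number@7.0.0"]],
  ["node", ["check-package.mjs"]],
];
```

With a container booted via `WebContainer.boot({ coep: "credentialless" })`, `runSequence(container, PACKAGE_STEPS)` would install the package and run the script, assuming `check-package.mjs` has already been written into the container.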

	To probe larger browser generation budgets:

	```bash
	TOKEN_BUDGETS=80,256,2048,8192 npm run probe:tokens
	```

	Measured local WASM results with Qwen2.5-Coder 0.5B:

	- `npm run smoke:web` passed in deterministic mode using the `ghostty-web` terminal.
	- `npm run parity:cli` passed against `@earendil-works/pi-coding-agent@0.77.0`.
	- `MAX_NEW_TOKENS=80 npm run smoke:local-model` passed with the real browser model.
	- `MAX_NEW_TOKENS=256 npm run smoke:complex` passed simple, npm dependency, and multi-file module tasks with the real browser model.
	- `TOKEN_BUDGETS=80,160,256,512,1024,2048,4096,8192 npm run probe:tokens` passed. Higher caps were accepted; for the probe task the model stopped naturally before using the full cap.
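A sketch of how a `TOKEN_BUDGETS` list might be parsed and how the "stopped naturally" observation could be read from per-budget results. The names and the result shape (a `generated` token count per budget) are hypothetical; the only rule taken from this README is that budgets go up to 8192.

```typescript
const MAX_TOKEN_BUDGET = 8192;

// Parse "80,256,2048,8192" into budgets, rejecting anything outside 1..8192.
export function parseTokenBudgets(raw: string): number[] {
  return raw.split(",").map((part) => {
    const n = Number(part.trim());
    if (!Number.isInteger(n) || n < 1 || n > MAX_TOKEN_BUDGET) {
      throw new Error(`invalid token budget: ${JSON.stringify(part)}`);
    }
    return n;
  });
}

export interface ProbeResult {
  budget: number; // max_new_tokens cap for this run
  generated: number; // tokens actually produced
}

// A run counts as "stopped naturally" if it finished before reaching its cap.
export function stoppedNaturally(result: ProbeResult): boolean {
  return result.generated < result.budget;
}
```

A run that ends exactly at its cap is ambiguous under this rule; distinguishing it would need the stop reason from the generator.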

	## Verify the Published Artifact

	```bash
	npm install
	node scripts/verify_tjs_model.mjs Mike0021/MiniCPM5-1B-ONNX-Web
	```

	The verifier asks Transformers.js for the `text-generation` file plan, checks for `onnx/model_q4.onnx`, then loads the model and generates a short completion.
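The file-plan check can be pictured as mapping a `dtype` to the ONNX file name Transformers.js looks for under `onnx/`. The suffix table below is believed to match Transformers.js v3's default naming, but treat it as an assumption to confirm against the installed version; `onnxFileFor` is a hypothetical helper.

```typescript
// dtype -> file-name suffix; assumed to match Transformers.js v3 defaults.
const DTYPE_SUFFIX: Record<string, string> = {
  fp32: "",
  fp16: "_fp16",
  int8: "_int8",
  uint8: "_uint8",
  q8: "_quantized",
  q4: "_q4",
  q4f16: "_q4f16",
  bnb4: "_bnb4",
};

// Hypothetical helper: expected ONNX path for a decoder-only model ("model" base name).
export function onnxFileFor(dtype: string, baseName = "model"): string {
  const suffix = DTYPE_SUFFIX[dtype];
  if (suffix === undefined) throw new Error(`unknown dtype: ${dtype}`);
  return `onnx/${baseName}${suffix}.onnx`;
}
```

Loading then follows the usual Transformers.js pattern, for example `pipeline("text-generation", "Mike0021/MiniCPM5-1B-ONNX-Web", { dtype: "q4" })` from `@huggingface/transformers`.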

	## Convert and Upload

	The published repo was produced locally with a CPU fp16 export followed by q4 ONNX quantization:

	```bash
	uv run --python 3.12 \
	  --with "numpy<2" \
	  --with "transformers==4.57.6" \
	  --with "optimum[onnx]" \
	  --with "onnxruntime==1.20.1" \
	  --with onnxslim \
	  --with "huggingface_hub>=0.33" \
	  --with accelerate \
	  --with sentencepiece \
	  --with protobuf \
	  scripts/convert_minicpm5_tjs.py \
	  --source-model openbmb/MiniCPM5-1B \
	  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
	  --output-dir output/MiniCPM5-1B-ONNX-Web \
	  --work-dir output/minicpm5-work \
	  --device cpu \
	  --export-dtype fp16
	```

	For a clean remote conversion, the same script can be run on Hugging Face Jobs with a configured Hub token:

	```bash
	hf repos create Mike0021/MiniCPM5-1B-ONNX-Web --repo-type model --exist-ok
	hf jobs uv run scripts/convert_minicpm5_tjs.py \
	  --flavor l4x1 \
	  --timeout 6h \
	  --secrets HF_TOKEN \
	  --with "numpy<2" \
	  --with "transformers==4.57.6" \
	  --with "optimum[onnx]" \
	  --with "onnxruntime==1.20.1" \
	  --with onnxslim \
	  --with "huggingface_hub>=0.33" \
	  --with accelerate \
	  --with sentencepiece \
	  --with protobuf \
	  --python 3.12 \
	  -- \
	  --target-repo Mike0021/MiniCPM5-1B-ONNX-Web \
	  --export-dtype fp16
	```