# Compilation Guide: Gemma 3 270M-it for WebGPU & Mobile
This document details the step-by-step process used to compile `google/gemma-3-270m-it` for WebGPU and mobile platforms using MLC-LLM.
## Prerequisites
* **Operating System**: macOS (Apple Silicon recommended for performance) or Linux.
* **Python**: 3.11 or 3.12 (Managed via `venv`).
* **MLC-LLM**: Nightly build (necessary for Gemma 3 support).
* **Emscripten**: Required for WebGPU (WASM) compilation.
* **Vulkan SDK**: Required for Android compilation testing (optional for build).
## 1. Environment Setup
We used a Python virtual environment and installed the nightly CPU wheels from the MLC wheel index.
```bash
# Create and activate environment
python3 -m venv venv
source venv/bin/activate
# Install MLC-LLM Nightly (Verify latest instructions on mlc.ai)
pip install --pre --force-reinstall mlc-llm-nightly-cpu mlc-ai-nightly-cpu \
-f https://mlc.ai/wheels
```
## 2. Model Download
We used a custom script (`setup_gemma.py`) to download the model from Hugging Face.
* **Source**: `google/gemma-3-270m-it`
* **Authentication**: Requires `HF_TOKEN` environment variable.
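`setup_gemma.py` itself is not included in this repository. A minimal sketch of the download step, assuming the `huggingface_hub` package is installed, might look like:

```python
import os

def download_gemma(dest="./models/gemma-3-270m-it"):
    """Download google/gemma-3-270m-it; the repo is gated, so HF_TOKEN is required."""
    token = os.environ.get("HF_TOKEN")
    if token is None:
        raise RuntimeError("Set HF_TOKEN before downloading (gated model)")
    # Imported lazily so the token check above runs even without the package.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id="google/gemma-3-270m-it",
                             local_dir=dest, token=token)
```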
## 3. Configuration Generation
Standard config generation fails because the Gemma 3 IT checkpoint ships a multimodal configuration. We generated a **text-only** configuration by manually stripping the vision-related fields from the config.
**Command used:**
```bash
python -m mlc_llm gen_config ./models/gemma-3-270m-it \
--quantization q4f16_1 \
--conv-template gemma_instruction \
--output ./dist/gemma-3-270m-it-mlc
```
**Modifications:**
* Ensured `is_text_model: true` in `mlc-chat-config.json`.
* Removed `vision_config` and image processing parameters.
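The manual edit can be scripted. This is a minimal sketch; apart from `vision_config` and `is_text_model`, which the steps above name, the exact keys to drop are assumptions and vary by checkpoint:

```python
import json

# "vision_config" comes from this guide; the other keys are illustrative
# examples of image-processing fields and may differ in your config.
VISION_KEYS = ("vision_config", "image_size", "image_token_index")

def make_text_only(path="./dist/gemma-3-270m-it-mlc/mlc-chat-config.json"):
    with open(path) as f:
        cfg = json.load(f)
    for key in VISION_KEYS:
        cfg.pop(key, None)           # drop vision/image-processing fields
    cfg["is_text_model"] = True      # force the text-only pipeline
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```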
## 4. WebGPU Compilation (The Hard Part)
Compiling for WebGPU requires **Emscripten** and building the TVM runtime from source, as the pip packages do not contain the necessary bitcode libraries (`wasm_runtime.bc`).
### Step 4a: Install Emscripten
We automated this with `install_emscripten.sh`.
```bash
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install latest
./emsdk activate latest
source ./emsdk_env.sh
```
### Step 4b: Build Runtime Libraries
We cloned the `mlc-llm` source and built the required `.bc` files, using a temporary workspace to work around spaces in the project path.
* **Artifacts Built**: `wasm_runtime.bc`, `tvmjs_support.bc`, `webgpu_runtime.bc`, `mlc_wasm_runtime.bc`.
* **Destination**: Installed into `venv/lib/python3.12/site-packages/tvm/` and into the source tree's `dist/wasm/`.
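The copy destination depends on the active interpreter. Rather than hard-coding `python3.12`, it can be resolved from the environment; a sketch, assuming TVM is installed into the active venv:

```python
import sysconfig
from pathlib import Path

# The four bitcode artifacts produced by the Step 4b runtime build.
BC_ARTIFACTS = ["wasm_runtime.bc", "tvmjs_support.bc",
                "webgpu_runtime.bc", "mlc_wasm_runtime.bc"]

def tvm_install_dir() -> Path:
    """site-packages/tvm of the active environment: the copy destination."""
    return Path(sysconfig.get_paths()["purelib"]) / "tvm"
```

Copy each built `.bc` into `tvm_install_dir()` (and into the source tree's `dist/wasm/`) before running the WebGPU compile.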
### Step 4c: Compile Model
Using `compile_webgpu.py`:
```bash
python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
--device webgpu \
--opt O3 \
--output ./dist/libs/gemma-3-270m-it-webgpu.wasm
```
## 5. Mobile Compilation
For iOS and Android, we used the standard `mlc_llm compile` command with respective targets.
* **iOS**: `--device iphone` -> `gemma-3-270m-ios.tar`
* **Android**: `--device android` -> `gemma-3-270m-android.tar`
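The full invocations mirror Step 4c with only the device and output swapped (a sketch; the flag spellings follow the WebGPU command above):

```shell
# iOS library bundle
python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
  --device iphone --opt O3 \
  --output ./dist/libs/gemma-3-270m-ios.tar

# Android library bundle
python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
  --device android --opt O3 \
  --output ./dist/libs/gemma-3-270m-android.tar
```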
## Troubleshooting
### Emscripten & SSL on macOS
**Issue**: `curl` failed during `emsdk install` with SSL errors.
**Fix**: Unset `SSL_CERT_FILE` before running installation.
```bash
unset SSL_CERT_FILE
```
### Missing Runtime Libraries (`Cannot find library: ...bc`)
**Issue**: The default pip install is minimal and lacks WASM support libraries.
**Fix**: Build the `mlc-llm` runtime from source (using `build_webgpu_runtime.sh`) and copy the resulting `.bc` files into the installed Python package, as described in Step 4b.
### Path Spaces
**Issue**: Projects in folders with spaces ("Gemma 3 270m") break `make` and `clang`.
**Fix**: Build scripts were updated to move sources to `/tmp` for the compilation phase.
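The staging step the scripts perform can be sketched as follows (the temp-directory naming is illustrative):

```shell
# Stage the sources in a space-free temp dir so make/clang see clean paths.
SRC="$PWD"                                  # may contain spaces ("Gemma 3 270m")
WORK="$(mktemp -d /tmp/mlc_build.XXXXXX)"   # mktemp yields a space-free path
cp -R "$SRC/." "$WORK/"
cd "$WORK"
# ... run make / emcc here, then copy the artifacts back into "$SRC" ...
```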
## File Structure
The final release package structure:
* `gemma-3-270m-it-mlc/`: The quantized weights and MLC chat configuration.
* `libs/`: Platform-specific compiled binaries and WASM.
* `README.md`: Documentation.