# Compilation Guide: Gemma 3 270M-it for WebGPU & Mobile
This document details the step-by-step process used to compile `google/gemma-3-270m-it` for WebGPU and mobile platforms using MLC-LLM.
## Prerequisites
* **Operating System**: macOS (Apple Silicon recommended for performance) or Linux.
* **Python**: 3.11 or 3.12 (Managed via `venv`).
* **MLC-LLM**: Nightly build (necessary for Gemma 3 support).
* **Emscripten**: Required for WebGPU (WASM) compilation.
* **Vulkan SDK**: Required for Android compilation testing (optional for build).
## 1. Environment Setup
We used a Python virtual environment and installed the nightly CPU wheels from the MLC wheel index.
```bash
# Create and activate environment
python3 -m venv venv
source venv/bin/activate
# Install MLC-LLM Nightly (Verify latest instructions on mlc.ai)
pip install --pre --force-reinstall mlc-llm-nightly-cpu mlc-ai-nightly-cpu \
-f https://mlc.ai/wheels
```
## 2. Model Download
We used a custom script (`setup_gemma.py`) to download the model from Hugging Face.
* **Source**: `google/gemma-3-270m-it`
* **Authentication**: Requires `HF_TOKEN` environment variable.
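`setup_gemma.py` itself is not included in this repository. A minimal sketch of the download step, assuming the `huggingface_hub` package is installed, might look like:

```python
import os

def download_gemma(dest="./models/gemma-3-270m-it"):
    """Download google/gemma-3-270m-it; the repo is gated, so HF_TOKEN is required."""
    token = os.environ.get("HF_TOKEN")
    if token is None:
        raise RuntimeError("Set HF_TOKEN before downloading (gated model)")
    # Imported lazily so the token check above runs even without the package.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id="google/gemma-3-270m-it",
                             local_dir=dest, token=token)
```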
## 3. Configuration Generation
Standard config generation fails because the Gemma 3 IT checkpoint ships a multimodal configuration. We generated a **text-only** configuration by manually stripping the vision-related fields from the config.
**Command used:**
```bash
python -m mlc_llm gen_config ./models/gemma-3-270m-it \
--quantization q4f16_1 \
--conv-template gemma_instruction \
--output ./dist/gemma-3-270m-it-mlc
```
**Modifications:**
* Ensured `is_text_model: true` in `mlc-chat-config.json`.
* Removed `vision_config` and image processing parameters.
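The manual edit can be scripted. This is a minimal sketch; apart from `vision_config` and `is_text_model`, which the steps above name, the exact keys to drop are assumptions and vary by checkpoint:

```python
import json

# "vision_config" comes from this guide; the other keys are illustrative
# examples of image-processing fields and may differ in your config.
VISION_KEYS = ("vision_config", "image_size", "image_token_index")

def make_text_only(path="./dist/gemma-3-270m-it-mlc/mlc-chat-config.json"):
    with open(path) as f:
        cfg = json.load(f)
    for key in VISION_KEYS:
        cfg.pop(key, None)           # drop vision/image-processing fields
    cfg["is_text_model"] = True      # force the text-only pipeline
    with open(path, "w") as f:
        json.dump(cfg, f, indent=2)
    return cfg
```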
## 4. WebGPU Compilation (The Hard Part)
Compiling for WebGPU requires **Emscripten** and building the TVM runtime from source, as the pip packages do not contain the necessary bitcode libraries (`wasm_runtime.bc`).
### Step 4a: Install Emscripten
We automated this with `install_emscripten.sh`.
```bash
git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install latest
./emsdk activate latest
source ./emsdk_env.sh
```
### Step 4b: Build Runtime Libraries
We cloned the `mlc-llm` source and built the required `.bc` files, using a temporary workspace to work around spaces in the project path.
* **Artifacts Built**: `wasm_runtime.bc`, `tvmjs_support.bc`, `webgpu_runtime.bc`, `mlc_wasm_runtime.bc`.
* **Destination**: Installed into `venv/lib/python3.12/site-packages/tvm/` and into the source tree's `dist/wasm/`.
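The copy destination depends on the active interpreter. Rather than hard-coding `python3.12`, it can be resolved from the environment; a sketch, assuming TVM is installed into the active venv:

```python
import sysconfig
from pathlib import Path

# The four bitcode artifacts produced by the Step 4b runtime build.
BC_ARTIFACTS = ["wasm_runtime.bc", "tvmjs_support.bc",
                "webgpu_runtime.bc", "mlc_wasm_runtime.bc"]

def tvm_install_dir() -> Path:
    """site-packages/tvm of the active environment: the copy destination."""
    return Path(sysconfig.get_paths()["purelib"]) / "tvm"
```

Copy each built `.bc` into `tvm_install_dir()` (and into the source tree's `dist/wasm/`) before running the WebGPU compile.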
### Step 4c: Compile Model
Using `compile_webgpu.py`:
```bash
python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
--device webgpu \
--opt O3 \
--output ./dist/libs/gemma-3-270m-it-webgpu.wasm
```
## 5. Mobile Compilation
For iOS and Android, we used the standard `mlc_llm compile` command with respective targets.
* **iOS**: `--device iphone` -> `gemma-3-270m-ios.tar`
* **Android**: `--device android` -> `gemma-3-270m-android.tar`
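The full invocations mirror Step 4c with only the device and output swapped (a sketch; the flag spellings follow the WebGPU command above):

```shell
# iOS library bundle
python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
  --device iphone --opt O3 \
  --output ./dist/libs/gemma-3-270m-ios.tar

# Android library bundle
python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
  --device android --opt O3 \
  --output ./dist/libs/gemma-3-270m-android.tar
```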
## Troubleshooting
### Emscripten & SSL on macOS
**Issue**: `curl` failed during `emsdk install` with SSL errors.
**Fix**: Unset `SSL_CERT_FILE` before running installation.
```bash
unset SSL_CERT_FILE
```
### Missing Runtime Libraries (`Cannot find library: ...bc`)
**Issue**: The default pip install is minimal and lacks WASM support libraries.
**Fix**: Build the `mlc-llm` runtime from source (using `build_webgpu_runtime.sh`) and copy the resulting `.bc` files into the installed Python package, as described in Step 4b.
### Path Spaces
**Issue**: Projects in folders with spaces ("Gemma 3 270m") break `make` and `clang`.
**Fix**: Build scripts were updated to move sources to `/tmp` for the compilation phase.
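The staging step the scripts perform can be sketched as follows (the temp-directory naming is illustrative):

```shell
# Stage the sources in a space-free temp dir so make/clang see clean paths.
SRC="$PWD"                                  # may contain spaces ("Gemma 3 270m")
WORK="$(mktemp -d /tmp/mlc_build.XXXXXX)"   # mktemp yields a space-free path
cp -R "$SRC/." "$WORK/"
cd "$WORK"
# ... run make / emcc here, then copy the artifacts back into "$SRC" ...
```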
## File Structure
The final release package structure:
* `gemma-3-270m-it-mlc/`: The quantized weights and MLC chat configuration.
* `libs/`: Platform-specific compiled binaries and WASM.
* `README.md`: Documentation.