Compilation Guide: Gemma 3 270M-it for WebGPU & Mobile

This document details the step-by-step process used to compile google/gemma-3-270m-it for WebGPU and mobile platforms using MLC-LLM.

Prerequisites

Operating System: macOS (Apple Silicon recommended for performance) or Linux.
Python: 3.11 or 3.12 (managed via venv).
MLC-LLM: Nightly build (necessary for Gemma 3 support).
Emscripten: Required for WebGPU (WASM) compilation.
Vulkan SDK: Required for testing Android builds (optional for the build itself).

1. Environment Setup

We used a Python virtual environment and installed the nightly CPU wheels on macOS.

# Create and activate environment
python3 -m venv venv
source venv/bin/activate

# Install MLC-LLM Nightly (Verify latest instructions on mlc.ai)
pip install --pre --force-reinstall mlc-llm-nightly-cpu mlc-ai-nightly-cpu \
    -f https://mlc.ai/wheels

2. Model Download

We used a custom script (setup_gemma.py) to download the model from Hugging Face.

Source: google/gemma-3-270m-it
Authentication: Requires HF_TOKEN environment variable.
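setup_gemma.py itself is not reproduced in this guide. A minimal sketch of what such a download script could look like follows, assuming the huggingface_hub package is installed. The function names resolve_hf_token and download_model are hypothetical, and the local directory is taken from the gen_config command in step 3.

```python
import os


def resolve_hf_token(env=None):
    """Return the Hugging Face token from HF_TOKEN, or raise if it is unset."""
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; the model download needs it")
    return token


def download_model(repo_id="google/gemma-3-270m-it",
                   local_dir="./models/gemma-3-270m-it"):
    """Download the model snapshot into local_dir and return the local path."""
    # Imported lazily so the token check works without the dependency installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=repo_id,
        local_dir=local_dir,
        token=resolve_hf_token(),
    )
```

Calling download_model() with HF_TOKEN exported would place the files where the gen_config command expects them. Failing early on a missing token avoids a less obvious authentication error partway through the download.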

3. Configuration Generation

Standard config generation failed for us because Gemma 3 IT is handled as a multimodal model. We worked around this by manually stripping the vision-related fields from the config to produce a text-only configuration.

Command used:

python -m mlc_llm gen_config ./models/gemma-3-270m-it \
    --quantization q4f16_1 \
    --conv-template gemma_instruction \
    --output ./dist/gemma-3-270m-it-mlc

Modifications:

Ensured is_text_model: true in mlc-chat-config.json.
Removed vision_config and image processing parameters.
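We made these edits by hand, but they can be scripted. The sketch below is one possible way to do it, not the exact procedure we followed. The helper name make_text_only is hypothetical, and because this guide does not list the image processing parameters by name, the caller supplies them through extra_keys. It also assumes the keys sit at the level of the JSON object passed in, which may not match the actual layout of mlc-chat-config.json.

```python
import json


def make_text_only(config, extra_keys=()):
    """Return a copy of config without vision fields, flagged as text-only."""
    cleaned = dict(config)
    # vision_config is named in this guide; extra_keys come from the caller.
    for key in ("vision_config", *extra_keys):
        cleaned.pop(key, None)
    cleaned["is_text_model"] = True
    return cleaned


def rewrite_config(path, extra_keys=()):
    """Apply make_text_only to the top level of a JSON file, in place."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(make_text_only(config, extra_keys), handle, indent=2)
        handle.write("\n")
```

The original dictionary is left untouched, so the cleaned result can be compared against it before anything is written back to disk.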

4. WebGPU Compilation (The Hard Part)

Compiling for WebGPU requires Emscripten and a TVM runtime built from source, because the pip packages do not contain the necessary bitcode libraries (for example, wasm_runtime.bc).

Step 4a: Install Emscripten

We automated this with install_emscripten.sh.

git clone https://github.com/emscripten-core/emsdk.git
cd emsdk
./emsdk install latest
./emsdk activate latest
source ./emsdk_env.sh

Step 4b: Build Runtime Libraries

We cloned the mlc-llm source and built the required .bc files in a temporary workspace to avoid problems with spaces in paths.

Artifacts Built: wasm_runtime.bc, tvmjs_support.bc, webgpu_runtime.bc, mlc_wasm_runtime.bc.
Destination: Installed into venv/lib/python3.12/site-packages/tvm/ and into dist/wasm/ in the source tree.
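Before compiling, it can help to confirm that all four artifacts landed in the destination directory. The check below is a small sketch of that idea. The helper name missing_bitcode is hypothetical, and the file names are the ones listed above.

```python
from pathlib import Path

# The four bitcode artifacts listed in this guide.
REQUIRED_BITCODE = (
    "wasm_runtime.bc",
    "tvmjs_support.bc",
    "webgpu_runtime.bc",
    "mlc_wasm_runtime.bc",
)


def missing_bitcode(directory, required=REQUIRED_BITCODE):
    """Return the required .bc file names that are absent from directory."""
    directory = Path(directory)
    return [name for name in required if not (directory / name).is_file()]
```

For example, missing_bitcode("venv/lib/python3.12/site-packages/tvm") returns an empty list once every file is in place, and otherwise names the files that still need to be built or copied.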

Step 4c: Compile Model

Using compile_webgpu.py:

python -m mlc_llm compile ./dist/gemma-3-270m-it-mlc/mlc-chat-config.json \
    --device webgpu \
    --opt O3 \
    --output ./dist/libs/gemma-3-270m-it-webgpu.wasm

5. Mobile Compilation

For iOS and Android, we used the standard mlc_llm compile command with respective targets.

iOS: --device iphone -> gemma-3-270m-ios.tar
Android: --device android -> gemma-3-270m-android.tar
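The two mobile builds differ only in the device name and the output file, so the commands can be generated from a small table. This is a sketch under assumptions: the helper compile_command is hypothetical, the config path is reused from step 4c, and the ./dist/libs/ output directory is our guess at a sensible location rather than something stated above.

```python
def compile_command(device, output,
                    config="./dist/gemma-3-270m-it-mlc/mlc-chat-config.json"):
    """Build the argv list for one `mlc_llm compile` invocation."""
    return ["python", "-m", "mlc_llm", "compile", config,
            "--device", device, "--output", output]


# Device name -> archive name, as listed in this section.
MOBILE_TARGETS = {
    "iphone": "gemma-3-270m-ios.tar",
    "android": "gemma-3-270m-android.tar",
}

# Assumption: archives go next to the WebGPU library in ./dist/libs/.
commands = [compile_command(device, f"./dist/libs/{name}")
            for device, name in MOBILE_TARGETS.items()]
```

Each entry in commands can be passed to subprocess.run(cmd, check=True) so that a failed build stops the script instead of silently continuing to the next target.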

Troubleshooting

Emscripten & SSL on macOS

Issue: curl failed with SSL errors during emsdk install.
Fix: Unset SSL_CERT_FILE before running the installation.

unset SSL_CERT_FILE
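When the installation is driven from a Python script rather than an interactive shell, the same fix can be applied to the child process only. The following is a sketch with hypothetical helper names (env_without, run_emsdk_install). It assumes emsdk was cloned into ./emsdk as in step 4a.

```python
import os
import subprocess


def env_without(name, env=None):
    """Return a copy of the environment with one variable removed."""
    cleaned = dict(os.environ if env is None else env)
    cleaned.pop(name, None)
    return cleaned


def run_emsdk_install(emsdk_dir="emsdk"):
    """Run `./emsdk install latest` with SSL_CERT_FILE removed from its env."""
    subprocess.run(
        ["./emsdk", "install", "latest"],
        cwd=emsdk_dir,
        env=env_without("SSL_CERT_FILE"),
        check=True,
    )
```

This leaves the variable set for the rest of the session, which matters if other tools in the same shell rely on it.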

Missing Runtime Libraries (`Cannot find library: ...bc`)

Issue: The default pip install is minimal and lacks the WASM support libraries.
Fix: Build the mlc-llm runtime from source (using build_webgpu_runtime.sh) and install the resulting libraries into the Python package. This assumes the installed package layout matches what the build expects.

Path Spaces

Issue: Projects in folders with spaces ("Gemma 3 270m") break make and clang.
Fix: Build scripts were updated to move sources to /tmp for the compilation phase.
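Our build scripts are shell scripts and are not shown here. The staging idea can be sketched in Python as follows. The helper name stage_without_spaces and the directory prefix are hypothetical, and the sketch copies the sources rather than moving them so the original tree stays intact.

```python
import shutil
import tempfile
from pathlib import Path


def stage_without_spaces(source_dir, base_dir=None, prefix="mlc_build_"):
    """Copy source_dir into a fresh temp directory whose path has no spaces."""
    workspace = Path(tempfile.mkdtemp(prefix=prefix, dir=base_dir))
    if " " in str(workspace):
        raise RuntimeError(f"temp directory still contains a space: {workspace}")
    # Replace spaces in the leaf folder name too, e.g. "Gemma 3 270m".
    leaf = Path(source_dir).resolve().name.replace(" ", "_")
    staged = workspace / leaf
    shutil.copytree(source_dir, staged)
    return staged
```

Passing base_dir="/tmp" matches the location described above. The build then runs inside the returned directory, and the resulting artifacts are copied back afterwards.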

File Structure

The final release package structure:

gemma-3-270m-it-mlc/: The generic configuration and weights.
libs/: Platform-specific compiled binaries/WASM.
README.md: Documentation.