---
license: apache-2.0
tags:
  - coreml
  - gemma4
  - multimodal
  - vision
  - on-device
  - ane
base_model: google/gemma-4-E2B-it
pipeline_tag: image-text-to-text
---

# Gemma 4 E2B — CoreML (ANE+GPU Optimized)

Converted from `google/gemma-4-E2B-it` for on-device inference on Apple devices via CoreML.

## Models

| File | Size | Description |
|------|------|-------------|
| `model.mlpackage` | 2.4 GB | Text decoder with stateful KV cache (int4 quantized) |
| `vision.mlpackage` | 322 MB | Vision encoder (SigLIP-based, 16 transformer layers) |
| `model_config.json` | | Model configuration |
| `hf_model/tokenizer.json` | 31 MB | Tokenizer |

## Features

- **Multimodal**: image + text input → text output
- **ANE-optimized**: Conv2d linear layers, ANE-friendly RMSNorm, in-model argmax
- **Stateful KV cache**: MLState API (iOS 18+)
- **Int4 quantized**: block-wise palettization (group_size=32)
- **HF-exact match**: output matches the Hugging Face reference model on the test prompt ("solid red square centered on white background" ✅)
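The int4 scheme above groups weights into blocks of 32 and represents each block with a 16-entry lookup palette (4-bit indices). A minimal NumPy sketch of the idea, using uniform per-group palettes for simplicity (coremltools derives palettes differently, e.g. via k-means):

```python
import numpy as np

def palettize_int4(weights, group_size=32):
    """Toy block-wise 4-bit palettization: 16 codes per group of 32 weights."""
    flat = weights.reshape(-1, group_size)
    # Per-group palette: 16 uniformly spaced values between the group min/max.
    lo = flat.min(axis=1, keepdims=True)
    hi = flat.max(axis=1, keepdims=True)
    palette = lo + (hi - lo) * np.linspace(0.0, 1.0, 16)   # (n_groups, 16)
    # Nearest-palette-entry index for each weight -> the stored 4-bit code.
    idx = np.abs(flat[:, :, None] - palette[:, None, :]).argmin(axis=2)
    # Dequantize by looking the codes back up in the palette.
    dequant = np.take_along_axis(palette, idx, axis=1)
    return idx.astype(np.uint8), palette, dequant.reshape(weights.shape)

w = np.random.randn(64, 32).astype(np.float32)
codes, palette, approx = palettize_int4(w)
```

Only the 4-bit codes plus one small palette per 32-weight group are stored, which is what brings the 2.4 GB decoder size down from the fp16 original.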

## Usage

```python
import coremltools as ct
import numpy as np

# Load the two CoreML packages
vision = ct.models.MLModel('vision.mlpackage')
decoder = ct.models.MLModel('model.mlpackage')

# Create the stateful KV cache (MLState, iOS 18+ / macOS 15+)
state = decoder.make_state()

# Process image → vision features → text generation
```
See [CoreML-LLM](https://github.com/john-rocky/CoreML-LLM) for the full conversion pipeline and an iOS sample app.

## Conversion

```bash
git clone https://github.com/john-rocky/CoreML-LLM
cd CoreML-LLM/conversion
pip install -r requirements.txt
python convert.py --model gemma4-e2b --context-length 512 --output ./output/gemma4-e2b
```