# Gemma 4 E2B-it — 4-bit (MLX)
Properly converted with all vision and audio tower weights verified intact
Why this exists: Some mlx-community conversions of Gemma 4 have broken or zeroed-out vision/audio tower weights, producing models that appear functional for text but silently fail on image and audio inputs. This is a clean conversion from the original google/gemma-4-E2B-it, with every multimodal weight tensor verified non-zero.
## Model Details
| Property | Value |
|---|---|
| Base Model | google/gemma-4-E2B-it |
| Parameters | 2.3B effective (5.1B total with Per-Layer Embeddings) |
| Quantization | 4-bit affine, mixed-precision (MLP layers kept at 8-bit) |
| Avg Bits/Weight | 6.851 |
| Model Size | 4.1 GB |
| Architecture | Gemma 4 (text + vision + audio) |
| Context Length | 128K tokens |
| Vocabulary | 262K tokens |
## Multimodal Weight Verification

Every tensor in every multimodal component was loaded and checked for `max(abs(tensor)) > 0`. No broken weights were found.
| Component | Tensor Count | Status |
|---|---|---|
| Vision Tower (SigLIP) | 658 | All non-zero |
| Audio Tower (Conformer) | 751 | All non-zero |
| Language Model | 1,240 | All non-zero |
| Total | 2,649 | All verified |
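The verification pass above can be sketched as follows. This is an illustrative reconstruction, not the actual script used for this release: the tensor names are hypothetical, and in the real check the arrays would be loaded from the converted `.safetensors` shards rather than built in memory.

```python
import numpy as np

def find_zeroed_tensors(tensors):
    """Return the names of tensors whose max(abs(t)) == 0, i.e. broken weights."""
    return [name for name, t in tensors.items() if np.max(np.abs(t)) == 0]

# Toy stand-ins for real weight shards: one healthy tensor, one zeroed-out one.
weights = {
    "vision_tower.blocks.0.attn.q_proj.weight": np.full((4, 4), 0.5),
    "audio_tower.conformer.0.ffn.weight": np.zeros((4, 4)),
}
print(find_zeroed_tensors(weights))  # ['audio_tower.conformer.0.ffn.weight']
```

A conversion passes only when this list is empty for every component.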
## Mixed-Precision Quantization
mlx-vlm's default quantization predicate automatically keeps MLP gate/up/down projections at 8-bit across all 35 language model layers while quantizing attention and other weights to 4-bit. This improves quality over naive uniform 4-bit quantization.
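The selection logic can be pictured as a predicate that maps each parameter path to a bit-width. This is a hedged sketch of the idea only; mlx-vlm's actual predicate signature and matching rules may differ from this toy version.

```python
def bits_for(path: str, default_bits: int = 4, mlp_bits: int = 8) -> int:
    """Pick a quantization bit-width from a parameter's path: MLP gate/up/down
    projections stay at 8-bit, everything else (attention, etc.) gets 4-bit."""
    mlp_names = ("gate_proj", "up_proj", "down_proj")
    if any(name in path for name in mlp_names):
        return mlp_bits
    return default_bits

print(bits_for("language_model.layers.12.mlp.gate_proj.weight"))   # 8
print(bits_for("language_model.layers.12.self_attn.q_proj.weight"))  # 4
```

Applied across all 35 language-model layers, this mix is what yields the ~6.85 average bits per weight reported above rather than a flat 4.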
## Usage

With Osaurus:

```shell
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/gemma-4-E2B-it-4bit
```

With the mlx-vlm Python API:

```python
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E2B-it-4bit")

# Text-only
output = generate(model, processor, "Explain quantum computing", max_tokens=500)

# With image
output = generate(model, processor, "Describe this image", ["path/to/image.jpg"], max_tokens=500)
```
## Conversion Details
| Detail | Value |
|---|---|
| Tool | mlx-vlm v0.4.4 |
| Source dtype | bfloat16 |
| Quantization mode | affine |
| Group size | 64 |
| Source | google/gemma-4-E2B-it (original Google release) |
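Putting the table together, the conversion was presumably produced with a command along these lines. This is a hypothetical reconstruction, not the exact command used: the flag names follow mlx-vlm's converter CLI and may differ between versions, so check `python -m mlx_vlm.convert --help` for your installed release.

```shell
# Hypothetical reconstruction of the conversion invocation (flags assumed,
# not confirmed by the release notes): 4-bit affine quantization, group size 64.
python -m mlx_vlm.convert \
  --hf-path google/gemma-4-E2B-it \
  -q --q-bits 4 --q-group-size 64
```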
Converted by Osaurus AI