# Gemma 4 E4B-it — 4-bit (MLX)

Properly converted with all vision and audio tower weights verified intact.

By [Osaurus AI](https://osaurus.ai)

**Why this exists:** The mlx-community 8-bit conversion of Gemma 4 E4B has broken, zeroed-out vision tower weights, producing a model that appears functional for text but silently fails on image and audio inputs. This is a clean conversion from the original google/gemma-4-E4B-it with every multimodal weight tensor verified non-zero.


## Model Details

| Property | Value |
|---|---|
| Base Model | google/gemma-4-E4B-it |
| Parameters | 4.5B effective (8B total with Per-Layer Embeddings) |
| Quantization | 4-bit affine, mixed-precision (MLP layers kept at 8-bit) |
| Avg Bits/Weight | 6.900 |
| Model Size | 6.4 GB |
| Architecture | Gemma 4 (text + vision + audio) |
| Context Length | 128K tokens |
| Vocabulary | 262K tokens |
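The figures above are mutually consistent, which is a quick sanity check on any quantized conversion: total weights times average bits per weight should roughly reproduce the on-disk size.

```python
# Sanity check: 8B total weights at an average of 6.900 bits/weight
# should land near the 6.4 GB size reported in the table above.
total_params = 8e9   # 8B total (with Per-Layer Embeddings)
avg_bits = 6.900     # average bits per weight after quantization

size_bytes = total_params * avg_bits / 8
size_gib = size_bytes / 2**30

print(f"{size_gib:.1f} GiB")  # → 6.4 GiB
```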

## Multimodal Weight Verification

Every tensor in every multimodal component was loaded and checked for `max(abs(tensor)) > 0`. Zero broken weights found.

| Component | Tensor Count | Status |
|---|---|---|
| Vision Tower (SigLIP) | 658 | All non-zero |
| Audio Tower (Conformer) | 751 | All non-zero |
| Language Model | 1,485 | All non-zero |
| **Total** | **2,894** | **All verified** |
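The check described above can be sketched as follows. The loading step is omitted (the actual verification would iterate over the model's safetensors shards); the tensor names here are hypothetical and the `weights` dict is a toy stand-in.

```python
import numpy as np

def find_dead_tensors(weights):
    """Return names of tensors that are entirely zero.

    `weights` maps tensor names to arrays, e.g. as loaded from a
    model's safetensors shards (loading code omitted here).
    """
    return [name for name, w in weights.items()
            if np.max(np.abs(w)) == 0]

# Toy example: one healthy tensor, one zeroed-out tensor.
weights = {
    "vision_tower.patch_embed.weight": np.random.randn(4, 4),
    "vision_tower.broken.weight": np.zeros((4, 4)),
}
print(find_dead_tensors(weights))  # → ['vision_tower.broken.weight']
```

A conversion passes only when this list is empty for every shard.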

## Mixed-Precision Quantization

mlx-vlm's default quantization predicate automatically keeps MLP gate/up/down projections at 8-bit across all 42 language model layers while quantizing attention and other weights to 4-bit. This improves quality over naive uniform 4-bit quantization.
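A minimal sketch of what such a predicate looks like, assuming a simple suffix match on weight paths. This is illustrative only: the real mlx-vlm predicate and its call signature may differ.

```python
# Illustrative mixed-precision rule: MLP gate/up/down projections stay
# at 8-bit; every other quantizable weight goes to 4-bit.
MLP_PROJECTIONS = ("gate_proj", "up_proj", "down_proj")

def bits_for(path: str) -> int:
    """Return the bit width to use for the weight at `path` (hypothetical helper)."""
    if any(path.endswith(p) for p in MLP_PROJECTIONS):
        return 8
    return 4

print(bits_for("language_model.layers.0.mlp.gate_proj"))    # → 8
print(bits_for("language_model.layers.0.self_attn.q_proj")) # → 4
```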

## Usage

### Osaurus

```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/gemma-4-E4B-it-4bit
```

### Python API

```python
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E4B-it-4bit")

# Text-only
output = generate(model, processor, "Explain quantum computing", max_tokens=500)

# With image
output = generate(model, processor, "Describe this image", ["path/to/image.jpg"], max_tokens=500)
```

## Conversion Details

| Detail | Value |
|---|---|
| Tool | mlx-vlm v0.4.4 |
| Source dtype | bfloat16 |
| Quantization mode | affine |
| Group size | 64 |
| Source | google/gemma-4-E4B-it (original Google release) |
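Group-wise affine quantization carries per-group overhead, which is why the average bits/weight above is not a round number. Assuming one 16-bit scale and one 16-bit bias per group (as in MLX's affine scheme), the effective cost per weight works out as follows:

```python
# Effective storage cost per weight for group-wise affine quantization,
# assuming one 16-bit scale and one 16-bit bias per group of weights.
def effective_bits(bits: int, group_size: int = 64) -> float:
    return bits + (16 + 16) / group_size

print(effective_bits(4))  # 4.5 bits/weight for the 4-bit layers
print(effective_bits(8))  # 8.5 bits/weight for the 8-bit MLP layers
```

The reported average of 6.900 bits/weight falls between these two values, consistent with the mix of 4-bit and 8-bit layers described above.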

Converted by Osaurus AI
