---
language:
- en
library_name: mlx
license: gemma
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model: google/gemma-4-E4B-it
tags:
- quantized
- apple-silicon
- mlx
- gemma4
- vision
- audio
- multimodal
- 4bit
---

# Gemma 4 E4B-it — 4-bit (MLX)

**Osaurus AI** · [osaurus.ai](https://osaurus.ai)

Properly converted, with all vision and audio tower weights verified intact.

---

> **Why this exists:** The mlx-community 8-bit conversion of Gemma 4 E4B has broken/zeroed-out vision tower weights, producing a model that appears functional for text but silently fails on image and audio inputs. This is a clean conversion from the original `google/gemma-4-E4B-it` with every multimodal weight tensor verified non-zero.

---

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) |
| **Parameters** | 4.5B effective (8B total with Per-Layer Embeddings) |
| **Quantization** | 4-bit affine, mixed-precision (MLP layers kept at 8-bit) |
| **Avg Bits/Weight** | 6.900 |
| **Model Size** | 6.4 GB |
| **Architecture** | Gemma 4 (text + vision + audio) |
| **Context Length** | 128K tokens |
| **Vocabulary** | 262K tokens |

## Multimodal Weight Verification

Every tensor in every multimodal component was loaded and checked for `max(abs(tensor)) > 0`. **Zero broken weights found.**

| Component | Tensor Count | Status |
|-----------|--------------|--------|
| **Vision Tower** (SigLIP) | 658 | All non-zero |
| **Audio Tower** (Conformer) | 751 | All non-zero |
| **Language Model** | 1,485 | All non-zero |
| **Total** | **2,894** | **All verified** |

## Mixed-Precision Quantization

mlx-vlm's default quantization predicate automatically keeps MLP gate/up/down projections at 8-bit across all 42 language-model layers while quantizing attention and other weights to 4-bit. This improves quality over naive uniform 4-bit quantization.
## Usage

```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/gemma-4-E4B-it-4bit
```

```python
# Python API
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E4B-it-4bit")

# Text-only
output = generate(model, processor, "Explain quantum computing", max_tokens=500)

# With image
output = generate(model, processor, "Describe this image", ["path/to/image.jpg"], max_tokens=500)
```

## Conversion Details

| Detail | Value |
|--------|-------|
| **Tool** | `mlx-vlm` v0.4.4 |
| **Source dtype** | bfloat16 |
| **Quantization mode** | affine |
| **Group size** | 64 |
| **Source** | `google/gemma-4-E4B-it` (original Google release) |

---
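For intuition on the "affine, group size 64" settings in the table above: each run of 64 consecutive weights shares one scale and one offset, and each weight is stored as a 4-bit integer relative to them. A minimal NumPy sketch of that arithmetic (illustrative only; MLX's actual kernels pack the integers and store a per-group scale and bias, but the mapping is the same shape):

```python
import numpy as np

def affine_quantize(w, bits=4, group_size=64):
    """Quantize a flat float array in groups of `group_size` values.
    Each group maps its [min, max] range onto the 2**bits integer levels."""
    groups = w.reshape(-1, group_size)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    # Guard against constant groups, where hi == lo and scale would be 0.
    scale = np.where(hi > lo, (hi - lo) / (2**bits - 1), 1.0)
    q = np.round((groups - lo) / scale).astype(np.uint8)
    return q, scale, lo

def affine_dequantize(q, scale, lo):
    """Invert the mapping: w ≈ q * scale + lo."""
    return (q * scale + lo).reshape(-1)
```

The worst-case reconstruction error per weight is half a quantization step (`scale / 2` for its group), which is why smaller group sizes trade a little extra storage for tighter error bounds.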

Converted by Osaurus AI