---
language:
- en
library_name: mlx
license: gemma
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model: google/gemma-4-E4B-it
tags:
- quantized
- apple-silicon
- mlx
- gemma4
- vision
- audio
- multimodal
- 4bit
---

<p align="center">
<a href="https://osaurus.ai"><img src="https://cdn-avatars.huggingface.co/v1/production/uploads/69d00705ce8872981c6c4fce/GWKjOwezSOhW5iuKpDwq_.png" alt="Osaurus AI" width="120"></a>
</p>

<h3 align="center">Gemma 4 E4B-it — 4-bit (MLX)</h3>
<p align="center">Properly converted, with all vision and audio tower weights verified intact</p>

<p align="center">
<a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>
<a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>
</p>

---

> **Why this exists:** The mlx-community 8-bit conversion of Gemma 4 E4B has broken, zeroed-out vision tower weights, producing a model that appears functional for text but silently fails on image and audio inputs. This is a clean conversion from the original `google/gemma-4-E4B-it` with every multimodal weight tensor verified non-zero.

---

## Model Details

| Property | Value |
|----------|-------|
| **Base Model** | [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) |
| **Parameters** | 4.5B effective (8B total with Per-Layer Embeddings) |
| **Quantization** | 4-bit affine, mixed-precision (MLP layers kept at 8-bit) |
| **Avg Bits/Weight** | 6.900 |
| **Model Size** | 6.4 GB |
| **Architecture** | Gemma 4 (text + vision + audio) |
| **Context Length** | 128K tokens |
| **Vocabulary** | 262K tokens |

## Multimodal Weight Verification

Every tensor in every multimodal component was loaded and checked for `max(abs(tensor)) > 0`. **Zero broken weights found.**

| Component | Tensor Count | Status |
|-----------|--------------|--------|
| **Vision Tower** (SigLIP) | 658 | All non-zero |
| **Audio Tower** (Conformer) | 751 | All non-zero |
| **Language Model** | 1,485 | All non-zero |
| **Total** | **2,894** | **All verified** |

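The check described above can be sketched as follows. This is an illustrative sketch, not the actual verification script: the tensor names are hypothetical stand-ins, and synthetic NumPy arrays substitute for the real safetensors shards.

```python
import numpy as np

def find_zeroed_tensors(weights: dict) -> list:
    """Return the names of tensors that are entirely zero.

    Mirrors the card's check: a healthy tensor satisfies
    max(abs(tensor)) > 0; an all-zero tensor signals a broken conversion.
    """
    return [name for name, t in weights.items() if np.max(np.abs(t)) == 0]

# Synthetic stand-ins for converted weight shards (names are illustrative):
weights = {
    "vision_tower.patch_embedding.weight": np.random.randn(8, 8).astype(np.float32),
    "audio_tower.conv_0.weight": np.zeros((4, 4), dtype=np.float32),  # simulated breakage
    "language_model.embed_tokens.weight": np.random.randn(16, 4).astype(np.float32),
}

print(find_zeroed_tensors(weights))  # → ['audio_tower.conv_0.weight']
```

Running this over a real conversion means loading each shard and applying the same predicate to every tensor; any name it returns indicates a broken component.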
## Mixed-Precision Quantization

mlx-vlm's default quantization predicate automatically keeps MLP gate/up/down projections at 8-bit across all 42 language model layers while quantizing attention and other weights to 4-bit. This improves quality over naive uniform 4-bit quantization.

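As a rough illustration of how such a predicate works, here is a simplified sketch; it is not mlx-vlm's actual implementation, and the layer-path strings are assumptions for the example.

```python
def bits_for_layer(path: str, default_bits: int = 4, mlp_bits: int = 8) -> int:
    """Simplified mixed-precision predicate: keep MLP gate/up/down
    projections at 8-bit, quantize everything else to 4-bit."""
    mlp_projections = ("gate_proj", "up_proj", "down_proj")
    return mlp_bits if any(p in path for p in mlp_projections) else default_bits

print(bits_for_layer("language_model.layers.0.mlp.gate_proj"))    # 8
print(bits_for_layer("language_model.layers.0.self_attn.q_proj"))  # 4
```

A quantizer would call a predicate like this once per weight matrix and pass the returned bit width to the quantization routine.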
## Usage

```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/gemma-4-E4B-it-4bit
```

```python
# Python API
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E4B-it-4bit")

# Text-only
output = generate(model, processor, "Explain quantum computing", max_tokens=500)

# With image
output = generate(model, processor, "Describe this image", ["path/to/image.jpg"], max_tokens=500)
```

## Conversion Details

| Detail | Value |
|--------|-------|
| **Tool** | `mlx-vlm` v0.4.4 |
| **Source dtype** | bfloat16 |
| **Quantization mode** | affine |
| **Group size** | 64 |
| **Source** | `google/gemma-4-E4B-it` (original Google release) |

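The average bits/weight figure can be sanity-checked from the group size. Assuming MLX's affine scheme stores one float16 scale and one float16 bias per group of 64 weights, each quantized weight carries 32/64 = 0.5 bits of overhead; the 8-bit fraction used below (~0.6) is inferred from the reported average, not measured.

```python
GROUP_SIZE = 64
OVERHEAD = (16 + 16) / GROUP_SIZE  # fp16 scale + fp16 bias per group -> 0.5 bits/weight

def effective_bits(bits: int) -> float:
    """Storage cost per weight including per-group scale/bias overhead."""
    return bits + OVERHEAD

print(effective_bits(4))  # 4.5
print(effective_bits(8))  # 8.5

# With a fraction f of quantized weights kept at 8-bit and the rest at 4-bit:
f = 0.6  # assumption, chosen to illustrate how ~6.9 avg bits can arise
avg = (1 - f) * effective_bits(4) + f * effective_bits(8)
print(round(avg, 3))  # 6.9
```

The exact published figure depends on which tensors are quantized at all (embeddings and norms are often left in higher precision), so this arithmetic is a plausibility check rather than a derivation.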
---

<p align="center">Converted by <a href="https://osaurus.ai">Osaurus AI</a></p>
|
|