---
language:
- en
library_name: mlx
license: gemma
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model: google/gemma-4-E4B-it
tags:
- quantized
- apple-silicon
- mlx
- gemma4
- vision
- audio
- multimodal
- 4bit
---
<p align="center">
<a href="https://osaurus.ai"><img src="https://cdn-avatars.huggingface.co/v1/production/uploads/69d00705ce8872981c6c4fce/GWKjOwezSOhW5iuKpDwq_.png" alt="Osaurus AI" width="120"></a>
</p>
<h3 align="center">Gemma 4 E4B-it &mdash; 4-bit (MLX)</h3>
<p align="center">Properly converted with all vision and audio tower weights verified intact</p>
<p align="center">
<a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>&nbsp;
<a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>
</p>
---
> **Why this exists:** The mlx-community 8-bit conversion of Gemma 4 E4B has broken/zeroed-out vision tower weights, producing a model that appears functional for text but silently fails on image and audio inputs. This is a clean conversion from the original `google/gemma-4-E4B-it` with every multimodal weight tensor verified non-zero.
---
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) |
| **Parameters** | 4.5B effective (8B total with Per-Layer Embeddings) |
| **Quantization** | 4-bit affine, mixed-precision (MLP layers kept at 8-bit) |
| **Avg Bits/Weight** | 6.900 |
| **Model Size** | 6.4 GB |
| **Architecture** | Gemma 4 (text + vision + audio) |
| **Context Length** | 128K tokens |
| **Vocabulary** | 262K tokens |
## Multimodal Weight Verification
Every tensor in every multimodal component was loaded and checked for `max(abs(tensor)) > 0`. **Zero broken weights found.**
| Component | Tensor Count | Status |
|-----------|-------------|--------|
| **Vision Tower** (SigLIP) | 658 | All non-zero |
| **Audio Tower** (Conformer) | 751 | All non-zero |
| **Language Model** | 1,485 | All non-zero |
| **Total** | **2,894** | **All verified** |
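The check described above can be sketched as a small helper that flags any tensor whose `max(abs(·))` is exactly zero. This is an illustrative sketch, not the actual verification script: `flag_zero_tensors` and the toy tensor dict are hypothetical, and a real run would iterate over the model's safetensors shards rather than an in-memory dict.

```python
import numpy as np

def flag_zero_tensors(tensors):
    """Return names of tensors whose max(abs(values)) is exactly zero."""
    return [name for name, t in tensors.items() if np.max(np.abs(t)) == 0.0]

# Toy stand-in for one loaded shard: one healthy tensor, one zeroed-out tensor.
tensors = {
    "vision_tower.blocks.0.attn.q_proj.weight": np.random.randn(4, 4),
    "vision_tower.blocks.1.attn.q_proj.weight": np.zeros((4, 4)),
}
broken = flag_zero_tensors(tensors)
print(broken)  # only the zeroed-out tensor is flagged
```

A conversion passes when this list is empty across every shard; the table above reports exactly that for all 2,894 tensors.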
## Mixed-Precision Quantization
mlx-vlm's default quantization predicate automatically keeps MLP gate/up/down projections at 8-bit across all 42 language model layers while quantizing attention and other weights to 4-bit. This improves quality over naive uniform 4-bit quantization.
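The idea of such a predicate can be sketched as a function mapping a parameter path to a bit width. The path patterns and function name below are illustrative assumptions, not mlx-vlm's actual predicate:

```python
def bits_for(path, default_bits=4, mlp_bits=8):
    """Pick a bit width per weight path: keep MLP gate/up/down
    projections at 8-bit, quantize everything else to 4-bit."""
    mlp_projections = ("gate_proj", "up_proj", "down_proj")
    if any(p in path for p in mlp_projections):
        return mlp_bits
    return default_bits

print(bits_for("language_model.layers.0.mlp.gate_proj"))   # 8
print(bits_for("language_model.layers.0.self_attn.q_proj"))  # 4
```

Keeping the MLP projections (which dominate the parameter count) at higher precision is what pushes the average bits/weight to 6.900 rather than a flat 4.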
## Usage
```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/gemma-4-E4B-it-4bit
```
```python
# Python API
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E4B-it-4bit")

# Text-only
output = generate(model, processor, "Explain quantum computing", max_tokens=500)

# With image
output = generate(model, processor, "Describe this image", ["path/to/image.jpg"], max_tokens=500)
```
## Conversion Details
| Detail | Value |
|--------|-------|
| **Tool** | `mlx-vlm` v0.4.4 |
| **Source dtype** | bfloat16 |
| **Quantization mode** | affine |
| **Group size** | 64 |
| **Source** | `google/gemma-4-E4B-it` (original Google release) |
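Affine quantization with group size 64 maps each group of 64 weights to small integers via a per-group scale and offset. The sketch below illustrates the scheme in NumPy; it is a minimal reference implementation of the general technique, not the exact MLX kernel:

```python
import numpy as np

def quantize_affine(w, bits=4, group_size=64):
    """Per-group affine quantization: q = round((w - min) / scale)."""
    levels = 2**bits - 1  # 15 representable steps for 4-bit
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard flat groups against /0
    q = np.round((w - lo) / scale)
    return q.astype(np.uint8), scale, lo

def dequantize_affine(q, scale, lo):
    """Recover an approximation of the original weights."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, lo = quantize_affine(w)
w_hat = dequantize_affine(q, scale, lo).reshape(-1)
print(float(np.abs(w - w_hat).max()))  # per-group error bounded by scale/2
```

Smaller group sizes give finer-grained scales (better accuracy, more overhead per weight); 64 is a common middle ground.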
---
<p align="center">Converted by <a href="https://osaurus.ai">Osaurus AI</a></p>