---
language:
- en
library_name: mlx
license: gemma
license_link: https://ai.google.dev/gemma/docs/gemma_4_license
pipeline_tag: any-to-any
base_model: google/gemma-4-E4B-it
tags:
- quantized
- apple-silicon
- mlx
- gemma4
- vision
- audio
- multimodal
- 4bit
---
<p align="center">
<a href="https://osaurus.ai"><img src="https://cdn-avatars.huggingface.co/v1/production/uploads/69d00705ce8872981c6c4fce/GWKjOwezSOhW5iuKpDwq_.png" alt="Osaurus AI" width="120"></a>
</p>
<h3 align="center">Gemma 4 E4B-it &mdash; 4-bit (MLX)</h3>
<p align="center">Properly converted with all vision and audio tower weights verified intact</p>
<p align="center">
<a href="https://osaurus.ai"><img src="https://img.shields.io/badge/Web-osaurus.ai-blue" alt="Website"></a>&nbsp;
<a href="https://huggingface.co/OsaurusAI"><img src="https://img.shields.io/badge/HF-OsaurusAI-yellow?logo=huggingface" alt="OsaurusAI"></a>
</p>
---
> **Why this exists:** The mlx-community 8-bit conversion of Gemma 4 E4B has broken/zeroed-out vision tower weights, producing a model that appears functional for text but silently fails on image and audio inputs. This is a clean conversion from the original `google/gemma-4-E4B-it` with every multimodal weight tensor verified non-zero.
---
## Model Details
| Property | Value |
|----------|-------|
| **Base Model** | [`google/gemma-4-E4B-it`](https://huggingface.co/google/gemma-4-E4B-it) |
| **Parameters** | 4.5B effective (8B total with Per-Layer Embeddings) |
| **Quantization** | 4-bit affine, mixed-precision (MLP layers kept at 8-bit) |
| **Avg Bits/Weight** | 6.900 |
| **Model Size** | 6.4 GB |
| **Architecture** | Gemma 4 (text + vision + audio) |
| **Context Length** | 128K tokens |
| **Vocabulary** | 262K tokens |
## Multimodal Weight Verification
Every tensor in every multimodal component was loaded and checked for `max(abs(tensor)) > 0`. **Zero broken weights found.**
| Component | Tensor Count | Status |
|-----------|-------------|--------|
| **Vision Tower** (SigLIP) | 658 | All non-zero |
| **Audio Tower** (Conformer) | 751 | All non-zero |
| **Language Model** | 1,485 | All non-zero |
| **Total** | **2,894** | **All verified** |
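The check described above can be sketched as a small helper that flags any tensor whose `max(abs(·))` is exactly zero. This is an illustrative sketch, not the actual verification script: `flag_zero_tensors` and the toy tensor dict are hypothetical, and a real run would iterate over the model's safetensors shards rather than an in-memory dict.

```python
import numpy as np

def flag_zero_tensors(tensors):
    """Return names of tensors whose max(abs(values)) is exactly zero."""
    return [name for name, t in tensors.items() if np.max(np.abs(t)) == 0.0]

# Toy stand-in for one loaded shard: one healthy tensor, one zeroed-out tensor.
tensors = {
    "vision_tower.blocks.0.attn.q_proj.weight": np.random.randn(4, 4),
    "vision_tower.blocks.1.attn.q_proj.weight": np.zeros((4, 4)),
}
broken = flag_zero_tensors(tensors)
print(broken)  # only the zeroed-out tensor is flagged
```

A conversion passes when this list is empty across every shard; the table above reports exactly that for all 2,894 tensors.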
## Mixed-Precision Quantization
mlx-vlm's default quantization predicate automatically keeps MLP gate/up/down projections at 8-bit across all 42 language model layers while quantizing attention and other weights to 4-bit. This improves quality over naive uniform 4-bit quantization.
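The idea of such a predicate can be sketched as a function mapping a parameter path to a bit width. The path patterns and function name below are illustrative assumptions, not mlx-vlm's actual predicate:

```python
def bits_for(path, default_bits=4, mlp_bits=8):
    """Pick a bit width per weight path: keep MLP gate/up/down
    projections at 8-bit, quantize everything else to 4-bit."""
    mlp_projections = ("gate_proj", "up_proj", "down_proj")
    if any(p in path for p in mlp_projections):
        return mlp_bits
    return default_bits

print(bits_for("language_model.layers.0.mlp.gate_proj"))   # 8
print(bits_for("language_model.layers.0.self_attn.q_proj"))  # 4
```

Keeping the MLP projections (which dominate the parameter count) at higher precision is what pushes the average bits/weight to 6.900 rather than a flat 4.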
## Usage
```bash
# Requires Osaurus (https://osaurus.ai)
osaurus serve OsaurusAI/gemma-4-E4B-it-4bit
```
```python
# Python API
from mlx_vlm import load, generate

model, processor = load("OsaurusAI/gemma-4-E4B-it-4bit")

# Text-only
output = generate(model, processor, "Explain quantum computing", max_tokens=500)

# With image
output = generate(model, processor, "Describe this image", ["path/to/image.jpg"], max_tokens=500)
```
## Conversion Details
| Detail | Value |
|--------|-------|
| **Tool** | `mlx-vlm` v0.4.4 |
| **Source dtype** | bfloat16 |
| **Quantization mode** | affine |
| **Group size** | 64 |
| **Source** | `google/gemma-4-E4B-it` (original Google release) |
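Affine quantization with group size 64 maps each group of 64 weights to small integers via a per-group scale and offset. The sketch below illustrates the scheme in NumPy; it is a minimal reference implementation of the general technique, not the exact MLX kernel:

```python
import numpy as np

def quantize_affine(w, bits=4, group_size=64):
    """Per-group affine quantization: q = round((w - min) / scale)."""
    levels = 2**bits - 1  # 15 representable steps for 4-bit
    w = w.reshape(-1, group_size)
    lo = w.min(axis=1, keepdims=True)
    hi = w.max(axis=1, keepdims=True)
    scale = (hi - lo) / levels
    scale = np.where(scale == 0, 1.0, scale)  # guard flat groups against /0
    q = np.round((w - lo) / scale)
    return q.astype(np.uint8), scale, lo

def dequantize_affine(q, scale, lo):
    """Recover an approximation of the original weights."""
    return q * scale + lo

rng = np.random.default_rng(0)
w = rng.standard_normal(256).astype(np.float32)
q, scale, lo = quantize_affine(w)
w_hat = dequantize_affine(q, scale, lo).reshape(-1)
print(float(np.abs(w - w_hat).max()))  # per-group error bounded by scale/2
```

Smaller group sizes give finer-grained scales (better accuracy, more overhead per weight); 64 is a common middle ground.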
---
<p align="center">Converted by <a href="https://osaurus.ai">Osaurus AI</a></p>