ltpla/gemma-4-e4b-it-noaudio-8bit

This is a modified version of mlx-community/gemma-4-e4b-it-8bit with the audio path removed. The Universal Speech Model encoder (audio_tower.*) and the audio embedder (embed_audio.*), 754 weight keys totalling about 610 MB, have been stripped. In config.json, audio_config and audio_token_id have been dropped and has_audio is set to false. The text model and vision tower are unchanged.

Use this variant when you do not need audio input and disk or memory footprint matters (e.g. on machines with 16 GB of unified memory). Audio prompts will fail at the model level because the audio tower is gone. Text-only and image inputs work exactly as in the original.
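Because the failure only surfaces inside the model, a caller may prefer to reject audio requests up front. The following is a minimal sketch that assumes you wrap generation in your own Python code and read this checkpoint's config.json. The function names (`supports_audio`, `check_request`, `read_config`) are hypothetical and are not part of mlx-vlm.

```python
import json
from pathlib import Path


def read_config(model_dir: str) -> dict:
    # config.json sits at the root of the downloaded model directory.
    return json.loads((Path(model_dir) / "config.json").read_text())


def supports_audio(config: dict) -> bool:
    """Return True only if the config advertises a usable audio path.

    Assumption: audio is usable when has_audio is not explicitly false AND an
    audio_config section is present. This variant sets has_audio to false and
    drops audio_config, so this returns False for it.
    """
    return config.get("has_audio", True) is not False and "audio_config" in config


def check_request(config: dict, has_audio_input: bool) -> None:
    """Raise early instead of letting an audio prompt fail inside the model."""
    if has_audio_input and not supports_audio(config):
        raise ValueError("this checkpoint has no audio tower; remove the audio input")
```

Text-only and image requests pass through `check_request` unchanged. Only requests that carry audio are rejected.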

Licence

Gemma is provided under and subject to Google's Gemma Terms of Use and Gemma Prohibited Use Policy. By using, modifying, or distributing this model you agree to those terms, including the prohibited-use restrictions. This work is a modification; the original Gemma 4 model card is at google/gemma-4-e4b-it.

Modifications from base

Stripped audio_tower.* and embed_audio.* weights
Dropped audio_config and audio_token_id from config.json
Set has_audio: false
Repacked safetensors with updated model.safetensors.index.json
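The config and index steps above can be sketched in plain Python. This illustrates the steps under stated assumptions. It is not the script that was actually used to build this repo. The sketch assumes three things: weight keys start with the prefixes listed above, each shard keeps its filename after its audio tensors are dropped, and you already know the byte size of every tensor (the hypothetical `tensor_nbytes` argument), so that total_size can be recomputed. Rewriting the shard files themselves is not shown.

```python
# Prefixes of the removed audio weights, as listed above.
AUDIO_PREFIXES = ("audio_tower.", "embed_audio.")


def strip_audio_config(config: dict) -> dict:
    """Return a copy of config.json contents with the audio fields removed."""
    out = {k: v for k, v in config.items() if k not in ("audio_config", "audio_token_id")}
    out["has_audio"] = False
    return out


def strip_audio_index(index: dict, tensor_nbytes: dict) -> dict:
    """Rebuild model.safetensors.index.json without the audio weights.

    index: parsed index whose "weight_map" maps tensor key -> shard filename.
    tensor_nbytes: hypothetical input mapping tensor key -> size in bytes,
    e.g. collected while reading the shards; used to recompute total_size.
    """
    # Keep every tensor whose key does not start with an audio prefix.
    weight_map = {
        key: shard
        for key, shard in index["weight_map"].items()
        if not key.startswith(AUDIO_PREFIXES)
    }
    # total_size in the index metadata counts only the tensors that remain.
    total = sum(tensor_nbytes[key] for key in weight_map)
    metadata = dict(index.get("metadata", {}), total_size=total)
    return {"metadata": metadata, "weight_map": weight_map}
```

Filtering by prefix rather than by an explicit key list means the sketch does not depend on the exact number of audio keys (754 in this checkpoint). If a shard ends up holding no tensors at all, it would also need to be deleted, which this sketch does not handle.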

Size

This variant: ~8.4 GB
Base (8bit): ~9.0 GB
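As a rough consistency check, the difference between the two rounded sizes matches the ~610 MB of removed audio weights quoted above, to within the rounding of the figures:

```latex
9.0\,\text{GB} - 8.4\,\text{GB} \approx 0.6\,\text{GB} \approx 610\,\text{MB}
```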

Use with mlx-vlm

pip install -U mlx-vlm
python -m mlx_vlm.generate \
  --model ltpla/gemma-4-e4b-it-noaudio-8bit \
  --max-tokens 100 --temperature 0.0 \
  --prompt "Describe this image." --image <path_to_image>
