ltpla/gemma-4-e4b-it-noaudio-8bit

This is a modified version of mlx-community/gemma-4-e4b-it-8bit. The Universal Speech Model encoder (audio_tower.*) and audio embedder (embed_audio.*), together 754 weight keys and ~610 MB, have been removed. In config.json, audio_config and audio_token_id have been dropped and has_audio is set to false. The text model and vision tower are unchanged.

Useful when audio input is not needed and disk/memory footprint matters (e.g. on systems with 16 GB unified memory). Audio prompts will fail at the model level because the audio tower is gone. Text-only and image inputs work exactly as in the original.

Licence

Gemma is provided under and subject to Google's Gemma Terms of Use and Gemma Prohibited Use Policy. By using, modifying, or distributing this model you agree to those terms, including the prohibited-use restrictions. This work is a modification; the original Gemma 4 model card is at google/gemma-4-e4b-it.

Modifications from base

  • Stripped audio_tower.* and embed_audio.* weights
  • Dropped audio_config and audio_token_id from config.json
  • Set has_audio: false
  • Repacked safetensors with updated model.safetensors.index.json
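
The steps above amount to a key filter plus a small config edit. A minimal sketch, assuming the audio weights can be identified by an audio_tower or embed_audio component in the dotted key name (the real checkpoint may nest them under a wrapper prefix) and that has_audio sits at the top level of config.json. The function names are hypothetical, and this is not necessarily the script used to produce this repository:

```python
# Assumption: audio weights are recognisable by one of these dotted components.
AUDIO_COMPONENTS = {"audio_tower", "embed_audio"}
# Config fields the card says were dropped.
AUDIO_CONFIG_FIELDS = ("audio_config", "audio_token_id")


def is_audio_key(name: str) -> bool:
    """True if any dotted component of the weight key names an audio module."""
    return any(part in AUDIO_COMPONENTS for part in name.split("."))


def strip_audio_weights(weight_map: dict) -> dict:
    """Return a copy of a key -> value mapping without the audio entries."""
    return {k: v for k, v in weight_map.items() if not is_audio_key(k)}


def patch_config(config: dict) -> dict:
    """Return a copy of the config with audio fields dropped and has_audio off."""
    patched = {k: v for k, v in config.items() if k not in AUDIO_CONFIG_FIELDS}
    patched["has_audio"] = False  # assumed to be a top-level field
    return patched
```

The same filter would apply both to the tensors loaded from each shard and to the weight_map in model.safetensors.index.json, so that the index and the repacked files stay in agreement.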

Size

  • This variant: ~8.4 GB
  • Base (8bit): ~9.0 GB
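
These two figures are consistent with the ~610 MB of removed audio weights. A quick check, assuming decimal units (1 GB = 1000 MB) and the rounded values quoted in this card:

```python
BASE_GB = 9.0      # base 8-bit checkpoint size, as quoted in this card
REMOVED_MB = 610   # audio tower + audio embedder, as quoted in this card


def expected_size_gb(base_gb: float, removed_mb: float) -> float:
    """Size left after removing removed_mb from a base_gb checkpoint."""
    return base_gb - removed_mb / 1000


variant = expected_size_gb(BASE_GB, REMOVED_MB)
print(f"expected variant size: ~{variant:.1f} GB")  # prints ~8.4 GB
```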

Use with mlx-vlm

pip install -U mlx-vlm
python -m mlx_vlm.generate \
  --model ltpla/gemma-4-e4b-it-noaudio-8bit \
  --max-tokens 100 --temperature 0.0 \
  --prompt "Describe this image." --image <path_to_image>