ltpla/gemma-4-e4b-it-noaudio-8bit
This is a modified version of mlx-community/gemma-4-e4b-it-8bit with the audio components removed. The Universal Speech Model encoder (audio_tower.*) and the audio embedder (embed_audio.*), together 754 weight keys and ~610 MB, have been stripped. In config.json, audio_config and audio_token_id have been dropped and has_audio is set to false. The text model and vision tower are unchanged.
This variant is useful when audio input is not needed and disk or memory footprint matters (e.g. on systems with 16 GB of unified memory). Audio prompts will fail at the model level because the audio tower is gone. Text-only and image inputs work exactly as in the original.
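Since audio prompts fail at the model level, a caller may prefer to reject them before loading anything. The following is a minimal sketch, not part of this repository: `supports_audio` is a hypothetical helper, and it assumes that has_audio sits at the top level of config.json and that a config without that key can be judged by whether audio_config is present.

```python
import json
from pathlib import Path


def supports_audio(model_dir) -> bool:
    """Guess from config.json whether a local model copy accepts audio input.

    Hypothetical helper. Assumes `has_audio` is a top-level key; if it is
    missing, falls back to checking whether `audio_config` is present.
    """
    config = json.loads((Path(model_dir) / "config.json").read_text())
    return bool(config.get("has_audio", "audio_config" in config))


# Example use (the directory name is illustrative):
# if not supports_audio("gemma-4-e4b-it-noaudio-8bit"):
#     raise ValueError("This model variant cannot process audio prompts.")
```

For this variant the check would be expected to return False, since has_audio is set to false and audio_config has been dropped.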
Licence
Gemma is provided under and subject to Google's Gemma Terms of Use and Gemma Prohibited Use Policy. By using, modifying, or distributing this model you agree to those terms, including the prohibited-use restrictions. This work is a modification; the original Gemma 4 model card is at google/gemma-4-e4b-it.
Modifications from base
- Stripped audio_tower.* and embed_audio.* weights
- Dropped audio_config and audio_token_id from config.json
- Set has_audio: false
- Repacked safetensors with updated model.safetensors.index.json
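The bookkeeping behind these modifications can be sketched in plain Python. This is an illustration under stated assumptions, not the script used to produce this repository: `is_audio_key` and `strip_audio` are hypothetical names, the example weight keys are invented, and audio keys are assumed to be identifiable by a dotted component named audio_tower or embed_audio. Rewriting the tensor files themselves is not shown.

```python
# Hypothetical sketch: filter audio entries out of a weight map and a config dict.
AUDIO_COMPONENTS = {"audio_tower", "embed_audio"}
AUDIO_CONFIG_KEYS = ("audio_config", "audio_token_id")


def is_audio_key(key: str) -> bool:
    """True if any dotted component of the weight key names an audio module."""
    return any(part in AUDIO_COMPONENTS for part in key.split("."))


def strip_audio(weight_map: dict, config: dict):
    """Return copies of the weight map and config with audio entries removed."""
    kept = {k: v for k, v in weight_map.items() if not is_audio_key(k)}
    new_config = {k: v for k, v in config.items() if k not in AUDIO_CONFIG_KEYS}
    new_config["has_audio"] = False
    return kept, new_config


# Invented example data, shaped like the "weight_map" of a safetensors index.
weight_map = {
    "language_model.layers.0.weight": "model-00001.safetensors",
    "audio_tower.conv.weight": "model-00002.safetensors",
    "embed_audio.proj.weight": "model-00002.safetensors",
}
config = {"audio_config": {}, "audio_token_id": 0, "has_audio": True}

kept, new_config = strip_audio(weight_map, config)
print(sorted(kept))
print(new_config)
```

Matching on whole dotted components rather than substrings avoids dropping an unrelated key that merely contains the word "audio".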
Size
- This variant: ~8.4 GB
- Base (8bit): ~9.0 GB
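The two figures agree with the ~610 MB of removed weights: 9.0 GB - 0.61 GB ≈ 8.4 GB. To check a local copy, the safetensors shards can be summed as in the sketch below. `safetensors_size_gb` is a hypothetical helper, and it assumes decimal gigabytes (10^9 bytes) and shards stored directly in the model directory; the sizes quoted above may follow a different convention.

```python
from pathlib import Path


def safetensors_size_gb(model_dir) -> float:
    """Total size of the *.safetensors files in model_dir, in decimal GB."""
    total_bytes = sum(
        p.stat().st_size for p in Path(model_dir).glob("*.safetensors")
    )
    return total_bytes / 1e9


# Example use (the directory name is illustrative):
# print(f"{safetensors_size_gb('gemma-4-e4b-it-noaudio-8bit'):.1f} GB")
```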
Use with mlx-vlm
pip install -U mlx-vlm
python -m mlx_vlm.generate \
--model ltpla/gemma-4-e4b-it-noaudio-8bit \
--max-tokens 100 --temperature 0.0 \
--prompt "Describe this image." --image <path_to_image>
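When the same command needs to run over several images, it can be assembled from Python. The sketch below only builds the argument list, using the flags shown above. `build_generate_command` is a hypothetical helper, and actually running the result assumes mlx-vlm is installed in the current interpreter's environment.

```python
import sys


def build_generate_command(model, prompt, image, max_tokens=100, temperature=0.0):
    """Build the argv list for `python -m mlx_vlm.generate` with the flags above."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
        "--image", image,
    ]


cmd = build_generate_command(
    "ltpla/gemma-4-e4b-it-noaudio-8bit",
    "Describe this image.",
    "example.jpg",  # illustrative path
)
print(" ".join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would then launch the generation. Keeping the command as a list avoids shell quoting problems with prompts that contain spaces.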