⚠️ Existing MLX quantized Gemma 4 models (mlx-community, unsloth) produce garbage output due to quantizing PLE (Per-Layer Embedding) layers.
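The fix amounts to leaving the PLE tensors in full precision while quantizing the rest of the model. A minimal sketch of that filtering logic, as a predicate you could pass to a converter's quantization hook — the `"per_layer"` substring, the predicate signature, and the example paths are assumptions for illustration, not the actual mlx-vlm API:

```python
# Hedged sketch: decide per-tensor whether to quantize, skipping PLE
# (Per-Layer Embedding) weights. Quantizing those is what produces the
# garbage output described above. The "per_layer" naming is an assumption;
# check the real tensor names in your checkpoint before relying on it.

def should_quantize(path: str) -> bool:
    """Return True if the tensor at `path` is safe to quantize."""
    # Keep per-layer-embedding tensors in full precision (bf16).
    return "per_layer" not in path

# Hypothetical tensor paths, for illustration only:
print(should_quantize("model.layers.0.mlp.down_proj"))      # regular weight
print(should_quantize("model.embed_tokens_per_layer"))      # PLE weight
```

A predicate like this would be wired into whatever mixed-quantization hook your conversion tool exposes, so only the PLE tensors stay at full precision.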

#1
by Alkd - opened
MLX Community org

Fixed 2 days ago during the release window: https://github.com/Blaizzy/mlx-vlm/pull/893

This works perfectly fine for me:

```shell
pip install --upgrade mlx-vlm
mlx_vlm.generate --model mlx-community/gemma-4-e2b-it-bf16 --prompt "Who are you?"
```
MLX Community org
edited 1 day ago

Works for bf16, but not for quantized models. See http://github.com/jundot/omlx/issues/534. Never mind, it seems to work!
