⚠️ Existing MLX quantized Gemma 4 models (mlx-community, unsloth) produce garbage output due to quantizing PLE (Per-Layer Embedding) layers.
#1
by Alkd - opened
This was fixed two days ago, during the release window: https://github.com/Blaizzy/mlx-vlm/pull/893
The following works fine for me:
pip install --upgrade mlx-vlm
mlx_vlm.generate --model mlx-community/gemma-4-e2b-it-bf16 --prompt "Who are you?"
This works for bf16 but not for quantized models. See http://github.com/jundot/omlx/issues/534 — never mind, it seems to work now!
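For anyone re-quantizing locally, the root cause described above suggests excluding the PLE (Per-Layer Embedding) weights when selecting which modules to quantize. A minimal sketch of that filtering step, assuming a simple substring match on parameter paths (the marker strings below are illustrative guesses, not the actual names used by mlx-vlm or the Gemma checkpoints):

```python
# Sketch: decide which module paths are safe to quantize,
# leaving Per-Layer Embedding (PLE) weights in full precision.
# The marker substrings are assumptions for illustration only.
PLE_MARKERS = ("per_layer_embed", "embed_tokens_per_layer")

def should_quantize(path: str) -> bool:
    """Return True if the module at `path` may be quantized."""
    return not any(marker in path for marker in PLE_MARKERS)

paths = [
    "language_model.layers.0.mlp.down_proj",   # regular weight: quantize
    "language_model.per_layer_embed.0",        # PLE weight: keep as bf16
]
print([p for p in paths if should_quantize(p)])
```

In practice you would pass a predicate like this to your quantization routine so the PLE tensors are skipped, rather than quantizing every linear/embedding layer uniformly.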