Qwen2.5-7B-Instruct MLX ladder + group-size sweep
Collection
Complete MLX grid for Qwen2.5-7B-Instruct — full bit ladder (bf16/8/6/4/3/2-bit) + 4-bit group-size sweep at gs=32/64/128. • 8 items • Updated
How to use zaydiscold/Qwen2.5-7B-Instruct-MLX-8bit with MLX:
# Download the model from the Hub pip install huggingface_hub[hf_xet] huggingface-cli download --local-dir Qwen2.5-7B-Instruct-MLX-8bit zaydiscold/Qwen2.5-7B-Instruct-MLX-8bit
MLX 8-bit conversion of Qwen/Qwen2.5-7B-Instruct. Default group size (64).
Converted directly from the original HF bf16 safetensors.
| Variant | Repo | Disk | ~Min unified RAM | Role |
|---|---|---|---|---|
| MLX bf16 | Qwen2.5-7B-Instruct-MLX-bf16 |
15.24 GB | ~18 GB | Reference |
| MLX 8bit (this repo) | this | 8.1 GB | ~10 GB | Near-lossless |
| MLX 6bit | Qwen2.5-7B-Instruct-MLX-6bit |
6.2 GB | ~8 GB | Quality / size middle |
| MLX 4bit-gs32 | Qwen2.5-7B-Instruct-MLX-4bit-gs32 |
4.77 GB | ~7 GB | 4-bit, group size 32 |
| MLX 4bit-gs64 | Qwen2.5-7B-Instruct-MLX-4bit-gs64 |
4.3 GB | ~6 GB | 4-bit, group size 64 (mlx-lm default) |
| MLX 4bit-gs128 | Qwen2.5-7B-Instruct-MLX-4bit-gs128 |
4.06 GB | ~6 GB | 4-bit, group size 128 |
| MLX 3bit | Qwen2.5-7B-Instruct-MLX-3bit |
3.34 GB | ~5 GB | Smaller, expect quality drop |
| MLX 2bit | Qwen2.5-7B-Instruct-MLX-2bit |
2.39 GB | ~4 GB | Aggressive — verify on workload |
Collection: Qwen2.5-7B-Instruct MLX ladder + group-size sweep
pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-8bit \
--prompt "Explain quantum entanglement in one paragraph" --max-tokens 200
python -m mlx_lm convert \
--hf-path Qwen/Qwen2.5-7B-Instruct \
--mlx-path ./Qwen2.5-7B-Instruct-MLX-8bit \
-q --q-bits 8
Q4_K_M is llama.cpp; MLX has no literal Q4_K_M.8-bit