Qwen2.5-7B-Instruct-MLX-2bit

MLX 2-bit conversion of Qwen/Qwen2.5-7B-Instruct. Converted directly from the original HF bf16 safetensors. Not from GGUF. Not chained from another quant. No double-quant hop.

Group size: 64. Smaller groups store more scales = better quality, slightly larger file. Most published MLX repos use group-size 64 silently — this repo discloses it.

Apple Silicon only. GGUF Q4_K_M is a llama.cpp quant — MLX has no literal Q4_K_M mode. Don't conflate them.

2-bit warning

This build is experimental. It loads and runs on an M1 16GB host, but the first smoke sweep produced incoherent text on simple prompts. Use 3-bit or 4-bit-gs64 for actual local use until stronger evals say otherwise.

Use

pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit \\
  --prompt "Explain quantum entanglement in one paragraph" --max-tokens 200

Conversion

python -m mlx_lm convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./Qwen2.5-7B-Instruct-MLX-2bit \
  -q --q-bits 2 --q-group-size 64

Credits

Part of a Qwen2.5-7B-Instruct MLX quant ladder + group-size perplexity sweep. See the sibling repos under zaydiscold for other bit levels and group sizes — perplexity numbers are coming as a separate dataset repo.

Downloads last month
89
Safetensors
Model size
0.7B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

2-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit

Base model

Qwen/Qwen2.5-7B
Quantized
(314)
this model

Collection including zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit