Qwen2.5-7B-Instruct-MLX-2bit

MLX 2-bit conversion of Qwen/Qwen2.5-7B-Instruct. Converted directly from the original HF bf16 safetensors. Not from GGUF. Not chained from another quant. No double-quant hop.

Group size: 64. Smaller groups store more scales = better quality, slightly larger file. Most published MLX repos use group-size 64 silently — this repo discloses it.

Apple Silicon only. GGUF Q4_K_M is a llama.cpp quant — MLX has no literal Q4_K_M mode. Don't conflate them.

2-bit warning

This build is experimental. It loads and runs on an M1 16GB host, but the first smoke sweep produced incoherent text on simple prompts. Use 3-bit or 4-bit-gs64 for actual local use until stronger evals say otherwise.

Use

pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit \\
  --prompt "Explain quantum entanglement in one paragraph" --max-tokens 200

Conversion

python -m mlx_lm convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./Qwen2.5-7B-Instruct-MLX-2bit \
  -q --q-bits 2 --q-group-size 64

Credits

Source: Qwen/Qwen2.5-7B-Instruct
MLX conversion: zaydiscold

Part of a Qwen2.5-7B-Instruct MLX quant ladder + group-size perplexity sweep. See the sibling repos under zaydiscold for other bit levels and group sizes — perplexity numbers are coming as a separate dataset repo.

Downloads last month: 18

Safetensors

Model size

0.7B params

Tensor type

BF16

U32

MLX

Hardware compatibility

2-bit

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit

Base model

Qwen/Qwen2.5-7B

Finetuned

Qwen/Qwen2.5-7B-Instruct

Quantized

(358)

this model

Collection including zaydiscold/Qwen2.5-7B-Instruct-MLX-2bit

Qwen2.5-7B-Instruct MLX ladder + group-size sweep

Collection

Complete MLX grid for Qwen2.5-7B-Instruct — full bit ladder (bf16/8/6/4/3/2-bit) + 4-bit group-size sweep at gs=32/64/128. • 8 items • Updated May 11