Qwen2.5-7B-Instruct MLX 8bit

MLX 8-bit conversion of Qwen/Qwen2.5-7B-Instruct. Default group size (64).

Converted directly from the original HF bf16 safetensors.

The full ladder + group-size sweep

Variant Repo Disk ~Min unified RAM Role
MLX bf16 Qwen2.5-7B-Instruct-MLX-bf16 15.24 GB ~18 GB Reference
MLX 8bit (this repo) this 8.1 GB ~10 GB Near-lossless
MLX 6bit Qwen2.5-7B-Instruct-MLX-6bit 6.2 GB ~8 GB Quality / size middle
MLX 4bit-gs32 Qwen2.5-7B-Instruct-MLX-4bit-gs32 4.77 GB ~7 GB 4-bit, group size 32
MLX 4bit-gs64 Qwen2.5-7B-Instruct-MLX-4bit-gs64 4.3 GB ~6 GB 4-bit, group size 64 (mlx-lm default)
MLX 4bit-gs128 Qwen2.5-7B-Instruct-MLX-4bit-gs128 4.06 GB ~6 GB 4-bit, group size 128
MLX 3bit Qwen2.5-7B-Instruct-MLX-3bit 3.34 GB ~5 GB Smaller, expect quality drop
MLX 2bit Qwen2.5-7B-Instruct-MLX-2bit 2.39 GB ~4 GB Aggressive — verify on workload

Collection: Qwen2.5-7B-Instruct MLX ladder + group-size sweep

Use

pip install mlx-lm
mlx_lm.generate --model zaydiscold/Qwen2.5-7B-Instruct-MLX-8bit \
  --prompt "Explain quantum entanglement in one paragraph" --max-tokens 200

Conversion

python -m mlx_lm convert \
  --hf-path Qwen/Qwen2.5-7B-Instruct \
  --mlx-path ./Qwen2.5-7B-Instruct-MLX-8bit \
  -q --q-bits 8

Notes

  • GGUF Q4_K_M is llama.cpp; MLX has no literal Q4_K_M.
  • See the sibling repos for other bit budgets / group sizes.

Credits

Downloads last month
40
Safetensors
Model size
8B params
Tensor type
BF16
·
U32
·
MLX
Hardware compatibility
Log In to add your hardware

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zaydiscold/Qwen2.5-7B-Instruct-MLX-8bit

Base model

Qwen/Qwen2.5-7B
Quantized
(313)
this model

Collection including zaydiscold/Qwen2.5-7B-Instruct-MLX-8bit