MiniCPM5-1B MLX DWQ 4-bit

MLX 4-bit quantization of openbmb/MiniCPM5-1B.

Quantization

  • Runtime: MLX / mlx-lm
  • Bits: 4
  • Group size: 64
  • Mode: affine
  • Observed packed size: about 580 MB
  • Conversion log: mlx_lm.convert reported 4.501 bits per weight
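The reported 4.501 bits per weight is close to what 4-bit affine quantization with group size 64 predicts. Below is a minimal sketch of that arithmetic. It assumes each group of 64 weights shares one 16-bit scale and one 16-bit bias, which matches BF16 source weights, and that the extra 0.001 presumably comes from a small number of tensors left unquantized. It also inverts the observed packed size into a rough parameter count, taking 580 MB as 580e6 bytes and treating the file as almost entirely quantized weights. The helper names are hypothetical and are not part of mlx-lm.

```python
# Hypothetical helpers illustrating the storage arithmetic; not part of mlx-lm.

def affine_bits_per_weight(bits: int, group_size: int,
                           scale_bits: int = 16, bias_bits: int = 16) -> float:
    """Effective bits per weight for group-wise affine quantization.

    Each group of `group_size` weights shares one scale and one bias, so their
    storage cost is amortized across the group (assumed 16-bit each).
    """
    return bits + (scale_bits + bias_bits) / group_size


def estimate_params(size_bytes: float, bits_per_weight: float) -> float:
    """Rough parameter count, assuming the file is (almost) all quantized weights."""
    return size_bytes * 8 / bits_per_weight


bpw = affine_bits_per_weight(bits=4, group_size=64)  # 4 + 32/64
print(f"predicted bits per weight: {bpw:.3f}")

# 580 MB taken as 580e6 bytes; 4.501 is the value reported by mlx_lm.convert.
params = estimate_params(580e6, 4.501)
print(f"estimated parameters: {params / 1e9:.2f}B")
```

Under these assumptions the estimate lands near 1.03 billion parameters, which is consistent with the "1B" in the base model's name.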

Smoke Test

Local Apple M2 smoke test:

  • Prompt tokens: 17
  • Prompt speed: 28.621 tok/s
  • Generation speed: 133.897 tok/s
  • Peak memory: 0.674 GB
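The throughput figures can be converted into wall-clock time. Below is a minimal sketch that assumes the reported rates hold steady. It uses 64 new tokens, the --max-tokens value from the usage command, as a hypothetical generation length, because the smoke test does not state how many tokens were generated. The `seconds_for` helper is illustrative only.

```python
def seconds_for(tokens: int, tokens_per_sec: float) -> float:
    """Time to process `tokens` at a constant throughput (hypothetical helper)."""
    return tokens / tokens_per_sec


prompt_s = seconds_for(17, 28.621)   # prompt processing: 17 tokens at 28.621 tok/s
gen_s = seconds_for(64, 133.897)     # assumed 64 generated tokens at 133.897 tok/s
print(f"prompt: {prompt_s:.3f} s, generation: {gen_s:.3f} s, "
      f"total: {prompt_s + gen_s:.2f} s")
```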

Validation Status

Loading and generation work in the MLX runtime, but this artifact is not yet release-approved because it has not passed the strict multilingual, code, and tool-use validation matrix. Known failures include poor response quality in Persian, poor response quality on Arabic arithmetic prompts, and reasoning leakage on tool-planning prompts.

Usage

pip install -U mlx-lm
mlx_lm.generate --model Reza2kn/MiniCPM5-1B-MLX-DWQ-4bit --prompt "Hello" --max-tokens 64