Phi-2 MLX 4-bit

This repository provides a 4-bit MLX-quantized version of Microsoft Phi-2, optimized for fast, low-memory local inference on Apple Silicon.

This variant prioritizes speed and minimal RAM usage, making it ideal for laptops and on-device experimentation.


Model Details

  • Base model: microsoft/phi-2
  • Architecture: Decoder-only Transformer
  • License: MIT
  • Quantization: MLX 4-bit group quantization (≈4.5 effective bits per weight, including per-group scales and biases)
  • Target hardware: Apple Silicon (M1 / M2 / M3)
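The ≈4.5 effective bits/weight figure can be understood as the 4-bit weights plus per-group quantization metadata amortized over each group. A minimal sketch of that arithmetic, assuming a group size of 64 with an fp16 scale and bias per group (these specific values are assumptions, not stated in this card):

```python
# Effective bits per weight for group quantization: the 4-bit codes plus
# per-group scale and bias metadata, amortized across the group.
def effective_bits(weight_bits=4, group_size=64, scale_bits=16, bias_bits=16):
    """Bits stored per weight once per-group metadata is spread out."""
    return weight_bits + (scale_bits + bias_bits) / group_size

print(effective_bits())  # 4.5
```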

Performance Characteristics

Metric               Value
Disk size            ~1.5–1.7 GB
Peak RAM usage       ~1.6–1.8 GB
Inference speed      Faster than the fp16 base model
Instruction quality  Good; slightly below the 5-bit variant
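The quoted disk size can be cross-checked from first principles: Phi-2 has roughly 2.7B parameters, and the card quotes ≈4.5 effective bits per weight. A quick back-of-the-envelope sketch:

```python
# Cross-check the ~1.5–1.7 GB disk size from parameter count and
# effective bits per weight (2.7e9 params is Phi-2's published size).
params = 2.7e9
bits_per_weight = 4.5
size_gb = params * bits_per_weight / 8 / 1e9  # bits -> bytes -> GB

print(f"{size_gb:.2f} GB")
```

This lands at roughly 1.5 GB, consistent with the table above (the extra headroom covers tokenizer files, config, and non-quantized tensors).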

Usage

mlx_lm.generate \
  --model /path/to/Phi-2-MLX-4bit \
  --prompt "Explain the FFT in simple terms." \
  --max-tokens 120
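The same generation can be driven from Python via the mlx-lm library (install with `pip install mlx-lm`; requires Apple Silicon). A minimal sketch, with the model path as a placeholder:

```python
# Sketch of loading and generating with the mlx-lm Python API.
# Runs only on Apple Silicon with mlx-lm installed; the path is a placeholder.
from mlx_lm import load, generate

model, tokenizer = load("/path/to/Phi-2-MLX-4bit")
text = generate(
    model,
    tokenizer,
    prompt="Explain the FFT in simple terms.",
    max_tokens=120,
)
print(text)
```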

Notes

  • This is a quantized conversion, not a fine-tune.
  • The 4-bit version is best for:
    • faster inference
    • lower memory usage
    • interactive local testing
  • For higher-quality reasoning and instruction-following, see the 5-bit variant.

License

This repository redistributes a quantized MLX conversion of Microsoft Phi-2.

  • Original model license: MIT
  • MLX conversion: MIT

See LICENSE for details.
