Phi-2-MLX-5bit / README.md

Irfanuruchi

metadata

license: mit
license_link: https://huggingface.co/microsoft/phi-2/resolve/main/LICENSE
language:
  - en
pipeline_tag: text-generation
tags:
  - nlp
  - code
  - mlx
base_model: microsoft/phi-2
library_name: mlx

Phi-2 MLX 5-bit

This repository provides a 5-bit MLX-quantized version of Microsoft Phi-2, optimized for higher output quality while remaining suitable for local, offline inference on Apple Silicon.

This variant offers better instruction-following and coherence than the 4-bit version, at the cost of a modest increase in memory usage.

Model Details

Base model: microsoft/phi-2
Architecture: Decoder-only Transformer
License: MIT
Quantization: MLX static quantization (≈5.5 bits per weight)
Target hardware: Apple Silicon (M1 / M2 / M3)
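
The ≈5.5 bits-per-weight figure is consistent with group-wise quantization, where every group of weights shares a scale and a bias stored at higher precision. A minimal sketch of that arithmetic follows. The group size of 64 and the 16-bit scale and bias are assumptions for illustration; this README does not state the settings actually used.

```python
def effective_bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Average storage cost per weight under group-wise quantization.

    overhead_bits is the per-group metadata: here one 16-bit scale plus
    one 16-bit bias (an assumption, not a value taken from this README).
    """
    return bits + overhead_bits / group_size


# 5 payload bits + 32 metadata bits spread over 64 weights
print(effective_bits_per_weight(bits=5, group_size=64))  # 5.5
```

Under the same assumptions a 4-bit variant would come out at 4.5 bits per weight, which is where the modest memory difference between the two comes from.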

Performance Characteristics

| Metric | Value |
| --- | --- |
| Disk size | ~1.9–2.1 GB |
| Peak RAM usage | ~2.0–2.2 GB |
| Inference speed | Moderate |
| Instruction quality | Higher |
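
As a rough cross-check of the disk-size figure, the weight size can be estimated from the parameter count and the ≈5.5 bits per weight listed under Model Details. The parameter count used below (about 2.78 billion) is an assumption for illustration, and the estimate ignores tokenizer files and any tensors left unquantized.

```python
def estimated_size_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Assumed parameter count for Phi-2; not stated in this README.
PHI2_PARAMS = 2.78e9

print(f"{estimated_size_gb(PHI2_PARAMS, 5.5):.2f} GB")  # 1.91 GB
```

The result sits at the low end of the ~1.9–2.1 GB range reported above, which is what one would expect once the remaining repository files are added.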

Usage

mlx_lm.generate \
  --model /path/to/Phi-2-MLX-5bit \
  --prompt "Explain the FFT in simple terms." \
  --max-tokens 120
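
The same generation can also be driven from Python. The sketch below assumes the `mlx-lm` package is installed and uses its `load` and `generate` functions. The helper names are hypothetical, the model path is the placeholder from the command above, and the `Instruct:` / `Output:` wrapper follows the QA prompt format described in the upstream microsoft/phi-2 model card; whether it improves results for this quantized build has not been verified here.

```python
def build_instruct_prompt(question: str) -> str:
    """Wrap a question in Phi-2's 'Instruct: ... Output:' QA format."""
    return f"Instruct: {question}\nOutput:"


def generate_answer(model_path: str, question: str, max_tokens: int = 120) -> str:
    """Load the quantized model from disk and generate a completion."""
    # Imported lazily so the prompt helper works without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load(model_path)
    return generate(
        model,
        tokenizer,
        prompt=build_instruct_prompt(question),
        max_tokens=max_tokens,
    )
```

Calling `generate_answer("/path/to/Phi-2-MLX-5bit", "Explain the FFT in simple terms.")` mirrors the command-line example, with the prompt wrapped in the instruct format.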

Notes

This is a quantized conversion, not a fine-tuned model.
The 5-bit version is recommended for:
- better reasoning consistency
- fewer repetitions
- improved instruction adherence
For maximum speed and lower memory usage, see the 4-bit variant.
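
For readers who want to produce a comparable conversion themselves, one possible approach uses the `convert` function from `mlx-lm`. The settings used for this repository are not stated, so the 5-bit width and group size of 64 below are assumptions, the output directory name and helper names are hypothetical, and 5-bit quantization needs an MLX release recent enough to support it.

```python
def conversion_kwargs(bits: int = 5, group_size: int = 64) -> dict:
    """Build keyword arguments for mlx_lm.convert (assumed settings)."""
    return {
        "hf_path": "microsoft/phi-2",   # base model named in this README
        "mlx_path": "Phi-2-MLX-5bit",   # hypothetical output directory
        "quantize": True,
        "q_bits": bits,
        "q_group_size": group_size,
    }


def convert_phi2(**overrides) -> None:
    """Download the base model and write a quantized MLX copy to disk."""
    # Imported lazily; requires mlx-lm on Apple Silicon.
    from mlx_lm import convert

    convert(**{**conversion_kwargs(), **overrides})
```

Running `convert_phi2()` downloads the base weights, so it needs network access and enough free disk space for the unquantized model.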

License

This repository redistributes a quantized MLX conversion of Microsoft Phi-2.

Original model license: MIT
MLX conversion: MIT

See LICENSE for details.