metadata
language:
- multilingual
- ar
- zh
- cs
- da
- nl
- en
- fi
- fr
- de
- he
- hu
- it
- ja
- ko
- 'no'
- pl
- pt
- ru
- es
- sv
- th
- tr
- uk
library_name: mlx
license: mit
license_link: https://huggingface.co/microsoft/Phi-4-mini-instruct/resolve/main/LICENSE
pipeline_tag: text-generation
tags:
- nlp
- code
- mlx
- apple-silicon
- on-device
- phi
- local-llm
- quantized
widget:
- messages:
- role: user
content: Can you provide ways to eat combinations of bananas and dragonfruits?
base_model: microsoft/Phi-4-mini-instruct
Phi-4-mini-instruct (MLX 8-bit)
This is an 8-bit MLX quantized version of microsoft/Phi-4-mini-instruct, offering higher quality output at the cost of increased memory usage.
Benchmark Environment
- Device: MacBook Pro (M3 Pro)
- Runtime: MLX
- Precision: 8-bit (~8.5 bits per weight)
Performance (Measured)
- Disk size: ~3.8 GB
- Peak memory: ~4.15 GB
- Generation speed: ~32 tokens/sec
Benchmarks were collected on macOS (M3 Pro).
iPhone / iPad performance will vary depending on hardware and memory.
Usage
mlx_lm.generate \
--model Irfanuruchi/Phi-4-mini-instruct-MLX-8bit \
--prompt "Write a 1-paragraph plan for learning Spanish in 30 days." \
--max-tokens 160
License
Original model license applies. See microsoft/Phi-4-mini-instruct.