---
language:
  - multilingual
  - ar
  - zh
  - cs
  - da
  - nl
  - en
  - fi
  - fr
  - de
  - he
  - hu
  - it
  - ja
  - ko
  - 'no'
  - pl
  - pt
  - ru
  - es
  - sv
  - th
  - tr
  - uk
library_name: mlx
license: mit
license_link: https://huggingface.co/microsoft/Phi-4-mini-instruct/resolve/main/LICENSE
pipeline_tag: text-generation
tags:
  - nlp
  - code
  - mlx
  - apple-silicon
  - on-device
  - phi
  - local-llm
  - quantized
widget:
  - messages:
      - role: user
        content: Can you provide ways to eat combinations of bananas and dragonfruits?
base_model: microsoft/Phi-4-mini-instruct
---

# Phi-4-mini-instruct (MLX 8-bit)

This is an 8-bit MLX quantized version of [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct). It trades higher memory usage for better output quality than lower-bit (e.g. 4-bit) quantizations.
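
For context, 8-bit MLX quantizations like this one are typically produced with mlx-lm's `convert` utility. A minimal sketch, assuming the default group size; the output directory name is illustrative and the exact settings used for this repo are not confirmed:

```python
from mlx_lm import convert

# Quantize the original Hugging Face weights to ~8 bits per weight.
# q_bits=8 matches this repo's precision; other settings are assumed defaults.
convert(
    "microsoft/Phi-4-mini-instruct",
    mlx_path="Phi-4-mini-instruct-MLX-8bit",  # hypothetical local output dir
    quantize=True,
    q_bits=8,
)
```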

## Benchmark Environment

- Device: MacBook Pro (M3 Pro)
- Runtime: MLX
- Precision: 8-bit (~8.5 bits per weight)

## Performance (Measured)

- Disk size: ~3.8 GB
- Peak memory: ~4.15 GB
- Generation speed: ~32 tokens/sec

Benchmarks were collected on macOS with an M3 Pro. iPhone and iPad performance will vary with hardware and available memory. A sketch for reproducing the peak-memory figure follows.
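
MLX keeps a peak-allocation counter that can be read after a run. A minimal sketch, assuming a recent MLX where the counter is exposed as `mx.get_peak_memory()` (older releases expose it as `mx.metal.get_peak_memory()` instead):

```python
import mlx.core as mx
from mlx_lm import load, generate

model, tokenizer = load("Irfanuruchi/Phi-4-mini-instruct-MLX-8bit")

# Run a short generation so the weights are loaded and the KV cache is exercised.
generate(model, tokenizer, prompt="Hello", max_tokens=64)

# Peak bytes allocated by MLX during the run, reported in GB.
print(f"Peak memory: {mx.get_peak_memory() / 1e9:.2f} GB")
```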

## Usage

With the [mlx-lm](https://github.com/ml-explore/mlx-lm) package installed (`pip install mlx-lm`), the model can be run directly from the command line:

```bash
mlx_lm.generate \
  --model Irfanuruchi/Phi-4-mini-instruct-MLX-8bit \
  --prompt "Write a 1-paragraph plan for learning Spanish in 30 days." \
  --max-tokens 160
```
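
The model can also be driven from Python via mlx-lm's `load` and `generate` entry points. A minimal sketch; the prompt is illustrative, and the chat-template call follows the standard Hugging Face tokenizer API:

```python
from mlx_lm import load, generate

# Download (or reuse the cached copy of) the quantized weights and tokenizer.
model, tokenizer = load("Irfanuruchi/Phi-4-mini-instruct-MLX-8bit")

# Phi-4-mini is an instruct model, so wrap the request in its chat template.
messages = [
    {"role": "user", "content": "Write a 1-paragraph plan for learning Spanish in 30 days."}
]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)

# verbose=True prints the output and throughput stats (tokens/sec) as it runs.
text = generate(model, tokenizer, prompt=prompt, max_tokens=160, verbose=True)
print(text)
```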

## License

This model inherits the MIT license of the base model. See [microsoft/Phi-4-mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) for the full terms.