Phi-4-mini-instruct - GGUF Quants

Quantized GGUF versions of Microsoft's Phi-4-mini-instruct, a 3.8B-parameter dense decoder-only model with a 128K-token context window, trained on 5 trillion tokens of curated data.

Phi-4-mini performs well for its size on reasoning, math, and instruction-following benchmarks, which makes it a good fit for local deployment on hardware with 4-8 GB of VRAM.

Available Files

| File | Quant | Size | Use Case |
|------|-------|------|----------|
| Phi-4-mini-instruct-Q8_0.gguf | Q8_0 | ~3.9 GB | Maximum quality |
| Phi-4-mini-instruct-Q6_K.gguf | Q6_K | ~3.0 GB | Near-lossless |
| Phi-4-mini-instruct-Q5_K_M.gguf | Q5_K_M | ~2.7 GB | High quality |
| Phi-4-mini-instruct-Q4_K_M.gguf | Q4_K_M | ~2.4 GB | Recommended default |
| Phi-4-mini-instruct-Q3_K_M.gguf | Q3_K_M | ~2.0 GB | Low VRAM |
| Phi-4-mini-instruct-IQ4_XS.gguf | IQ4_XS | ~2.1 GB | Imatrix 4-bit |
| Phi-4-mini-instruct-IQ3_XXS.gguf | IQ3_XXS | ~1.6 GB | Imatrix 3-bit |
| Phi-4-mini-instruct-IQ2_M.gguf | IQ2_M | ~1.5 GB | Imatrix 2-bit |
| Phi-4-mini-instruct-IQ1_S.gguf | IQ1_S | ~1.1 GB | Extreme compression |
| Phi-4-mini-instruct-fp16.gguf | FP16 | ~7.2 GB | Full precision |
| imatrix.dat | - | - | Importance matrix |
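
As a rough aid for choosing among the files listed above, the Python sketch below picks the largest quant whose file fits a given memory budget. The sizes are the approximate figures from the file list. The `headroom_gb` allowance for the KV cache and runtime buffers is an assumed placeholder rather than a measured value, and `pick_quant` is a hypothetical helper name. File size is only a proxy for quality here.

```python
# Approximate file sizes in GB, copied from the file list above.
QUANT_SIZES_GB = {
    "FP16": 7.2,
    "Q8_0": 3.9,
    "Q6_K": 3.0,
    "Q5_K_M": 2.7,
    "Q4_K_M": 2.4,
    "IQ4_XS": 2.1,
    "Q3_K_M": 2.0,
    "IQ3_XXS": 1.6,
    "IQ2_M": 1.5,
    "IQ1_S": 1.1,
}


def pick_quant(budget_gb, headroom_gb=1.0):
    """Return the name of the largest quant that fits, or None.

    headroom_gb is an assumed allowance for the KV cache and buffers;
    real usage depends on context size and backend.
    """
    usable = budget_gb - headroom_gb
    fitting = {q: s for q, s in QUANT_SIZES_GB.items() if s <= usable}
    if not fitting:
        return None
    # Largest file among those that fit.
    return max(fitting, key=fitting.get)


if __name__ == "__main__":
    for budget in (2, 4, 8):
        print(f"{budget} GB budget -> {pick_quant(budget)}")
```

Long contexts need considerably more headroom than the default used here, so it is worth raising `headroom_gb` when running with a large `--ctx-size`.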

Usage

# llama.cpp
./llama-cli -m Phi-4-mini-instruct-Q4_K_M.gguf \
  --ctx-size 8192 -n 512 \
  -p "<|system|>You are a helpful assistant.<|end|><|user|>Hello!<|end|><|assistant|>"

# Ollama
ollama run hf.co/DuoNeural/Phi-4-mini-instruct-GGUF:Q4_K_M
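
The llama.cpp command above passes the prompt as a single string with `<|system|>`, `<|user|>`, `<|assistant|>`, and `<|end|>` tags. A minimal Python sketch of assembling that string programmatically follows. The helper names are hypothetical, and the multi-turn layout is an assumption extrapolated from the single-turn example rather than something this card specifies.

```python
def build_prompt(user_message, system_message="You are a helpful assistant."):
    """Format a single-turn prompt using the tags from the llama.cpp example."""
    return (
        f"<|system|>{system_message}<|end|>"
        f"<|user|>{user_message}<|end|>"
        "<|assistant|>"
    )


def build_chat_prompt(messages):
    """Format a list of (role, content) pairs, then open an assistant turn.

    Assumption: every turn, including earlier assistant replies,
    is closed with <|end|>.
    """
    allowed = {"system", "user", "assistant"}
    parts = []
    for role, content in messages:
        if role not in allowed:
            raise ValueError(f"unknown role: {role}")
        parts.append(f"<|{role}|>{content}<|end|>")
    parts.append("<|assistant|>")
    return "".join(parts)


if __name__ == "__main__":
    print(build_prompt("Hello!"))
```

Tools that apply the chat template stored in the GGUF file (Ollama, for example) generally do this formatting themselves, so a helper like this is mainly useful when passing raw prompts.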

About Phi-4-mini

  • Parameters: 3.8B
  • Context: 128K tokens
  • Architecture: Dense decoder-only transformer
  • Training: 5T tokens of curated synthetic + web data
  • Strengths: Math reasoning, coding, instruction following at minimal VRAM footprint
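
To relate the 3.8B parameter count to the file sizes listed earlier, the following Python sketch estimates the effective bits per weight of a file. It assumes 1 GB = 10^9 bytes and uses the approximate sizes from the file list. The result includes file metadata and any tensors stored at higher precision, so it will not exactly match a quant's nominal bit width. `bits_per_weight` is a hypothetical helper name.

```python
PARAMS = 3.8e9  # parameter count stated in this card


def bits_per_weight(size_gb, params=PARAMS):
    """Effective bits stored per parameter, assuming 1 GB = 1e9 bytes."""
    return size_gb * 1e9 * 8 / params


if __name__ == "__main__":
    # Approximate sizes taken from the file list in this card.
    for name, size_gb in [("Q8_0", 3.9), ("Q4_K_M", 2.4), ("IQ1_S", 1.1)]:
        print(f"{name}: {bits_per_weight(size_gb):.2f} bits/weight")
```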

Well suited to: Raspberry Pi 5, older laptops, older mid-range GPUs (GTX 1060/1070), CPU-only inference.


Quantized by DuoNeural using llama.cpp on an RTX 5090.


DuoNeural

DuoNeural is an open AI research lab: human + AI in collaboration.

DuoNeural Research Publications

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).
