Quantized GGUF versions of Microsoft's Phi-4-mini-instruct, a highly capable 3.8B-parameter dense decoder-only model with a 128K-token context window, trained on 5 trillion tokens of curated data.

Phi-4-mini consistently punches above its weight class on reasoning, math, and instruction-following benchmarks despite its compact size, making it an ideal choice for local deployment on hardware with 4-8GB of VRAM.
## Available Files

| File | Quant | Size | Use Case |
|---|---|---|---|
| Phi-4-mini-instruct-Q8_0.gguf | Q8_0 | ~3.9GB | Maximum quality |
| Phi-4-mini-instruct-Q6_K.gguf | Q6_K | ~3.0GB | Near-lossless |
| Phi-4-mini-instruct-Q5_K_M.gguf | Q5_K_M | ~2.7GB | High quality |
| Phi-4-mini-instruct-Q4_K_M.gguf | Q4_K_M | ~2.4GB | Recommended default |
| Phi-4-mini-instruct-Q3_K_M.gguf | Q3_K_M | ~2.0GB | Low VRAM |
| Phi-4-mini-instruct-IQ4_XS.gguf | IQ4_XS | ~2.1GB | Imatrix 4-bit |
| Phi-4-mini-instruct-IQ3_XXS.gguf | IQ3_XXS | ~1.6GB | Imatrix 3-bit |
| Phi-4-mini-instruct-IQ2_M.gguf | IQ2_M | ~1.5GB | Imatrix 2-bit |
| Phi-4-mini-instruct-IQ1_S.gguf | IQ1_S | ~1.1GB | Extreme compression |
| Phi-4-mini-instruct-fp16.gguf | FP16 | ~7.2GB | Full precision |
| imatrix.dat | – | – | Importance matrix |
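If you need a quant that is not listed, the bundled imatrix.dat can be passed to llama.cpp's llama-quantize to regenerate a low-bit variant from the FP16 file. A minimal sketch, assuming a local llama.cpp build and the huggingface-cli tool; the IQ2_M target here is just an example:

```bash
# Fetch the FP16 weights and the importance matrix from this repo.
huggingface-cli download DuoNeural/Phi-4-mini-instruct-GGUF \
  Phi-4-mini-instruct-fp16.gguf imatrix.dat --local-dir .

# Requantize to IQ2_M. The imatrix tells the quantizer which weights
# deserve extra precision, which matters most at 2- and 3-bit.
./llama-quantize --imatrix imatrix.dat \
  Phi-4-mini-instruct-fp16.gguf Phi-4-mini-instruct-IQ2_M.gguf IQ2_M
```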
## Usage

```bash
# llama.cpp
./llama-cli -m Phi-4-mini-instruct-Q4_K_M.gguf \
  --ctx-size 8192 -n 512 \
  -p "<|system|>You are a helpful assistant.<|end|><|user|>Hello!<|end|><|assistant|>"

# Ollama
ollama run hf.co/DuoNeural/Phi-4-mini-instruct-GGUF:Q4_K_M
```
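llama.cpp's llama-server exposes the same file behind an OpenAI-compatible HTTP API, which is handy for pointing existing clients at a local model. A minimal sketch; the port and context size are arbitrary choices, not repo defaults:

```bash
# Serve the model on localhost:8080 with an OpenAI-compatible API.
./llama-server -m Phi-4-mini-instruct-Q4_K_M.gguf --ctx-size 8192 --port 8080

# Query it from another shell. llama-server applies the chat template
# stored in the GGUF metadata, so plain role/content messages suffice.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}], "max_tokens": 128}'
```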
## About Phi-4-mini

- **Parameters:** 3.8B
- **Context:** 128K tokens
- **Architecture:** Dense decoder-only transformer
- **Training:** 5T tokens of curated synthetic + web data
- **Strengths:** Math reasoning, coding, and instruction following at a minimal VRAM footprint
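As a rough sanity check on the file sizes above: Q4_K_M averages about 4.85 bits per weight in llama.cpp, so 3.8B parameters × 4.85 ÷ 8 ≈ 2.3 GB, in line with the ~2.4GB table entry once metadata is included. Budget additional VRAM beyond the file size for the KV cache, which grows with the context length you request.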