# Phi-4-mini-instruct – Q3_K_M GGUF
GGUF quantized conversion of Microsoft Phi-4-mini-instruct, optimized for local inference using llama.cpp–compatible runtimes.
This repository provides a compressed version of the original safetensors model to enable efficient CPU and GPU inference with reduced memory requirements.
## Model Details

### Model Description
This is a post-training quantization of Phi-4-mini-instruct converted from the original safetensors weights to GGUF format and quantized using the Q3_K_M method.
- Developed by (base model): Microsoft
- Quantized by: Community conversion
- Model type: Large Language Model (LLM) – Instruct
- Language(s): English (multilingual capability depends on base model)
- License: MIT (inherits from base model)
- Quantized from: microsoft/Phi-4-mini-instruct
### Quantization Details
| File | Quant Method | Format |
|---|---|---|
| Phi-4-mini-instruct-Q3_K_M.gguf | Q3_K_M | GGUF |
Quantization pipeline:
Safetensors → F16 GGUF → Q3_K_M quantization
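The pipeline above can be sketched with the standard llama.cpp conversion tools. This is a minimal sketch, not the exact commands used for this repository; it assumes a local llama.cpp checkout with `convert_hf_to_gguf.py` and a built `llama-quantize` binary, and that the original safetensors weights have been downloaded to `./Phi-4-mini-instruct`.

```shell
# Step 1: convert the safetensors weights to an F16 GGUF file
python convert_hf_to_gguf.py ./Phi-4-mini-instruct \
  --outtype f16 \
  --outfile Phi-4-mini-instruct-F16.gguf

# Step 2: quantize the F16 GGUF down to Q3_K_M
./llama-quantize Phi-4-mini-instruct-F16.gguf \
  Phi-4-mini-instruct-Q3_K_M.gguf Q3_K_M
```

Tool names and flags follow current llama.cpp conventions and may differ between versions (older checkouts name the converter `convert-hf-to-gguf.py` and the quantizer `quantize`).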
## Intended Uses

### Direct Use
- Local chat assistants
- Instruction following
- Text generation
- Offline inference
- RAG pipelines
### Downstream Use
- Fine-tuning via LoRA / QLoRA (external tooling)
- Integration in chat UIs
- Agent frameworks
### Out-of-Scope Use
- Safety-critical systems
- Legal / medical decision making
- Real-time autonomous control
## How to Get Started

### llama.cpp
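A minimal sketch for running the quantized model locally, assuming llama.cpp is built and the GGUF file from this repository has been downloaded into the working directory. The prompt text is only an illustrative placeholder.

```shell
# Run a single prompt against the quantized model with the llama.cpp CLI.
# -m selects the model file, -n caps the number of generated tokens.
./llama-cli \
  -m Phi-4-mini-instruct-Q3_K_M.gguf \
  -p "Explain GGUF quantization in one sentence." \
  -n 128
```

For an OpenAI-compatible local API, the same model file can be served with `llama-server -m Phi-4-mini-instruct-Q3_K_M.gguf`. CLI flags may vary across llama.cpp versions.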
## Model Tree

- Repository: cosmeq/Phi-4-mini-instruct-Q3_K_M-GGUF
- Base model: microsoft/Phi-4-mini-instruct