Phi-4-mini-instruct – Q3_K_M GGUF

A GGUF-quantized conversion of Microsoft's Phi-4-mini-instruct, optimized for local inference with llama.cpp-compatible runtimes.

This repository provides a compressed version of the original safetensors model to enable efficient CPU and GPU inference with reduced memory requirements.


Model Details

Model Description

This is a post-training quantization of Phi-4-mini-instruct: the original safetensors weights were converted to GGUF format and then quantized with the Q3_K_M method.

  • Developed by (base model): Microsoft
  • Quantized by: Community conversion
  • Model type: Large Language Model (LLM) – Instruct
  • Language(s): English (multilingual capability depends on base model)
  • License: MIT (inherits from base model)
  • Finetuned from: microsoft/Phi-4-mini-instruct

Quantization Details

File                               Quant Method   Format
Phi-4-mini-instruct-Q3_K_M.gguf    Q3_K_M         GGUF
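As a rough guide to the memory footprint, the weights of a Q3_K_M model take roughly params × bits-per-weight / 8 bytes. The ~3.9 bits-per-weight figure below is an approximation (Q3_K_M mixes quantization types across tensors), and actual runtime usage adds the KV cache and framework overhead on top:

```shell
# Illustrative weight-memory estimate for a 4B-parameter Q3_K_M model:
# params * bits-per-weight / 8 bytes, converted to GiB.
# ~3.9 bpw is an approximation, not an exact figure for this file.
awk 'BEGIN { printf "%.2f GiB\n", 4e9 * 3.9 / 8 / 2^30 }'
# prints approximately 1.82 GiB
```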

Quantization pipeline:

Safetensors → F16 GGUF → Q3_K_M quantization
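The pipeline above can be sketched with llama.cpp's own tooling; script and binary names are as shipped in the llama.cpp repository, and the local checkout path of the base model is illustrative:

```shell
# Step 1: convert the safetensors checkpoint (a local download of
# microsoft/Phi-4-mini-instruct) to an F16 GGUF file.
python convert_hf_to_gguf.py ./Phi-4-mini-instruct \
  --outfile Phi-4-mini-instruct-f16.gguf --outtype f16

# Step 2: quantize the F16 GGUF down to Q3_K_M.
llama-quantize Phi-4-mini-instruct-f16.gguf \
  Phi-4-mini-instruct-Q3_K_M.gguf Q3_K_M
```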


Intended Uses

Direct Use

  • Local chat assistants
  • Instruction following
  • Text generation
  • Offline inference
  • RAG pipelines

Downstream Use

  • Fine-tuning via LoRA / QLoRA (external tooling)
  • Integration in chat UIs
  • Agent frameworks
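For chat-UI and agent-framework integration, llama.cpp's llama-server exposes an OpenAI-compatible HTTP API; a minimal sketch (model path and port are examples):

```shell
# Serve the quantized model over an OpenAI-compatible HTTP API
# (path and port are illustrative).
llama-server -m ./Phi-4-mini-instruct-Q3_K_M.gguf --port 8080 &

# Query the standard chat-completions endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

Most OpenAI-client libraries can point at this endpoint by overriding the base URL, which is what makes the drop-in UI integration work.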

Out-of-Scope Use

  • Safety-critical systems
  • Legal / medical decision making
  • Real-time autonomous control

How to Get Started

llama.cpp
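A minimal way to run the file locally, assuming llama.cpp is built and the GGUF file from the table above has been downloaded (flags as in recent llama.cpp builds):

```shell
# Interactive chat (conversation mode, with a system prompt):
llama-cli -m ./Phi-4-mini-instruct-Q3_K_M.gguf -cnv \
  -p "You are a helpful assistant."

# One-shot generation, capped at 128 new tokens:
llama-cli -m ./Phi-4-mini-instruct-Q3_K_M.gguf \
  -p "Explain GGUF in one sentence." -n 128
```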

Model Information

  • Model size: 4B parameters
  • Architecture: phi3
  • Quantization: 3-bit (Q3_K_M)
  • Repository: cosmeq/Phi-4-mini-instruct-Q3_K_M-GGUF