Phi-4-mini-instruct – Q3_K_M GGUF

A GGUF-quantized conversion of Microsoft's Phi-4-mini-instruct, optimized for local inference with llama.cpp-compatible runtimes.

This repository provides a compressed version of the original safetensors model to enable efficient CPU and GPU inference with reduced memory requirements.


Model Details

Model Description

This is a post-training quantization of Phi-4-mini-instruct: the original safetensors weights were converted to GGUF format and then quantized with the Q3_K_M method.

  • Developed by (base model): Microsoft
  • Quantized by: Community conversion
  • Model type: Large Language Model (LLM) – Instruct
  • Language(s): English (multilingual capability depends on base model)
  • License: MIT (inherits from base model)
  • Finetuned from: microsoft/Phi-4-mini-instruct

Quantization Details

File                               Quant Method   Format
Phi-4-mini-instruct-Q3_K_M.gguf    Q3_K_M         GGUF
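As a rough guide to the memory footprint, the weights of a Q3_K_M model take roughly params × bits-per-weight / 8 bytes. The ~3.9 bits-per-weight figure below is an approximation (Q3_K_M mixes quantization types across tensors), and actual runtime usage adds the KV cache and framework overhead on top:

```shell
# Illustrative weight-memory estimate for a 4B-parameter Q3_K_M model:
# params * bits-per-weight / 8 bytes, converted to GiB.
# ~3.9 bpw is an approximation, not an exact figure for this file.
awk 'BEGIN { printf "%.2f GiB\n", 4e9 * 3.9 / 8 / 2^30 }'
# prints approximately 1.82 GiB
```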

Quantization pipeline:

Safetensors → F16 GGUF → Q3_K_M quantization
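The pipeline above can be sketched with llama.cpp's own tooling; script and binary names are as shipped in the llama.cpp repository, and the local checkout path of the base model is illustrative:

```shell
# Step 1: convert the safetensors checkpoint (a local download of
# microsoft/Phi-4-mini-instruct) to an F16 GGUF file.
python convert_hf_to_gguf.py ./Phi-4-mini-instruct \
  --outfile Phi-4-mini-instruct-f16.gguf --outtype f16

# Step 2: quantize the F16 GGUF down to Q3_K_M.
llama-quantize Phi-4-mini-instruct-f16.gguf \
  Phi-4-mini-instruct-Q3_K_M.gguf Q3_K_M
```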


Intended Uses

Direct Use

  • Local chat assistants
  • Instruction following
  • Text generation
  • Offline inference
  • RAG pipelines

Downstream Use

  • Fine-tuning via LoRA / QLoRA (external tooling)
  • Integration in chat UIs
  • Agent frameworks
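For chat-UI and agent-framework integration, llama.cpp's llama-server exposes an OpenAI-compatible HTTP API; a minimal sketch (model path and port are examples):

```shell
# Serve the quantized model over an OpenAI-compatible HTTP API
# (path and port are illustrative).
llama-server -m ./Phi-4-mini-instruct-Q3_K_M.gguf --port 8080 &

# Query the standard chat-completions endpoint:
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello!"}]}'
```

Most OpenAI-client libraries can point at this endpoint by overriding the base URL, which is what makes the drop-in UI integration work.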

Out-of-Scope Use

  • Safety-critical systems
  • Legal / medical decision making
  • Real-time autonomous control

How to Get Started

llama.cpp
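A minimal way to run the file locally, assuming llama.cpp is built and the GGUF file from the table above has been downloaded (flags as in recent llama.cpp builds):

```shell
# Interactive chat (conversation mode, with a system prompt):
llama-cli -m ./Phi-4-mini-instruct-Q3_K_M.gguf -cnv \
  -p "You are a helpful assistant."

# One-shot generation, capped at 128 new tokens:
llama-cli -m ./Phi-4-mini-instruct-Q3_K_M.gguf \
  -p "Explain GGUF in one sentence." -n 128
```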

Model Information

  • Model size: 4B parameters
  • Architecture: phi3
  • Quantization: 3-bit (Q3_K_M)
  • Repository: cosmeq/Phi-4-mini-instruct-Q3_K_M-GGUF