# Phi-3 Mini (GGUF Quantized - Q4_K_M)

## Model Description
This repository contains a quantized GGUF version of the Phi-3 Mini 128K Instruct model.
- Base Model: microsoft/Phi-3-mini-128k-instruct
- Format: GGUF
- Quantization: Q4_K_M
- Framework: llama.cpp
This model is optimized for efficient local inference with reduced memory usage.
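The memory claim can be made concrete with a rough back-of-the-envelope estimate (assuming ~3.8B parameters for Phi-3 Mini and an approximate average of ~4.85 bits per weight for Q4_K_M; both figures are approximations, not measured values):

```shell
# Rough on-disk size of a Q4_K_M quantization of a ~3.8B-parameter model.
awk 'BEGIN {
  params = 3.8e9   # approximate parameter count of Phi-3 Mini
  bpw    = 4.85    # approximate average bits per weight for Q4_K_M
  printf "~%.1f GB\n", params * bpw / 8 / 1e9
}'
# prints "~2.3 GB"
```

For comparison, the same weights at 16-bit precision take about 7.6 GB, so the quantized file is roughly a third of the size.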
## Intended Use
- Local LLM inference
- Chatbots
- Lightweight deployments
- CPU/GPU inference using llama.cpp
## Model Details

### Base Model
- Microsoft Phi-3 Mini (128K context)
### Conversion

- Converted from Hugging Face format → GGUF
### Quantization

- Method: Q4_K_M
- Tool: llama.cpp `llama-quantize`

Q4_K_M quantization shrinks the model's disk and memory footprint to roughly a third of the 16-bit original while preserving most of its output quality.
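The conversion and quantization steps above can be sketched with llama.cpp's own tooling. This is an illustrative reconstruction, not the exact commands used for this repository; the directory and file names are placeholders:

```shell
# 1. Fetch the original Hugging Face weights (directory name is illustrative)
huggingface-cli download microsoft/Phi-3-mini-128k-instruct --local-dir phi3-mini

# 2. Convert Hugging Face format -> GGUF (script ships with llama.cpp)
python convert_hf_to_gguf.py phi3-mini --outfile phi3-mini-f16.gguf --outtype f16

# 3. Quantize the f16 GGUF down to Q4_K_M
./llama-quantize phi3-mini-f16.gguf phi3-mini-q4_k_m.gguf Q4_K_M
```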
## How to Use

### Using llama.cpp

```shell
./llama-cli -m model-q4.gguf -p "Explain AI simply"
```
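Beyond one-shot prompting, llama.cpp also supports interactive chat, GPU offload, and an HTTP server. A sketch of common invocations (the model filename follows the example above; adjust flags to your hardware):

```shell
# Interactive chat using the model's built-in chat template
./llama-cli -m model-q4.gguf -cnv

# Offload all layers to the GPU (requires a CUDA/Metal/Vulkan build of llama.cpp)
./llama-cli -m model-q4.gguf -ngl 99 -p "Explain AI simply"

# OpenAI-compatible HTTP server on port 8080 with a 4096-token context
./llama-server -m model-q4.gguf -c 4096 --port 8080
```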