---
license: mit
---
# **Phi-4-mini-instruct GGUF Models**
This repository contains the **Phi-4-mini-instruct** model quantized using a specialized branch of **llama.cpp**:
🔗 [ns3284/llama.cpp](https://github.com/ns3284/llama.cpp/tree/master)
Special thanks to [@nisparks](https://github.com/nisparks) for adding support for **Phi-4-mini-instruct** in **llama.cpp**.
This branch is expected to be merged into the master branch soon; once that happens, use the main **llama.cpp** repository instead.
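To run the files below with the branch above, a typical llama.cpp build-and-run flow looks like the following sketch (the clone URL comes from this README; the build commands and `llama-cli` flags are standard llama.cpp usage, and the model path is illustrative):

```shell
# Build the branch that includes Phi-4-mini-instruct support
git clone https://github.com/ns3284/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run one of the quantized files from this repo
# (adjust the path to wherever you downloaded it)
./build/bin/llama-cli -m ../phi-4-mini-q4_k_l.gguf \
  -p "Explain GGUF in one sentence." -n 128
```

Once the branch is merged, the same commands work against the main **llama.cpp** repository.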
---
## **Included Files**
### `phi-4-mini-bf16.gguf`
- Model weights preserved in **BF16**.
- Use this if you want to **requantize** the model into a different format.
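Requantizing from the BF16 file can be done with llama.cpp's `llama-quantize` tool. A minimal sketch, assuming you built llama.cpp as above (the `Q5_K_M` target and output filename are just examples; run `llama-quantize --help` for the full list of types):

```shell
# Requantize the BF16 weights to another format
./build/bin/llama-quantize phi-4-mini-bf16.gguf phi-4-mini-q5_k_m.gguf Q5_K_M
```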
### `phi-4-mini-bf16-q8.gguf`
- **Output & embeddings** remain in **BF16**.
- All other layers quantized to **Q8_0**.
### `phi-4-mini-q4_k_l.gguf`
- **Output & embeddings** quantized to **Q8_0**.
- All other layers quantized to **Q4_K**.
- **Note:** No custom quantization matrix (imatrix) was applied, so default **llama.cpp** quantization settings are used.
### `phi-4-mini-q6_k.gguf`
- All layers quantized to **Q6_K**, using **default quantization settings**.
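After downloading any of the files above, a quick sanity check is to verify the GGUF header: every GGUF file starts with the ASCII magic `GGUF` followed by a little-endian `uint32` format version. A minimal sketch in Python (the function name is illustrative):

```python
import struct

def read_gguf_header(data: bytes):
    """Return (magic_ok, version) parsed from the first 8 bytes of a GGUF file.

    GGUF files begin with the 4-byte ASCII magic b"GGUF", followed by a
    little-endian uint32 format version.
    """
    if len(data) < 8:
        return False, None
    magic = data[:4]
    (version,) = struct.unpack("<I", data[4:8])
    return magic == b"GGUF", version

# Example: check a downloaded file
# with open("phi-4-mini-q6_k.gguf", "rb") as f:
#     ok, version = read_gguf_header(f.read(8))
```

A `False` result usually means a truncated or corrupted download.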