---
license: mit
---
# **Phi-4-mini-instruct GGUF Models**

This repository contains the **Phi-4-mini-instruct** model quantized using a specialized branch of **llama.cpp**:

🔗 [ns3284/llama.cpp](https://github.com/ns3284/llama.cpp/tree/master)

Special thanks to [@nisparks](https://github.com/nisparks) for adding support for **Phi-4-mini-instruct** in **llama.cpp**.
This branch is expected to be merged into the master branch soon; once it is, it's recommended to use the main **llama.cpp** repository instead.
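Until the merge lands, the branch can be built directly. A minimal sketch, assuming a Linux or macOS environment with `git` and CMake installed (build flags beyond the defaults are illustrative):

```shell
# Clone the branch that adds Phi-4-mini-instruct support
git clone https://github.com/ns3284/llama.cpp.git
cd llama.cpp

# Standard CMake build (add e.g. -DGGML_CUDA=ON for NVIDIA GPU offload)
cmake -B build
cmake --build build --config Release -j
```

The resulting binaries (such as `llama-cli` and `llama-quantize`) land under `build/bin/`.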
---
## **Included Files**

### `phi-4-mini-bf16.gguf`
- Model weights preserved in **BF16**.
- Use this if you want to **requantize** the model into a different format.
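Requantizing the BF16 file can be done with llama.cpp's `llama-quantize` tool; a sketch, where the target type `Q5_K_M` and the binary path are illustrative:

```shell
# Requantize the BF16 weights into any supported ggml quantization type
./build/bin/llama-quantize phi-4-mini-bf16.gguf phi-4-mini-q5_k_m.gguf Q5_K_M
```

Running `llama-quantize` with no arguments prints the full list of supported quantization types.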
### `phi-4-mini-bf16-q8.gguf`
- **Output & embeddings** remain in **BF16**.
- All other layers quantized to **Q8_0**.
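A mixed layout like this can be reproduced with `llama-quantize`'s per-tensor overrides. The flags below exist in recent llama.cpp builds, but treat the exact type spellings as assumptions:

```shell
# Keep the output and token-embedding tensors in BF16,
# quantize everything else to Q8_0
./build/bin/llama-quantize \
  --output-tensor-type bf16 \
  --token-embedding-type bf16 \
  phi-4-mini-bf16.gguf phi-4-mini-bf16-q8.gguf Q8_0
```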
### `phi-4-mini-q4_k_l.gguf`
- **Output & embeddings** quantized to **Q8_0**.
- All other layers quantized to **Q4_K**.
- **Note:** No importance matrix (imatrix) quantization was applied, so default **llama.cpp** quantization settings are used.
### `phi-4-mini-q6_k.gguf`
- All layers quantized to **Q6_K**, using **default quantization settings**.
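Any of the files above can be run directly with llama.cpp's CLI. A minimal sketch; the prompt and context size are illustrative:

```shell
# Load the Q4_K_L file with a 4096-token context and a one-shot prompt
./build/bin/llama-cli \
  -m phi-4-mini-q4_k_l.gguf \
  -c 4096 \
  -p "Explain GGUF quantization in one sentence."
```

Smaller quantizations (Q4_K) trade some accuracy for lower memory use; Q8_0 and the BF16 mixes stay closer to the original weights at a larger file size.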