---
tags:
- gguf
- llama.cpp
- quantized
- microsoft/phi-4
license: apache-2.0
---
# yasserrmd/phi-4-gguf

This model was converted to GGUF format from [`microsoft/phi-4`](https://huggingface.co/microsoft/phi-4) using llama.cpp via [Convert Model to GGUF](https://github.com/ruslanmv/convert-model-to-GGUF).

**Key Features:**

* Quantized for reduced file size (GGUF format)
* Optimized for use with llama.cpp
* Compatible with llama-server for efficient serving

Refer to the [original model card](https://huggingface.co/microsoft/phi-4) for more details on the base model.
## Usage with llama.cpp

**1. Install llama.cpp:**

```bash
brew install llama.cpp # For macOS/Linux
```
**2. Run Inference:**

**CLI:**

```bash
llama-cli --hf-repo yasserrmd/phi-4-gguf --hf-file phi-4.q2_k.gguf -p "Your prompt here"
```
**Server:**

```bash
llama-server --hf-repo yasserrmd/phi-4-gguf --hf-file phi-4.q2_k.gguf -c 2048
```
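Once llama-server is running, it exposes an OpenAI-compatible HTTP API (on port 8080 by default). A minimal sketch of a client using only Python's standard library; the URL, port, and sampling parameters below are illustrative defaults, not values mandated by this model:

```python
import json
import urllib.request

# llama-server listens on localhost:8080 by default and serves an
# OpenAI-compatible chat completions endpoint.
SERVER_URL = "http://localhost:8080/v1/chat/completions"

def build_chat_request(prompt: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat completion payload for llama-server."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": 0.7,
    }

def ask(prompt: str) -> str:
    """POST the prompt to a running llama-server and return the reply text."""
    payload = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        SERVER_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Standard OpenAI response shape: first choice's message content.
    return body["choices"][0]["message"]["content"]
```

With the server started as above, `ask("Your prompt here")` returns the model's reply as a string.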
For more advanced usage, refer to the [llama.cpp repository](https://github.com/ggerganov/llama.cpp).