cmh committed (verified) · Commit 1357581 · Parent: ba4d074

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED
@@ -12,7 +12,8 @@ pipeline_tag: text-generation
 
 - [Output and embed tensors quantized to q8_0, all other tensors quantized for q4_k.](https://huggingface.co/RobertSinclair)
 - [Output and embed tensors quantized to bf16, all other tensors quantized for q5_k, q6_k, q8_0 and q8_0 --pure.](https://huggingface.co/RobertSinclair)
-
+- IMatrix q5_k, q6_k
+- BF16
 ```
 python convert_hf_to_gguf.py --outtype bf16 phi-4 --outfile phi-4.bf16.gguf
 
@@ -32,6 +33,7 @@ llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8_p.gguf q8
 | [phi-4.bf16.q6.im](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q6.im.gguf) | 6.00 bits per weight | 13.2 GB | **15.5 GB** |
 | [phi-4.bf16.q8](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q8.gguf) | 8.00 bits per weight | 16.5 GB | **18.5 GB** |
 | [phi-4.bf16.q8_p](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q8_p.gguf) | 8.00 bits per weight | 15.6 GB | **18.6 GB** |
+| [phi-4.bf16](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.gguf) | 16.00 bits per weight | 29.3 GB | |
 
 
 <sub>*approximate value at 16k context, FP16 cache.</sub>
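
The per-tensor splits named in the first two bullets correspond to llama-quantize's tensor-type overrides. A minimal sketch of the first bullet's recipe, assuming llama.cpp's `--output-tensor-type` and `--token-embedding-type` flags; the output filename is illustrative:

```
# Hold the output and token-embedding tensors at q8_0 while quantizing
# all remaining tensors to q4_k (output filename is illustrative).
llama-quantize --output-tensor-type q8_0 --token-embedding-type q8_0 \
  phi-4.bf16.gguf phi-4.bf16.q4.gguf q4_k
```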
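
The imatrix q5_k/q6_k variants would be built by first computing an importance matrix and then passing it to llama-quantize. A sketch assuming a representative calibration file named calibration.txt (hypothetical name):

```
# Compute an importance matrix from a calibration corpus
# (calibration.txt is a placeholder; any representative text works).
llama-imatrix -m phi-4.bf16.gguf -f calibration.txt -o phi-4.imatrix

# Quantize with the importance matrix guiding precision allocation
# (q6_k here, matching phi-4.bf16.q6.im in the table).
llama-quantize --imatrix phi-4.imatrix phi-4.bf16.gguf phi-4.bf16.q6.im.gguf q6_k
```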
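
On the footnote's 16k-context figures: the gap between the file size and the bolded value is roughly the FP16 KV cache, i.e. 2 (K and V) × n_layers × n_kv_heads × head_dim × 2 bytes × n_ctx. Assuming phi-4's published config (40 layers, 10 KV heads, head dim 128; treat these values as assumptions), that gives 2 × 40 × 10 × 128 × 2 × 16384 ≈ 3.4 GB, the right order of magnitude for the 2–3 GB gaps in the table.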