Update README.md
README.md CHANGED
@@ -13,17 +13,6 @@ pipeline_tag: text-generation
- [Output and embed tensors quantized to q8_0, all other tensors quantized to q4_k.](https://huggingface.co/RobertSinclair)
- [Output and embed tensors quantized to bf16, all other tensors quantized to q5_k, q6_k, q8_0 and q8_0 --pure.](https://huggingface.co/RobertSinclair)
- BF16 and imatrix q5_k, q6_k available.
- ```
- python convert_hf_to_gguf.py --outtype bf16 phi-4 --outfile phi-4.bf16.gguf
-
- llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 phi-4.bf16.gguf phi-4.q8.q4.gguf q4_k
- llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q5.gguf q5_k
- llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q5.im.gguf q5_k
- llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q6.gguf q6_k
- llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q6.im.gguf q6_k
- llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q8.gguf q8_0
- llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8p.gguf q8_0
- ```
| Quant type | File Size | ~VRAM* |
| ---------- | --------- | ------ |
@@ -38,6 +27,19 @@ llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8p.gguf q8_
<sub>*approximate value at **16k context, FP16 cache**.</sub>
+ ```
+ python convert_hf_to_gguf.py --outtype bf16 phi-4 --outfile phi-4.bf16.gguf
+
+ llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 phi-4.bf16.gguf phi-4.q8.q4.gguf q4_k
+ llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q5.gguf q5_k
+ llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q5.im.gguf q5_k
+ llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q6.gguf q6_k
+ llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q6.im.gguf q6_k
+ llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q8.gguf q8_0
+ llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8p.gguf q8_0
+ ```
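Note: the conversion step assumes a local `phi-4` directory holding the original Hugging Face checkpoint, which this commit does not show being fetched. A minimal sketch of one way to obtain it, assuming the upstream repo id `microsoft/phi-4`:

```
# Assumption: phi-4/ is a local snapshot of the original HF checkpoint.
# huggingface-cli ships with the huggingface_hub package (pip install huggingface_hub).
huggingface-cli download microsoft/phi-4 --local-dir phi-4
```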
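The two `--imatrix imatrix.dat` runs presuppose an importance matrix that the README never shows being built. A minimal sketch using llama.cpp's `llama-imatrix` tool, where `calibration.txt` stands in for whatever calibration text was actually used:

```
# Assumption: calibration.txt is a representative plain-text sample; the real
# calibration data is not named in the commit. llama-imatrix records activation
# statistics from the bf16 model and writes them to imatrix.dat.
llama-imatrix -m phi-4.bf16.gguf -f calibration.txt -o imatrix.dat
```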
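To check that the per-tensor overrides took effect (for example, that `output.weight` and `token_embd.weight` stayed q8_0 in the q4_k file), the `gguf-dump` script from the `gguf` Python package (llama.cpp's gguf-py) lists each tensor's quantization type; a sketch, assuming `pip install gguf`:

```
# Prints GGUF metadata plus name, shape, and quant type for every tensor;
# output.weight and token_embd.weight should report Q8_0 in the q8.q4 file.
gguf-dump phi-4.q8.q4.gguf
```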
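Finally, a quick smoke test of one of the resulting files with `llama-cli` (also part of llama.cpp); `-c 16384` matches the 16k context assumed by the ~VRAM column above, and the prompt is only illustrative:

```
# Load the q6_k quant at 16k context and generate from a short prompt.
llama-cli -m phi-4.bf16.q6.gguf -c 16384 -p "Explain GGUF quantization in one sentence."
```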
---------------------------------------------
# Phi-4 Model Card