---
license: mit
language:
- en
base_model:
- unsloth/phi-4
- microsoft/phi-4
pipeline_tag: text-generation
---
# Phi-4 ZeroWw quantizations
- For q4_k: output and embed tensors quantized to q8_0, all other tensors quantized to q4_k.
- For q5_k, q6_k, and q8_0: output and embed tensors kept in bf16, all other tensors quantized to the target type. The q8_0 `--pure` variant quantizes every tensor, including output and embed, to q8_0.
- The full BF16 GGUF and imatrix variants of q5_k and q6_k are also available.
| File | Quant type | File size | VRAM* |
| -------- | ---------- | --------- | -------- |
| [phi-4.q8.q4](https://huggingface.co/cmh/test/blob/main/phi-4.q8.q4.gguf) | 4 bits per weight | 9.43 GB | **12.9 GB** |
| [phi-4.bf16.q5](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q5.gguf) | 5 bits per weight | 11.9 GB | **14.2 GB** |
| [phi-4.bf16.q5.im](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q5.im.gguf) | 5 bits per weight | 11.9 GB | **14.2 GB** |
| [phi-4.bf16.q6](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q6.gguf) | 6 bits per weight | 13.2 GB | **15.5 GB** |
| [phi-4.bf16.q6.im](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q6.im.gguf) | 6 bits per weight | 13.2 GB | **15.5 GB** |
| [phi-4.bf16.q8](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q8.gguf) | 8 bits per weight | 16.5 GB | **18.5 GB** |
| [phi-4.bf16.q8p](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.q8p.gguf) | 8 bits per weight | 15.6 GB | **18.6 GB** |
| [phi-4.bf16](https://huggingface.co/cmh/test/blob/main/phi-4.bf16.gguf) | 16 bits per weight | 29.3 GB | tbd |
<sub>*Approximate value at **16k context, FP16 cache**.</sub>
---------------------------------------------
[ZeroWw quantization: huggingface.co/RobertSinclair](https://huggingface.co/RobertSinclair)
```
# Convert the HF checkpoint to a BF16 GGUF
python convert_hf_to_gguf.py --outtype bf16 phi-4 --outfile phi-4.bf16.gguf
# q4_k with output and embed tensors in q8_0
llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 phi-4.bf16.gguf phi-4.q8.q4.gguf q4_k
# q5_k with output and embed tensors kept in bf16
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q5.gguf q5_k
# q5_k imatrix variant, output tensor left unquantized
llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q5.im.gguf q5_k
# q6_k with output and embed tensors kept in bf16
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q6.gguf q6_k
# q6_k imatrix variant, output tensor left unquantized
llama-quantize --imatrix imatrix.dat --leave-output-tensor phi-4.bf16.gguf phi-4.bf16.q6.im.gguf q6_k
# q8_0 with output and embed tensors kept in bf16
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q8.gguf q8_0
# pure q8_0: every tensor, including output and embed, quantized to q8_0
llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8p.gguf q8_0
```
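The `imatrix.dat` file referenced above can be produced with llama.cpp's `llama-imatrix` tool. A minimal sketch, assuming a llama.cpp build on PATH and a calibration text file named `calibration.txt` (the file name and calibration corpus here are illustrative, not necessarily the ones used for these quants):
```
# Compute an importance matrix over the BF16 GGUF from calibration text;
# the resulting imatrix.dat is then passed to llama-quantize via --imatrix.
llama-imatrix -m phi-4.bf16.gguf -f calibration.txt -o imatrix.dat
```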
---------------------------------------------
# Phi-4 Model Card
[Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)
## Model Summary
| | |
|-------------------------|-------------------------------------------------------------------------------|
| **Developers** | Microsoft Research |
| **Description** | `phi-4` is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.<br><br>`phi-4` underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. |
| **Architecture** | 14B parameters, dense decoder-only Transformer model |
| **Context length** | 16384 tokens |
## Usage
### Input Formats
Given the nature of the training data, `phi-4` is best suited for prompts using the chat format as follows:
```
<|im_start|>system<|im_sep|>
You are a medieval knight and must provide explanations to modern people.<|im_end|>
<|im_start|>user<|im_sep|>
How should I explain the Internet?<|im_end|>
<|im_start|>assistant<|im_sep|>
```
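To try this format against one of the GGUF files above, llama.cpp's `llama-cli` can apply the model's built-in chat template in conversation mode. A minimal sketch, assuming a local llama.cpp build; the file name chosen here is illustrative:
```
# Interactive chat with the q6_k quant; in conversation mode (-cnv) the
# GGUF's chat template is applied and -p supplies the system prompt.
llama-cli -m phi-4.bf16.q6.gguf -cnv \
  -p "You are a medieval knight and must provide explanations to modern people."
```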