---
license: mit
language:
- en
base_model:
- unsloth/phi-4
- microsoft/phi-4
pipeline_tag: text-generation
---

# Phi-4 ZeroWw quantizations

- Output and embed tensors quantized to q8_0; all other tensors quantized to q4_k.
- Output and embed tensors kept at bf16; all other tensors quantized to q5_k, q6_k, q8_0, and q8_0 with `--pure`.

```bash
python convert_hf_to_gguf.py --outtype bf16 phi-4 --outfile phi-4.bf16.gguf

llama-quantize --allow-requantize --output-tensor-type q8_0 --token-embedding-type q8_0 phi-4.bf16.gguf phi-4.q8.q4.gguf q4_k
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q5.gguf q5_k
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q6.gguf q6_k
llama-quantize --allow-requantize --output-tensor-type bf16 --token-embedding-type bf16 phi-4.bf16.gguf phi-4.bf16.q8.gguf q8_0
llama-quantize --allow-requantize --pure phi-4.bf16.gguf phi-4.bf16.q8_p.gguf q8_0
```
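The five `llama-quantize` invocations differ only in the target quant type and the output/embedding tensor overrides. As a convenience (not part of the original workflow), a small Python sketch can regenerate the exact command lines above from a spec table:

```python
# Each entry: (output suffix, quant type, tensor-override type or None, --pure flag)
SPECS = [
    ("q8.q4",     "q4_k", "q8_0", False),
    ("bf16.q5",   "q5_k", "bf16", False),
    ("bf16.q6",   "q6_k", "bf16", False),
    ("bf16.q8",   "q8_0", "bf16", False),
    ("bf16.q8_p", "q8_0", None,   True),
]

def quant_cmd(suffix, qtype, override, pure, src="phi-4.bf16.gguf"):
    """Build one llama-quantize command line matching the listing above."""
    parts = ["llama-quantize", "--allow-requantize"]
    if pure:
        parts.append("--pure")
    if override:
        parts += ["--output-tensor-type", override,
                  "--token-embedding-type", override]
    parts += [src, f"phi-4.{suffix}.gguf", qtype]
    return " ".join(parts)

for spec in SPECS:
    print(quant_cmd(*spec))
```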
23
+
| Filename | Quant type | File Size | ~VRAM* |
| -------- | ---------- | --------- | ------ |
| [phi-4.q8.q4](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_4bpw) | 4.00 bits per weight | 8.36 GB | **11.9 GB** |
| [phi-4.bf16.q5](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_5bpw) | 5.00 bits per weight | 10.1 GB | **13.5 GB** |
| [phi-4.bf16.q6](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_6bpw) | 6.00 bits per weight | 11.8 GB | **15.1 GB** |
| [phi-4.bf16.q8_p](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_8bpw) | 8.00 bits per weight | 15.2 GB | **18.2 GB** |
| [phi-4.bf16.q8](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_8bpw) | 8.00 bits per weight | 15.2 GB | **18.2 GB** |

<sub>*Approximate value at 16k context, FP16 cache.</sub>
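The VRAM column roughly tracks file size plus the FP16 KV cache at 16k context. A back-of-the-envelope sketch of the cache term (the layer/head/dim figures below are illustrative assumptions, not taken from this card):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elt=2):
    """Size of the K and V caches: two tensors, each holding
    n_layers * n_kv_heads * head_dim * n_ctx elements."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elt

# Hypothetical 14B-class shape: 40 layers, 10 KV heads, head dim 128.
gib = kv_cache_bytes(40, 10, 128, 16384) / 2**30
print(f"~{gib:.1f} GiB of FP16 KV cache at 16k context")
```

With these assumed shapes the cache lands around 3 GiB, in line with the gap between the file-size and VRAM columns above.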

---------------------------------------------

# Phi-4 Model Card

[Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)

## Model Summary

| | |
|-------------------------|-------------------------------------------------------------------------------|
| **Developers** | Microsoft Research |
| **Description** | `phi-4` is a state-of-the-art open model built upon a blend of synthetic datasets, data from filtered public domain websites, and acquired academic books and Q&A datasets. The goal of this approach was to ensure that small capable models were trained with data focused on high quality and advanced reasoning.<br><br>`phi-4` underwent a rigorous enhancement and alignment process, incorporating both supervised fine-tuning and direct preference optimization to ensure precise instruction adherence and robust safety measures. |
| **Architecture** | 14B parameters, dense decoder-only Transformer model |
| **Context length** | 16384 tokens |

## Usage

### Input Formats

Given the nature of the training data, `phi-4` is best suited for prompts using the chat format as follows:

```bash
<|im_start|>system<|im_sep|>
You are a medieval knight and must provide explanations to modern people.<|im_end|>
<|im_start|>user<|im_sep|>
How should I explain the Internet?<|im_end|>
<|im_start|>assistant<|im_sep|>
```
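A minimal helper (not from the model card) that renders a list of role/content messages into this format, leaving an open assistant turn for the model to complete:

```python
def phi4_prompt(messages, add_generation_prompt=True):
    """Render chat messages into the phi-4 format shown above."""
    out = ""
    for m in messages:
        out += f"<|im_start|>{m['role']}<|im_sep|>\n{m['content']}<|im_end|>\n"
    if add_generation_prompt:
        # Open assistant turn: the model generates the reply after <|im_sep|>.
        out += "<|im_start|>assistant<|im_sep|>\n"
    return out

prompt = phi4_prompt([
    {"role": "system", "content": "You are a medieval knight and must provide explanations to modern people."},
    {"role": "user", "content": "How should I explain the Internet?"},
])
print(prompt)
```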

### With ExUI

Add the Phi-4 prompt format: replace `exui/backend/prompts.py` with https://huggingface.co/cmh/phi-4_exl2/raw/main/backend/prompts.py