Update README.md
--- a/README.md
+++ b/README.md
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
 
 [ExLlamaV2 is an inference library for running local LLMs on modern consumer GPUs.](https://github.com/turboderp-org/exllamav2)
 
-| Filename | Quant type | File Size | Vram
+| Filename | Quant type | File Size | Vram*|
 | -------- | ---------- | --------- | -------- |
 | [phi-4_hb8_3bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_3bpw) | 3.00 bits per weight | 6.66 GB | **10,3 GB** |
 | [phi-4_hb8_4bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_4bpw) | 4.00 bits per weight | 8.36 GB | **11,9 GB** |
@@ -21,6 +21,8 @@ pipeline_tag: text-generation
 | [phi-4_hb8_7bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_7bpw) | 7.00 bits per weight | 13.5 GB | **16,7 GB** |
 | [phi-4_hb8_8bpw](https://huggingface.co/cmh/phi-4_exl2/tree/hb8_8bpw) | 8.00 bits per weight | 15.2 GB | **18,2 GB** |
 
+* at 16k context
+
 # Phi-4 Model Card
 
 [Phi-4 Technical Report](https://arxiv.org/pdf/2412.08905)