Update README.md
README.md CHANGED
@@ -11,6 +11,8 @@ There are some issues with the model weights in terms of precision. In the next
**Please note:** Do not use "accelerated inference frameworks" like **vLLM** for now; use Transformers for inference instead (a minimal sketch follows below). Otherwise, due to precision issues, output quality will be significantly degraded. If you need faster inference, consider the q8_0 quantization with llama.cpp in the meantime (for this model only, it is faster and better than bf16 vLLM), or wait for the official version.
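
As a rough illustration of the Transformers path, here is a minimal inference sketch. The model id, dtype, and generation settings are placeholder assumptions, not values taken from this repository.

```python
# Minimal sketch: plain Transformers inference (no vLLM).
# `model_id` is a hypothetical placeholder -- substitute the actual repo name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-model"  # placeholder, not from this README

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

If you take the llama.cpp q8_0 route instead, the llama-cpp-python bindings expose GGUF models from Python via `Llama(model_path="model-q8_0.gguf")`; the file name here is likewise a placeholder.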
Some minor details will be fixed in the upcoming version update.
+ Please do not use wikitext for quantization calibration, because all wikitext has been realigned on a synthetic dataset and its distribution differs significantly from the original wikitext.
## MT-Bench: 8.5