Update README.md
README.md CHANGED
@@ -13,12 +13,10 @@ tags:
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic
 
 Qwen2-0.5B-Instruct quantized to FP8 weights and activations using per-tensor quantization through the AutoFP8 repository, ready for inference with vLLM >= 0.5.0.
-Calibrated with 512 UltraChat samples to achieve
+Calibrated with 512 UltraChat samples to achieve 100% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~30%.
 Part of the FP8 LLMs for vLLM collection.
 
-
-
 ## Usage and Creation
 Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
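The "per-tensor quantization" the README describes means one scale factor is shared by every element of a weight (or activation) tensor. A minimal arithmetic sketch, assuming the FP8 E4M3 format's ±448 dynamic range; the function names are illustrative, not AutoFP8's API:

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 E4M3

def quantize_per_tensor(values):
    """Compute a single scale for the whole tensor and map values into FP8 range."""
    amax = max(abs(v) for v in values)
    scale = amax / FP8_E4M3_MAX if amax > 0 else 1.0
    # Clamp to the representable range; real FP8 would also round to 3 mantissa bits.
    quantized = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, v / scale)) for v in values]
    return quantized, scale

def dequantize_per_tensor(quantized, scale):
    """Recover approximate original values by multiplying the scale back in."""
    return [q * scale for q in quantized]

weights = [0.5, -2.0, 1.25, 0.0]
q, scale = quantize_per_tensor(weights)
restored = dequantize_per_tensor(q, scale)
```

Storing each tensor as 8-bit values plus one scalar scale is what yields the ~30% disk saving relative to 16-bit weights, at the cost of a coarser value grid than per-channel schemes.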
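Since the card states the checkpoint is ready for inference with vLLM >= 0.5.0, a usage sketch might look like the following (the repo id `neuralmagic/Qwen2-0.5B-Instruct-FP8` is an assumption here; substitute the actual Hub id, and note this requires a CUDA GPU):

```python
from vllm import LLM, SamplingParams

# Repo id assumed for illustration; replace with the published checkpoint id.
llm = LLM(model="neuralmagic/Qwen2-0.5B-Instruct-FP8")

params = SamplingParams(temperature=0.7, max_tokens=64)
outputs = llm.generate(["What is FP8 quantization?"], params)
print(outputs[0].outputs[0].text)
```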