Lin-K76 committed (verified)
Commit 5458671 · Parent(s): e468910

Update README.md

Files changed (1): README.md (+1 −3)
README.md CHANGED
@@ -13,12 +13,10 @@ tags:
 * <h3 style="display: inline;">Model Developers:</h3> Neural Magic

 Qwen2-0.5B-Instruct quantized to FP8 weights and activations using per-tensor quantization through the AutoFP8 repository, ready for inference with vLLM >= 0.5.0.
-Calibrated with 512 UltraChat samples to achieve 99% performance recovery on the Open LLM Benchmark evaluations.
+Calibrated with 512 UltraChat samples to achieve 100% performance recovery on the Open LLM Benchmark evaluations.
 Reduces space on disk by ~30%.
 Part of the FP8 LLMs for vLLM collection.

-
-
 ## Usage and Creation
 Produced using [AutoFP8 with calibration samples from ultrachat](https://github.com/neuralmagic/AutoFP8/blob/147fa4d9e1a90ef8a93f96fc7d9c33056ddc017a/example_dataset.py).
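The per-tensor scheme the README describes (a single scale per tensor, calibrated from the tensor's own dynamic range) can be sketched roughly as follows. This is an illustrative approximation, not AutoFP8's actual implementation: it simulates the FP8 e4m3 range by rounding onto a uniform grid, whereas true e4m3 uses a non-uniform floating-point grid, and the weight values are made up for the example.

```python
FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in FP8 e4m3

def per_tensor_quantize(xs):
    """One scale for the entire tensor: calibrate so the max magnitude maps to FP8_E4M3_MAX."""
    scale = max(abs(x) for x in xs) / FP8_E4M3_MAX
    # Illustrative only: true e4m3 rounds onto a non-uniform float grid;
    # here we round onto a uniform grid bounded by the e4m3 range.
    q = [max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, round(x / scale))) for x in xs]
    return q, scale

def dequantize(q, scale):
    """Recover approximate original values by rescaling."""
    return [v * scale for v in q]

weights = [0.10, -2.0, 3.5, 0.7]   # hypothetical weight values
q, s = per_tensor_quantize(weights)
recovered = dequantize(q, s)
```

Because one scale covers the whole tensor, a single outlier value stretches the grid for every other element; this is the trade-off that makes calibration samples (here, 512 UltraChat prompts) useful for checking how much accuracy survives.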