Update README.md
README.md
CHANGED
@@ -2,8 +2,33 @@
 tags:
 - fp8
 ---
+
+
+Meta-Llama-3-8B-Instruct quantized to FP8 weights and activations using per-tensor quantization, ready for inference with vLLM >= 0.4.2.
+
 Produced using https://github.com/neuralmagic/AutoFP8/blob/b0c1f789c51659bb023c06521ecbd04cea4a26f6/quantize.py
 
 ```bash
 python quantize.py --model-id meta-llama/Meta-Llama-3-8B-Instruct --save-dir Meta-Llama-3-8B-Instruct-FP8
+```
+
+Accuracy on MMLU:
+```
+vllm (pretrained=meta-llama/Meta-Llama-3-8B-Instruct,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
+|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
+|------------------|-------|------|-----:|------|-----:|---|-----:|
+|mmlu              |N/A    |none  |     0|acc   |0.6569|±  |0.0038|
+| - humanities     |N/A    |none  |     5|acc   |0.6049|±  |0.0068|
+| - other          |N/A    |none  |     5|acc   |0.7203|±  |0.0078|
+| - social_sciences|N/A    |none  |     5|acc   |0.7663|±  |0.0075|
+| - stem           |N/A    |none  |     5|acc   |0.5652|±  |0.0085|
+
+vllm (pretrained=nm-testing/Meta-Llama-3-8B-Instruct-FP8,quantization=fp8,gpu_memory_utilization=0.4), gen_kwargs: (None), limit: None, num_fewshot: 5, batch_size: 16
+|      Groups      |Version|Filter|n-shot|Metric|Value |   |Stderr|
+|------------------|-------|------|-----:|------|-----:|---|-----:|
+|mmlu              |N/A    |none  |     0|acc   |0.6567|±  |0.0038|
+| - humanities     |N/A    |none  |     5|acc   |0.6072|±  |0.0068|
+| - other          |N/A    |none  |     5|acc   |0.7206|±  |0.0078|
+| - social_sciences|N/A    |none  |     5|acc   |0.7618|±  |0.0075|
+| - stem           |N/A    |none  |     5|acc   |0.5649|±  |0.0085|
 ```
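
Review note: the two MMLU runs added in this change can be compared group by group. The snippet below is a minimal sketch, not part of the commit. The accuracy and stderr values are copied from the two tables in the diff; the `compare` helper and the "gap no larger than the larger stderr" check are assumptions made for illustration, not a formal significance test.

```python
# MMLU accuracy and standard error, copied from the two lm-eval tables in this diff.
baseline = {
    "mmlu": (0.6569, 0.0038),
    "humanities": (0.6049, 0.0068),
    "other": (0.7203, 0.0078),
    "social_sciences": (0.7663, 0.0075),
    "stem": (0.5652, 0.0085),
}
fp8 = {
    "mmlu": (0.6567, 0.0038),
    "humanities": (0.6072, 0.0068),
    "other": (0.7206, 0.0078),
    "social_sciences": (0.7618, 0.0075),
    "stem": (0.5649, 0.0085),
}


def compare(baseline, fp8):
    """Return {group: (delta, within_stderr)} with delta = fp8 - baseline."""
    result = {}
    for group, (base_acc, base_err) in baseline.items():
        fp8_acc, fp8_err = fp8[group]
        delta = round(fp8_acc - base_acc, 4)
        # Crude check (an assumption of this sketch): is the gap no larger
        # than the larger of the two reported standard errors?
        result[group] = (delta, abs(delta) <= max(base_err, fp8_err))
    return result


deltas = compare(baseline, fp8)
for group, (delta, within) in deltas.items():
    print(f"{group:16s} {delta:+.4f} within_stderr={within}")
```

On these numbers every group differs by less than one reported stderr. The largest gap is social_sciences at -0.0045, and the overall mmlu score moves by -0.0002.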