Update README.md
README.md CHANGED

Before:

```diff
@@ -1,6 +1,13 @@
 ---
 language:
 - en
 pipeline_tag: text-generation
 license: llama3.1
 ---
```
```diff
@@ -14,15 +21,15 @@ license: llama3.1
 - **Model Optimizations:**
   - **Activation quantization:** INT8
   - **Weight quantization:** INT8
-- **Intended Use Cases:** Intended for commercial and research use
-- **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
 - **Release Date:** 7/11/2024
 - **Version:** 1.0
-- **License(s):** [Llama3]
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
-It achieves
 
 ### Model Optimizations
 
```
```diff
@@ -120,14 +127,9 @@ model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8")
 
 ## Evaluation
 
-The model was evaluated on
-
-
-  --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,gpu_memory_utilization=0.4,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
-  --tasks openllm \
-  --batch_size auto
-```
 
 ### Accuracy
 
```
```diff
@@ -156,21 +158,21 @@ lm_eval \
 <tr>
  <td>ARC Challenge (25-shot)
  </td>
-  <td>
  </td>
-  <td>
  </td>
-  <td>
  </td>
 </tr>
 <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
-  <td>
  </td>
-  <td>
  </td>
-  <td>
  </td>
 </tr>
 <tr>
```
```diff
@@ -206,11 +208,11 @@ lm_eval \
 <tr>
  <td><strong>Average</strong>
  </td>
-  <td><strong>
  </td>
-  <td><strong>
  </td>
-  <td><strong>99.
  </td>
 </tr>
</table>
```
After:

```diff
@@ -1,6 +1,13 @@
 ---
 language:
 - en
+- de
+- fr
+- it
+- pt
+- hi
+- es
+- th
 pipeline_tag: text-generation
 license: llama3.1
 ---
```
```diff
@@ -14,15 +21,15 @@ license: llama3.1
 - **Model Optimizations:**
   - **Activation quantization:** INT8
   - **Weight quantization:** INT8
+- **Intended Use Cases:** Intended for commercial and research use in multiple languages. Similarly to [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct), this model is intended for assistant-like chat.
+- **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws).
 - **Release Date:** 7/11/2024
 - **Version:** 1.0
+- **License(s):** [Llama3.1]
 - **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
+It achieves scores within 1.3% of those of the unquantized model on MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.
 
 ### Model Optimizations
 
```
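The W8A8 scheme named in the bullets above stores both weights and activations as 8-bit integers plus a floating-point scale. A toy sketch of symmetric INT8 quantization, for intuition only (the real model uses scales chosen per channel or per tensor by the quantization toolkit, not this single-scale version):

```python
# Illustrative symmetric INT8 quantization round-trip (toy example, not
# the actual per-channel scheme used by the model's quantization toolkit).

def quantize_int8(values):
    """Map floats to int8 in [-127, 127] with one symmetric scale."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate floats from int8 codes."""
    return [v * scale for v in q]

weights = [0.02, -1.30, 0.75, 0.004, -0.55]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)

# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, recovered))
print(q)                # int8 codes
print(round(max_err, 4))
```

The small round-trip error on each value is why the quantized model's benchmark scores stay close to the baseline's.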
```diff
@@ -120,14 +127,9 @@ model.save_pretrained("Meta-Llama-3.1-8B-Instruct-quantized.w8a8")
 
 ## Evaluation
 
+The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande and TruthfulQA.
+Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
+This version of the lm-evaluation-harness includes versions of ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
 
 ### Accuracy
 
```
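For reference, the evaluation command removed from the old revision had roughly this shape, reassembled here from the deleted lines; the exact arguments in the current card's command may differ:

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,gpu_memory_utilization=0.4,add_bos_token=True,max_model_len=4096,tensor_parallel_size=1 \
  --tasks openllm \
  --batch_size auto
```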
```diff
@@ -156,21 +158,21 @@ lm_eval \
 <tr>
  <td>ARC Challenge (25-shot)
  </td>
+  <td>83.19
  </td>
+  <td>82.08
  </td>
+  <td>98.7%
  </td>
 </tr>
 <tr>
  <td>GSM-8K (5-shot, strict-match)
  </td>
+  <td>82.79
  </td>
+  <td>81.96
  </td>
+  <td>99.0%
  </td>
 </tr>
 <tr>
```
```diff
@@ -206,11 +208,11 @@ lm_eval \
 <tr>
  <td><strong>Average</strong>
  </td>
+  <td><strong>74.31</strong>
  </td>
+  <td><strong>73.79</strong>
  </td>
+  <td><strong>99.3%</strong>
  </td>
 </tr>
</table>
```
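The third column being filled in above (the recovery percentage) is simply the quantized score expressed as a fraction of the unquantized baseline, rounded to one decimal place. A quick check against the figures in the diff:

```python
# Reproduce the "recovery" column: quantized score as a percentage of
# the unquantized baseline, rounded as in the table.
scores = {
    # benchmark: (unquantized, quantized) -- values from the diff above
    "ARC Challenge (25-shot)": (83.19, 82.08),
    "GSM-8K (5-shot, strict-match)": (82.79, 81.96),
    "Average": (74.31, 73.79),
}

def recovery(baseline, quantized):
    """Percentage of the baseline score retained after quantization."""
    return round(100.0 * quantized / baseline, 1)

for name, (base, quant) in scores.items():
    print(f"{name}: {recovery(base, quant)}%")
# ARC Challenge: 98.7%, GSM-8K: 99.0%, Average: 99.3% -- matching the table
```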