Update README.md
README.md (changed)
````diff
@@ -33,7 +33,7 @@ base_model: meta-llama/Meta-Llama-3.1-8B-Instruct
 **Model Developers:** Neural Magic
 
 Quantized version of [Meta-Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct).
-It achieves scores within 1
+It achieves scores within 1% of the scores of the unquantized model for MMLU, ARC-Challenge, GSM-8k, Hellaswag, Winogrande and TruthfulQA.
 
 ### Model Optimizations
 
@@ -135,6 +135,8 @@ The model was evaluated on MMLU, ARC-Challenge, GSM-8K, Hellaswag, Winogrande an
 Evaluation was conducted using the Neural Magic fork of [lm-evaluation-harness](https://github.com/neuralmagic/lm-evaluation-harness/tree/llama_3.1_instruct) (branch llama_3.1_instruct) and the [vLLM](https://docs.vllm.ai/en/stable/) engine.
 This version of the lm-evaluation-harness includes versions of MMLU, ARC-Challenge and GSM-8K that match the prompting style of [Meta-Llama-3.1-Instruct-evals](https://huggingface.co/datasets/meta-llama/Meta-Llama-3.1-8B-Instruct-evals).
 
+**Note:** Results have been updated after Meta modified the chat template.
+
 ### Accuracy
 
 #### Open LLM Leaderboard evaluation scores
@@ -239,7 +241,7 @@ The results were obtained using the following commands:
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3850,max_gen_toks=10,tensor_parallel_size=1 \
   --tasks mmlu_llama_3.1_instruct \
   --fewshot_as_multiturn \
   --apply_chat_template \
@@ -251,7 +253,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=4064,max_gen_toks=1024,tensor_parallel_size=1 \
   --tasks mmlu_cot_0shot_llama_3.1_instruct \
   --apply_chat_template \
   --num_fewshot 0 \
@@ -262,7 +264,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3940,max_gen_toks=100,tensor_parallel_size=1 \
   --tasks arc_challenge_llama_3.1_instruct \
   --apply_chat_template \
   --num_fewshot 0 \
@@ -273,7 +275,7 @@ lm_eval \
 ```
 lm_eval \
   --model vllm \
-  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,
+  --model_args pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=4096,max_gen_toks=1024,tensor_parallel_size=1 \
   --tasks gsm8k_cot_llama_3.1_instruct \
   --fewshot_as_multiturn \
   --apply_chat_template \
````
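The updated commands pass vLLM engine options (`max_model_len`, `max_gen_toks`, `tensor_parallel_size`) to lm_eval through its comma-separated `--model_args` string. As a minimal sketch of how such a string is assembled, the helper below (`build_model_args` is a hypothetical name, not part of lm-evaluation-harness) reproduces the MMLU invocation's argument string:

```python
# Hypothetical helper: assemble lm_eval's comma-separated --model_args string
# from a model ID plus a dict of engine options (names taken from the commands above).
def build_model_args(pretrained: str, **options) -> str:
    parts = [f'pretrained="{pretrained}"']
    parts += [f"{key}={value}" for key, value in options.items()]
    return ",".join(parts)

args = build_model_args(
    "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",
    dtype="auto",
    max_model_len=3850,    # cap on total sequence length used for the MMLU run
    max_gen_toks=10,       # short generations: MMLU answers are a single letter
    tensor_parallel_size=1,
)
print(args)
# → pretrained="neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w8a8",dtype=auto,max_model_len=3850,max_gen_toks=10,tensor_parallel_size=1
```

The looser limits in the GSM-8K command (`max_model_len=4096,max_gen_toks=1024`) follow the same pattern; chain-of-thought tasks need far more generation tokens than single-letter multiple choice.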