---
language:
- en
pipeline_tag: text-generation
---

# Meta-Llama-3-70B-Instruct-quantized.w8a16

## Model Overview
- **Model Architecture:** Meta-Llama-3
- **Input:** Text
- **Output:** Text
- **Model Optimizations:**
  - **Quantized:** INT8 weights
- **Release Date:** 7/2/2024
- **Version:** 1.0
- **Model Developers:** Neural Magic

Quantized version of [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
It achieves an average score of 77.90% on the OpenLLM benchmark (version 1), whereas the unquantized model achieves 79.18%.

## Model Optimizations

This model was obtained by quantizing the weights of [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) to the INT8 data type.
Only the weights of the linear operators within transformer blocks are quantized. Symmetric per-channel quantization is applied: a linear scale per output dimension maps between the INT8 and floating-point representations of the quantized weights.
[AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ) was used for quantization.
This optimization reduces the number of bits per parameter from 16 to 8, cutting the disk size and GPU memory requirements by approximately 50%.

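As an illustration of the scheme described above, here is a minimal pure-Python sketch of symmetric per-channel INT8 weight quantization. This is a toy example, not the AutoGPTQ implementation used to produce this model, and the function names are hypothetical.

```python
def quantize_per_channel(weights):
    """Symmetric per-channel INT8 quantization (illustrative toy).

    `weights` is a list of rows, one row per output dimension.
    Each row gets its own scale, chosen so the largest-magnitude
    weight in that row maps to +/-127.
    """
    q_rows, scales = [], []
    for row in weights:
        max_abs = max(abs(w) for w in row)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        # INT8 values land in [-127, 127]
        q_rows.append([round(w / scale) for w in row])
        scales.append(scale)
    return q_rows, scales


def dequantize(q_rows, scales):
    """Recover floating-point weights: w ~= q * scale (per channel)."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


weights = [[0.5, -1.27, 0.02], [2.0, -0.4, 1.0]]
q_rows, scales = quantize_per_channel(weights)
recovered = dequantize(q_rows, scales)
```

Because the scale is chosen per output channel, a row with small weights is not forced to share a range with a row containing outliers, which keeps the rounding error per element bounded by half a scale step.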
## Evaluation

The model was evaluated with the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) using the [vLLM](https://docs.vllm.ai/en/stable/) engine.

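A reproduction command along these lines can be used; the repository id, task group name, and flags below are assumptions and should be checked against the installed lm-eval and vLLM versions:

```shell
lm_eval \
  --model vllm \
  --model_args pretrained="neuralmagic/Meta-Llama-3-70B-Instruct-quantized.w8a16",tensor_parallel_size=4 \
  --tasks openllm \
  --batch_size auto
```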
## Accuracy

### Open LLM Leaderboard evaluation scores

| Benchmark | [Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct) | Meta-Llama-3-70B-Instruct-quantized.w8a16<br>(this model) |
| :------------------: | :----------------------: | :------------------------------------------------: |
| arc-c<br>25-shot | 72.44% | 71.59% |
| hellaswag<br>10-shot | 85.54% | 85.65% |
| mmlu<br>5-shot | 80.18% | 78.69% |
| truthfulqa<br>0-shot | 62.92% | 61.94% |
| winogrande<br>5-shot | 83.19% | 83.11% |
| gsm8k<br>5-shot | 90.83% | 86.43% |
| **Average<br>Accuracy** | **79.18%** | **77.90%** |
| **Recovery** | **100%** | **98.38%** |
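The recovery row is simply the quantized model's average expressed as a fraction of the unquantized average:

```python
baseline_avg = 79.18   # unquantized Meta-Llama-3-70B-Instruct
quantized_avg = 77.90  # this model
recovery = 100 * quantized_avg / baseline_avg
print(f"{recovery:.2f}%")  # 98.38%
```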