Paper: [Extreme Compression of Large Language Models via Additive Quantization](https://arxiv.org/abs/2401.06118)
Official AQLM quantization of `meta-llama/Llama-2-13b-hf`. For this quantization, we used 2 codebooks of 8 bits each (the 2x8 setup).
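A minimal loading sketch, assuming the `aqlm` inference kernels are installed (e.g. `pip install aqlm[gpu]`) and a recent `transformers` release with AQLM support; the repo id below is a placeholder, substitute this card's actual path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repo id -- replace with the actual path of this model card.
repo_id = "ISTA-DASLab/Llama-2-13b-AQLM-2Bit-2x8-hf"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype="auto",   # pick the dtype stored in the checkpoint config
    device_map="auto",    # place the quantized layers on available GPUs
)

# Quick sanity-check generation.
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```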
Selected evaluation results for this model:
| Model | Quantization | WikiText 2 PPL | Model size, GB |
|---|---|---|---|
| Llama-2-13b | - | 4.57 | 26.0 |
| | 2x8 | 5.63 | 3.8 |