Improve model card: Add Tequila paper info, metadata, and citation
#1 by nielsr (HF Staff) - opened

README.md CHANGED
@@ -3,8 +3,23 @@ tags:
 - hunyuan
 - eagle3
 - eagle
+- quantization
+- ternary-quantization
+- tequila
+pipeline_tag: text-generation
+library_name: transformers
+license: apache-2.0
 ---
 
+# Tequila: Trapping-free Ternary Quantization for Large Language Models
+
+This repository provides models and/or implementations related to the **Tequila** method, a novel trapping-free ternary quantization technique for Large Language Models, as introduced in the paper:
+[**Tequila: Trapping-free Ternary Quantization for Large Language Models**](https://huggingface.co/papers/2509.23809)
+
+Tequila is implemented as part of the broader **AngelSlim** compression toolkit, which aims to provide intuitive, comprehensive, and efficient tools for LLM compression.
+
+For the Tequila-specific implementation code, please refer to: [https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
+
 <p align="center">
 <picture>
 <source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
@@ -30,7 +45,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 - [How to Use](#how-to-use)
 - [Install AngelSlim](#install-angelslim)
 - [Quick Start](#quick-start)
-- [
+- [Deployment & Evaluation](#deployment)
 - [Benchmark](#benchmark)
 - [License](#license)
 - [Citation](#citation)
@@ -245,30 +260,30 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
 <tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th></tr>
 </thead>
 <tbody>
-<tr><td rowspan
+<tr><td rowspan="3">Qwen2.5-1.5B-Instruct</td><td>BF16</td><td>67.01</td><td>60.05</td><td>54.28</td></tr>
 <tr><td>FP8-Static</td><td>66.27</td><td>60.23</td><td>-</td></tr>
 <tr><td>FP8-Dynamic</td><td>66.79</td><td>60.08</td><td>51.71</td></tr>
-<tr><td rowspan
+<tr><td rowspan="5">Qwen2.5-7B-Instruct</td><td>BF16</td><td>81.20</td><td>74.55</td><td>79.98</td></tr>
 <tr><td>FP8-Static</td><td>81.13</td><td>74.03</td><td>79.30</td></tr>
 <tr><td>FP8-Dynamic</td><td>80.31</td><td>74.07</td><td>79.00</td></tr>
 <tr><td>INT4-GPTQ</td><td>79.05</td><td>73.05</td><td>74.75</td></tr>
 <tr><td>INT4-AWQ</td><td>79.35</td><td>73.22</td><td>79.38</td></tr>
-<tr><td rowspan
+<tr><td rowspan="5">Qwen2.5-32B-Instruct</td><td>BF16</td><td>87.30</td><td>83.21</td><td>81.73</td></tr>
 <tr><td>FP8-Static</td><td>87.59</td><td>83.08</td><td>81.58</td></tr>
 <tr><td>FP8-Dynamic</td><td>87.30</td><td>83.04</td><td>81.58</td></tr>
 <tr><td>INT4-GPTQ</td><td>86.70</td><td>82.45</td><td>82.03</td></tr>
 <tr><td>INT4-AWQ</td><td>87.00</td><td>82.64</td><td>-</td></tr>
-<tr><td rowspan
+<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-7B</td><td>BF16</td><td>53.49</td><td>53.80</td><td>75.74</td></tr>
 <tr><td>FP8-Static</td><td>53.57</td><td>54.17</td><td>76.19</td></tr>
 <tr><td>FP8-Dynamic</td><td>52.97</td><td>54.13</td><td>74.15</td></tr>
 <tr><td>INT4-GPTQ</td><td>51.86</td><td>52.44</td><td>75.89</td></tr>
 <tr><td>INT4-AWQ</td><td>53.49</td><td>53.70</td><td>-</td></tr>
-<tr><td rowspan
+<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-14B</td><td>BF16</td><td>77.71</td><td>74.28</td><td>85.67</td></tr>
 <tr><td>FP8-Static</td><td>77.56</td><td>74.66</td><td>86.73</td></tr>
 <tr><td>FP8-Dynamic</td><td>76.82</td><td>74.63</td><td>87.11</td></tr>
 <tr><td>INT4-GPTQ</td><td>74.29</td><td>72.37</td><td>84.61</td></tr>
 <tr><td>INT4-AWQ</td><td>74.81</td><td>73.00</td><td>86.05</td></tr>
-<tr><td rowspan
+<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-32B</td><td>BF16</td><td>84.18</td><td>80.89</td><td>87.41</td></tr>
 <tr><td>FP8-Static</td><td>83.43</td><td>80.90</td><td>87.57</td></tr>
 <tr><td>FP8-Dynamic</td><td>83.73</td><td>81.10</td><td>86.43</td></tr>
 <tr><td>INT4-GPTQ</td><td>84.10</td><td>79.80</td><td>86.73</td></tr>
@@ -277,7 +292,6 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
 </table>
 
 ### (2) Speculative Decoding
-
 #### Qwen3 Series Models
 Benchmark results for Qwen3 series models with `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:
 
@@ -346,16 +360,28 @@ The code for this project is open-sourced under the [License for AngelSlim](LICE
 
 ## 📝 Citation
 
+If you use AngelSlim, please cite it as:
 ```
 @software{AngelSlim2025,
 title={{AngelSlim}},
 author={Tencent AngelSlim Project Contributors},
 year={2025},
-month={
+month={7},
 url={https://github.com/Tencent/AngelSlim},
 }
 ```
 
+If you use the Tequila quantization method, please also cite its corresponding paper:
+```bibtex
+@article{li2025tequila,
+title={{Tequila: Trapping-free Ternary Quantization for Large Language Models}},
+author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
+journal={arXiv preprint arXiv:2509.23809},
+year={2025},
+url={https://arxiv.org/abs/2509.23809}
+}
+```
+
 ## 💬 Technical Discussion
 
 * AngelSlim is continuously iterating and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub or join our [WeChat technical discussion group](https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/angel_slim_wechat.png?raw=true).
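For reviewers unfamiliar with the method family: ternary quantization maps each weight to one of three levels, {-α, 0, +α}. As a rough orientation only, here is a minimal baseline sketch in the style of Ternary Weight Networks (threshold at 0.7 of the mean absolute weight); it is NOT the Tequila algorithm, whose trapping-free handling of the zero dead zone is exactly what distinguishes it from this baseline:

```python
import numpy as np

def ternarize(w, thresh_ratio=0.7):
    """Baseline ternary quantization: map weights to {-alpha, 0, +alpha}.

    thresh_ratio=0.7 is the classic TWN heuristic. Tequila's trapping-free
    scheme refines how near-zero weights escape the dead zone, which this
    sketch does not reproduce.
    """
    delta = thresh_ratio * np.mean(np.abs(w))      # dead-zone threshold
    mask = np.abs(w) > delta                       # weights kept nonzero
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # per-tensor scale
    codes = np.sign(w) * mask                      # ternary codes, subset of {-1, 0, +1}
    return alpha * codes, codes, alpha

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
w_q, codes, alpha = ternarize(w)
print(sorted(np.unique(codes)))                    # subset of [-1.0, 0.0, 1.0]
```

The per-tensor scale here is the simplest choice; per-channel scales are the more common production variant.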
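On the benchmark table: the practical difference between the `FP8-Static` and `FP8-Dynamic` rows is when the quantization scale is computed, from a calibration set versus from each live tensor at runtime. A hedged illustration of the dynamic per-tensor scale (using the fp8-e4m3 maximum of 448; clipping stands in for the actual fp8 cast, since NumPy has no fp8 dtype, and no rounding is modeled):

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite magnitude representable in fp8-e4m3

def dynamic_fp8_scale(x):
    """FP8-Dynamic derives the scale from the live tensor itself;
    FP8-Static would instead freeze it from a calibration pass."""
    scale = max(np.abs(x).max(), 1e-12) / FP8_E4M3_MAX  # per-tensor scale
    x_scaled = np.clip(x / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    return x_scaled, scale

x = np.array([0.5, -3.0, 2.25, 10.0])
x_q, s = dynamic_fp8_scale(x)
print(s)        # 10.0 / 448, i.e. scale tracks the batch's max magnitude
print(x_q * s)  # dequantization recovers the input (no rounding modeled here)
```

This is why the two rows in the table can differ: the static scale may clip or waste range on tensors whose statistics drift from the calibration set, while the dynamic scale adapts per batch at a small runtime cost.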
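The Eagle3 rows measure speedups from speculative decoding. For context, the generic scheme such methods accelerate is: a cheap draft model proposes several tokens, the target model verifies them, and the matching prefix is accepted in one step. A toy greedy-verification sketch (illustrative only, not the Eagle3 algorithm; the lambdas are made-up stand-ins for real models):

```python
from typing import Callable, List

def speculative_step(draft: Callable[[List[int], int], List[int]],
                     target_next: Callable[[List[int]], int],
                     ctx: List[int], k: int = 4) -> List[int]:
    """One greedy speculative-decoding step: accept the prefix of the
    draft's k proposed tokens that matches the target model's own greedy
    choices, plus one corrected (or bonus) token from the target."""
    proposal = draft(ctx, k)
    accepted: List[int] = []
    for tok in proposal:
        expect = target_next(ctx + accepted)   # target's greedy next token
        if tok == expect:
            accepted.append(tok)               # draft guessed right: accept
        else:
            accepted.append(expect)            # mismatch: take target's token, stop
            break
    else:
        accepted.append(target_next(ctx + accepted))  # bonus token after a full accept
    return accepted

# Toy models over token ids: the target always continues with ctx[-1] + 1.
target = lambda ctx: ctx[-1] + 1
good_draft = lambda ctx, k: [ctx[-1] + i + 1 for i in range(k)]
print(speculative_step(good_draft, target, [0], k=3))  # [1, 2, 3, 4]
```

The benchmark's speedup comes from the accepted-prefix length: when the draft agrees with the target often (as Eagle3's trained draft heads do), several tokens are emitted per target forward pass.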