Improve model card: Add Tequila paper info, metadata, and citation

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +35 -9
README.md CHANGED
@@ -3,8 +3,23 @@ tags:
3
  - hunyuan
4
  - eagle3
5
  - eagle
6
  ---
7
8
  <p align="center">
9
  <picture>
10
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
@@ -30,7 +45,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
30
  - [How to Use](#how-to-use)
31
  - [Install AngelSlim](#install-angelslim)
32
  - [Quick Start](#quick-start)
33
- - [deployment & Evaluation](#deployment)
34
  - [Benchmark](#benchmark)
35
  - [License](#license)
36
  - [Citation](#citation)
@@ -245,30 +260,30 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
245
  <tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th></tr>
246
  </thead>
247
  <tbody>
248
- <tr><td rowspan="3">Qwen2.5-1.5B-Instruct</td><td>BF16</td><td>67.01</td><td>60.05</td><td>54.28</td></tr>
249
  <tr><td>FP8-Static</td><td>66.27</td><td>60.23</td><td>-</td></tr>
250
  <tr><td>FP8-Dynamic</td><td>66.79</td><td>60.08</td><td>51.71</td></tr>
251
- <tr><td rowspan="5">Qwen2.5-7B-Instruct</td><td>BF16</td><td>81.20</td><td>74.55</td><td>79.98</td></tr>
252
  <tr><td>FP8-Static</td><td>81.13</td><td>74.03</td><td>79.30</td></tr>
253
  <tr><td>FP8-Dynamic</td><td>80.31</td><td>74.07</td><td>79.00</td></tr>
254
  <tr><td>INT4-GPTQ</td><td>79.05</td><td>73.05</td><td>74.75</td></tr>
255
  <tr><td>INT4-AWQ</td><td>79.35</td><td>73.22</td><td>79.38</td></tr>
256
- <tr><td rowspan="5">Qwen2.5-32B-Instruct</td><td>BF16</td><td>87.30</td><td>83.21</td><td>81.73</td></tr>
257
  <tr><td>FP8-Static</td><td>87.59</td><td>83.08</td><td>81.58</td></tr>
258
  <tr><td>FP8-Dynamic</td><td>87.30</td><td>83.04</td><td>81.58</td></tr>
259
  <tr><td>INT4-GPTQ</td><td>86.70</td><td>82.45</td><td>82.03</td></tr>
260
  <tr><td>INT4-AWQ</td><td>87.00</td><td>82.64</td><td>-</td></tr>
261
- <tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-7B</td><td>BF16</td><td>53.49</td><td>53.80</td><td>75.74</td></tr>
262
  <tr><td>FP8-Static</td><td>53.57</td><td>54.17</td><td>76.19</td></tr>
263
  <tr><td>FP8-Dynamic</td><td>52.97</td><td>54.13</td><td>74.15</td></tr>
264
  <tr><td>INT4-GPTQ</td><td>51.86</td><td>52.44</td><td>75.89</td></tr>
265
  <tr><td>INT4-AWQ</td><td>53.49</td><td>53.70</td><td>-</td></tr>
266
- <tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-14B</td><td>BF16</td><td>77.71</td><td>74.28</td><td>85.67</td></tr>
267
  <tr><td>FP8-Static</td><td>77.56</td><td>74.66</td><td>86.73</td></tr>
268
  <tr><td>FP8-Dynamic</td><td>76.82</td><td>74.63</td><td>87.11</td></tr>
269
  <tr><td>INT4-GPTQ</td><td>74.29</td><td>72.37</td><td>84.61</td></tr>
270
  <tr><td>INT4-AWQ</td><td>74.81</td><td>73.00</td><td>86.05</td></tr>
271
- <tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-32B</td><td>BF16</td><td>84.18</td><td>80.89</td><td>87.41</td></tr>
272
  <tr><td>FP8-Static</td><td>83.43</td><td>80.90</td><td>87.57</td></tr>
273
  <tr><td>FP8-Dynamic</td><td>83.73</td><td>81.10</td><td>86.43</td></tr>
274
  <tr><td>INT4-GPTQ</td><td>84.10</td><td>79.80</td><td>86.73</td></tr>
@@ -277,7 +292,6 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
277
  </table>
278
 
279
  ### (2) Speculative Decoding
280
-
281
  #### Qwen3 Series Models
282
  Benchmark results for Qwen3 series models with `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:
283
 
@@ -346,16 +360,28 @@ The code for this project is open-sourced under the [License for AngelSlim](LICE
346
 
347
  ## ๐Ÿ”— Citation
348
 
349
  ```
350
  @software{AngelSlim2025,
351
  title={{AngelSlim}},
352
  author={Tencent AngelSlim Project Contributors},
353
  year={2025},
354
- month={6},
355
  url={https://github.com/Tencent/AngelSlim},
356
  }
357
  ```
358
 
359
  ## ๐Ÿ’ฌ Technical Discussion
360
 
361
  * AngelSlim is continuously iterating and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub or join our [WeChat technical discussion group](https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/angel_slim_wechat.png?raw=true).
 
3
  - hunyuan
4
  - eagle3
5
  - eagle
6
+ - quantization
7
+ - ternary-quantization
8
+ - tequila
9
+ pipeline_tag: text-generation
10
+ library_name: transformers
11
+ license: apache-2.0
12
  ---
13
 
14
+ # Tequila: Trapping-free Ternary Quantization for Large Language Models
15
+
16
+ This repository provides models and/or implementations related to the **Tequila** method, a novel trapping-free ternary quantization technique for Large Language Models, as introduced in the paper:
17
+ [**Tequila: Trapping-free Ternary Quantization for Large Language Models**](https://huggingface.co/papers/2509.23809)
18
+
19
+ Tequila is implemented as part of the broader **AngelSlim** compression toolkit, which aims to provide intuitive, comprehensive, and efficient tools for LLM compression.
20
+
21
+ For the Tequila-specific implementation code, see: [https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
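As a rough intuition for ternary quantization (the general technique Tequila builds on; this is not the Tequila algorithm itself, which additionally avoids the dead-zone "trapping" effect described in the paper), here is a minimal absmean-style ternarization sketch. All names and the per-tensor scaling scheme are illustrative assumptions:

```python
import numpy as np

def ternary_quantize(w: np.ndarray):
    """Map weights to {-1, 0, +1} codes plus one per-tensor scale.

    Generic absmean ternarization for illustration only; Tequila's
    actual training-time scheme differs (see the paper).
    """
    scale = float(np.abs(w).mean())                   # per-tensor scale
    q = np.clip(np.round(w / (scale + 1e-8)), -1, 1)  # ternary codes
    return q.astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale               # coarse reconstruction

w = np.array([[0.4, -0.05, 0.9],
              [-0.6, 0.02, -0.3]], dtype=np.float32)
q, s = ternary_quantize(w)          # q == [[1, 0, 1], [-1, 0, -1]]
w_hat = dequantize(q, s)
```

Storing only int8 codes (packable down to ~1.58 bits) plus one float scale is what makes ternary formats so memory-efficient relative to BF16.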
22
+
23
  <p align="center">
24
  <picture>
25
  <source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
 
45
  - [How to Use](#how-to-use)
46
  - [Install AngelSlim](#install-angelslim)
47
  - [Quick Start](#quick-start)
48
+ - [Deployment & Evaluation](#deployment)
49
  - [Benchmark](#benchmark)
50
  - [License](#license)
51
  - [Citation](#citation)
 
260
  <tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th></tr>
261
  </thead>
262
  <tbody>
263
+ <tr><td rowspan="3">Qwen2.5-1.5B-Instruct</td><td>BF16</td><td>67.01</td><td>60.05</td><td>54.28</td></tr>
264
  <tr><td>FP8-Static</td><td>66.27</td><td>60.23</td><td>-</td></tr>
265
  <tr><td>FP8-Dynamic</td><td>66.79</td><td>60.08</td><td>51.71</td></tr>
266
+ <tr><td rowspan="5">Qwen2.5-7B-Instruct</td><td>BF16</td><td>81.20</td><td>74.55</td><td>79.98</td></tr>
267
  <tr><td>FP8-Static</td><td>81.13</td><td>74.03</td><td>79.30</td></tr>
268
  <tr><td>FP8-Dynamic</td><td>80.31</td><td>74.07</td><td>79.00</td></tr>
269
  <tr><td>INT4-GPTQ</td><td>79.05</td><td>73.05</td><td>74.75</td></tr>
270
  <tr><td>INT4-AWQ</td><td>79.35</td><td>73.22</td><td>79.38</td></tr>
271
+ <tr><td rowspan="5">Qwen2.5-32B-Instruct</td><td>BF16</td><td>87.30</td><td>83.21</td><td>81.73</td></tr>
272
  <tr><td>FP8-Static</td><td>87.59</td><td>83.08</td><td>81.58</td></tr>
273
  <tr><td>FP8-Dynamic</td><td>87.30</td><td>83.04</td><td>81.58</td></tr>
274
  <tr><td>INT4-GPTQ</td><td>86.70</td><td>82.45</td><td>82.03</td></tr>
275
  <tr><td>INT4-AWQ</td><td>87.00</td><td>82.64</td><td>-</td></tr>
276
+ <tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-7B</td><td>BF16</td><td>53.49</td><td>53.80</td><td>75.74</td></tr>
277
  <tr><td>FP8-Static</td><td>53.57</td><td>54.17</td><td>76.19</td></tr>
278
  <tr><td>FP8-Dynamic</td><td>52.97</td><td>54.13</td><td>74.15</td></tr>
279
  <tr><td>INT4-GPTQ</td><td>51.86</td><td>52.44</td><td>75.89</td></tr>
280
  <tr><td>INT4-AWQ</td><td>53.49</td><td>53.70</td><td>-</td></tr>
281
+ <tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-14B</td><td>BF16</td><td>77.71</td><td>74.28</td><td>85.67</td></tr>
282
  <tr><td>FP8-Static</td><td>77.56</td><td>74.66</td><td>86.73</td></tr>
283
  <tr><td>FP8-Dynamic</td><td>76.82</td><td>74.63</td><td>87.11</td></tr>
284
  <tr><td>INT4-GPTQ</td><td>74.29</td><td>72.37</td><td>84.61</td></tr>
285
  <tr><td>INT4-AWQ</td><td>74.81</td><td>73.00</td><td>86.05</td></tr>
286
+ <tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-32B</td><td>BF16</td><td>84.18</td><td>80.89</td><td>87.41</td></tr>
287
  <tr><td>FP8-Static</td><td>83.43</td><td>80.90</td><td>87.57</td></tr>
288
  <tr><td>FP8-Dynamic</td><td>83.73</td><td>81.10</td><td>86.43</td></tr>
289
  <tr><td>INT4-GPTQ</td><td>84.10</td><td>79.80</td><td>86.73</td></tr>
 
292
  </table>
293
 
294
  ### (2) Speculative Decoding
 
295
  #### Qwen3 Series Models
296
  Benchmark results for Qwen3 series models with `Eagle3` speculative decoding algorithm on datasets including `MT-bench`, `HumanEval`, `GSM8K`, and `Alpaca`:
297
 
 
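The `Eagle3` results above rely on speculative decoding: a small draft model proposes several tokens and the target model verifies them in one pass. The greedy toy below illustrates only that draft-then-verify loop, not Eagle3 itself (which trains a dedicated draft head and uses distribution-level verification); `target_next` and `draft_next` are hypothetical stand-ins for greedy next-token functions:

```python
def speculative_decode_step(target_next, draft_next, prefix, k=4):
    """One draft-then-verify round of greedy speculative decoding."""
    # 1. Draft model proposes k tokens autoregressively.
    ctx, proposals = list(prefix), []
    for _ in range(k):
        t = draft_next(ctx)
        proposals.append(t)
        ctx.append(t)
    # 2. Target model checks proposals; stop at the first mismatch.
    ctx, accepted = list(prefix), []
    for t in proposals:
        if target_next(ctx) != t:
            break
        accepted.append(t)
        ctx.append(t)
    # 3. The target always contributes one token, so progress is guaranteed.
    accepted.append(target_next(ctx))
    return accepted

# Toy models: the draft agrees with the target only for short contexts.
target = lambda ctx: (sum(ctx) + 1) % 5
draft = lambda ctx: (sum(ctx) + 1) % 5 if len(ctx) < 3 else 0

out = speculative_decode_step(target, draft, prefix=[1], k=3)  # [2, 4, 3]
```

The speedup comes from the target verifying all `k` proposals in a single forward pass (done sequentially here for clarity), so each round emits between 1 and `k + 1` tokens for one target invocation's worth of latency.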
360
 
361
  ## ๐Ÿ”— Citation
362
 
363
+ If you use AngelSlim, please cite it as:
364
  ```
365
  @software{AngelSlim2025,
366
  title={{AngelSlim}},
367
  author={Tencent AngelSlim Project Contributors},
368
  year={2025},
369
+ month={7},
370
  url={https://github.com/Tencent/AngelSlim},
371
  }
372
  ```
373
 
374
+ If you use the Tequila quantization method, please also cite its corresponding paper:
375
+ ```bibtex
376
+ @article{li2025tequila,
377
+ title={{Tequila: Trapping-free Ternary Quantization for Large Language Models}},
378
+ author={Li, Yuhui and Wei, Fangyun and Zhang, Chao and Zhang, Hongyang},
379
+ journal={arXiv preprint arXiv:2509.23809},
380
+ year={2025},
381
+ url={https://arxiv.org/abs/2509.23809}
382
+ }
383
+ ```
384
+
385
  ## ๐Ÿ’ฌ Technical Discussion
386
 
387
  * AngelSlim is continuously iterating and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub or join our [WeChat technical discussion group](https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/angel_slim_wechat.png?raw=true).