Improve model card: Add Tequila paper, Transformers usage, license, and updated tags
#2
by nielsr HF Staff - opened
README.md
CHANGED
|
@@ -1,10 +1,22 @@
|
|
| 1 |
---
|
| 2 |
tags:
|
| 3 |
-
- qwen3
|
| 4 |
-
- eagle3
|
| 5 |
- eagle
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 6 |
---
|
| 7 |
|
|
|
|
|
|
|
|
|
|
|
|
|
| 8 |
<p align="center">
|
| 9 |
<picture>
|
| 10 |
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
|
|
@@ -27,10 +39,11 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
|
|
| 27 |
- [Latest Updates](#latest-updates)
|
| 28 |
- [Key Features](#key-features)
|
| 29 |
- [Supported Models](#supported-models)
|
|
|
|
| 30 |
- [How to Use](#how-to-use)
|
| 31 |
- [Install AngelSlim](#install-angelslim)
|
| 32 |
- [Quick Start](#quick-start)
|
| 33 |
-
- [
|
| 34 |
- [Benchmark](#benchmark)
|
| 35 |
- [License](#license)
|
| 36 |
- [Citation](#citation)
|
|
@@ -38,6 +51,7 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
|
|
| 38 |
|
| 39 |
## 📣Latest Updates
|
| 40 |
|
|
|
|
| 41 |
- [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms.
|
| 42 |
We also opensource Qwen3-8B`s Eagle3 model weight.
|
| 43 |
|
|
@@ -80,6 +94,45 @@ The Eagle3 weights for the Qwen3 series model are now available.
|
|
| 80 |
| ✅ [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) |
|
| 81 |
| ✅ [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) |
|
| 82 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 83 |
## 🛎️How to Use
|
| 84 |
|
| 85 |
### Install AngelSlim
|
|
@@ -209,26 +262,26 @@ Benchmark results for Qwen3 series models with `FP8-Static`, `FP8-Dynamic`, `INT
|
|
| 209 |
<tr><td>INT8-Dynamic</td><td>78.01</td><td>74.84</td><td>86.96</td><td>67.07</td></tr>
|
| 210 |
<tr><td>INT4-GPTQ</td><td>77.19</td><td>73.26</td><td>86.43</td><td>62.20</td></tr>
|
| 211 |
<tr><td>INT4-AWQ</td><td>76.15</td><td>73.59</td><td>86.96</td><td>63.41</td></tr>
|
| 212 |
-
<tr><td rowspan="6">Qwen3-14B</td><td>BF16</td><td>83.06</td><td>78.90</td><td>88.40</td><td>55.49</td></tr>
|
| 213 |
<tr><td>FP8-Static</td><td>82.62</td><td>78.57</td><td>89.46</td><td>57.32</td></tr>
|
| 214 |
<tr><td>FP8-Dynamic</td><td>82.24</td><td>78.92</td><td>88.32</td><td>52.44</td></tr>
|
| 215 |
<tr><td>INT8-Dynamic</td><td>81.87</td><td>78.13</td><td>86.28</td><td>56.10</td></tr>
|
| 216 |
<tr><td>INT4-GPTQ</td><td>81.05</td><td>78.02</td><td>87.34</td><td>57.93</td></tr>
|
| 217 |
<tr><td>INT4-AWQ</td><td>82.02</td><td>77.68</td><td>84.23</td><td>61.59</td></tr>
|
| 218 |
-
<tr><td rowspan="5">Qwen3-32B</td><td>BF16</td><td>86.55</td><td>82.00</td><td>74.53</td><td>37.80</td></tr>
|
| 219 |
<tr><td>FP8-Static</td><td>86.92</td><td>81.78</td><td>70.20</td><td>39.63</td></tr>
|
| 220 |
<tr><td>FP8-Dynamic</td><td>86.55</td><td>81.89</td><td>70.43</td><td>38.41</td></tr>
|
| 221 |
<tr><td>INT4-GPTQ</td><td>86.18</td><td>81.01</td><td>-</td><td>43.29</td></tr>
|
| 222 |
<tr><td>INT4-AWQ</td><td>86.18</td><td>81.54</td><td>-</td><td>36.59</td></tr>
|
| 223 |
-
<tr><td rowspan="4">Qwen3-30B-A3B</td><td>BF16</td><td>83.66</td><td>79.36</td><td>89.99</td><td>31.71</td></tr>
|
| 224 |
<tr><td>FP8-Static</td><td>83.95</td><td>79.47</td><td>89.01</td><td>31.10</td></tr>
|
| 225 |
<tr><td>FP8-Dynamic</td><td>84.10</td><td>79.40</td><td>89.16</td><td>32.93</td></tr>
|
| 226 |
<tr><td>INT8-Dynamic</td><td>83.36</td><td>79.48</td><td>89.16</td><td>34.15</td></tr>
|
| 227 |
-
<tr><td rowspan="4">Qwen3-235B-A22B</td><td>BF16</td><td>89.60</td><td>86.28</td><td>85.29</td><td>27.44</td></tr>
|
| 228 |
<tr><td>FP8-Static</td><td>89.67</td><td>86.19</td><td>86.96</td><td>27.44</td></tr>
|
| 229 |
<tr><td>FP8-Dynamic</td><td>89.67</td><td>86.18</td><td>85.22</td><td>28.05</td></tr>
|
| 230 |
<tr><td>INT8-Dynamic</td><td>88.93</td><td>86.20</td><td>86.20</td><td>23.78</td></tr>
|
| 231 |
-
<tr><td rowspan="5">QwQ-32B</td><td>BF16</td><td>85.74</td><td>82.03</td><td>73.31</td><td>42.68</td></tr>
|
| 232 |
<tr><td>FP8-Static</td><td>85.44</td><td>81.91</td><td>75.36</td><td>42.68</td></tr>
|
| 233 |
<tr><td>FP8-Dynamic</td><td>85.07</td><td>81.93</td><td>75.66</td><td>42.07</td></tr>
|
| 234 |
<tr><td>INT4-GPTQ</td><td>84.03</td><td>81.26</td><td>68.23</td><td>45.73</td></tr>
|
|
@@ -245,30 +298,30 @@ Benchmark results for other models with `FP8-Static`, `FP8-Dynamic`, `INT4-GPTQ`
|
|
| 245 |
<tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th></tr>
|
| 246 |
</thead>
|
| 247 |
<tbody>
|
| 248 |
-
<tr><td rowspan="3">Qwen2.5-1.5B-Instruct</td><td>BF16</td><td>67.01</td><td>60.05</td><td>54.28</td></tr>
|
| 249 |
<tr><td>FP8-Static</td><td>66.27</td><td>60.23</td><td>-</td></tr>
|
| 250 |
<tr><td>FP8-Dynamic</td><td>66.79</td><td>60.08</td><td>51.71</td></tr>
|
| 251 |
-
<tr><td rowspan="5">Qwen2.5-7B-Instruct</td><td>BF16</td><td>81.20</td><td>74.55</td><td>79.98</td></tr>
|
| 252 |
<tr><td>FP8-Static</td><td>81.13</td><td>74.03</td><td>79.30</td></tr>
|
| 253 |
<tr><td>FP8-Dynamic</td><td>80.31</td><td>74.07</td><td>79.00</td></tr>
|
| 254 |
<tr><td>INT4-GPTQ</td><td>79.05</td><td>73.05</td><td>74.75</td></tr>
|
| 255 |
<tr><td>INT4-AWQ</td><td>79.35</td><td>73.22</td><td>79.38</td></tr>
|
| 256 |
-
<tr><td rowspan="5">Qwen2.5-32B-Instruct</td><td>BF16</td><td>87.30</td><td>83.21</td><td>81.73</td></tr>
|
| 257 |
<tr><td>FP8-Static</td><td>87.59</td><td>83.08</td><td>81.58</td></tr>
|
| 258 |
<tr><td>FP8-Dynamic</td><td>87.30</td><td>83.04</td><td>81.58</td></tr>
|
| 259 |
<tr><td>INT4-GPTQ</td><td>86.70</td><td>82.45</td><td>82.03</td></tr>
|
| 260 |
<tr><td>INT4-AWQ</td><td>87.00</td><td>82.64</td><td>-</td></tr>
|
| 261 |
-
<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-7B</td><td>BF16</td><td>53.49</td><td>53.80</td><td>75.74</td></tr>
|
| 262 |
<tr><td>FP8-Static</td><td>53.57</td><td>54.17</td><td>76.19</td></tr>
|
| 263 |
<tr><td>FP8-Dynamic</td><td>52.97</td><td>54.13</td><td>74.15</td></tr>
|
| 264 |
<tr><td>INT4-GPTQ</td><td>51.86</td><td>52.44</td><td>75.89</td></tr>
|
| 265 |
<tr><td>INT4-AWQ</td><td>53.49</td><td>53.70</td><td>-</td></tr>
|
| 266 |
-
<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-14B</td><td>BF16</td><td>77.71</td><td>74.28</td><td>85.67</td></tr>
|
| 267 |
<tr><td>FP8-Static</td><td>77.56</td><td>74.66</td><td>86.73</td></tr>
|
| 268 |
<tr><td>FP8-Dynamic</td><td>76.82</td><td>74.63</td><td>87.11</td></tr>
|
| 269 |
<tr><td>INT4-GPTQ</td><td>74.29</td><td>72.37</td><td>84.61</td></tr>
|
| 270 |
<tr><td>INT4-AWQ</td><td>74.81</td><td>73.00</td><td>86.05</td></tr>
|
| 271 |
-
<tr><td rowspan="5">DeepSeek-R1-Distill-Qwen-32B</td><td>BF16</td><td>84.18</td><td>80.89</td><td>87.41</td></tr>
|
| 272 |
<tr><td>FP8-Static</td><td>83.43</td><td>80.90</td><td>87.57</td></tr>
|
| 273 |
<tr><td>FP8-Dynamic</td><td>83.73</td><td>81.10</td><td>86.43</td></tr>
|
| 274 |
<tr><td>INT4-GPTQ</td><td>84.10</td><td>79.80</td><td>86.73</td></tr>
|
|
@@ -294,15 +347,15 @@ Benchmark results for Qwen3 series models with `Eagle3` speculative decoding alg
|
|
| 294 |
</thead>
|
| 295 |
<tbody>
|
| 296 |
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
|
| 297 |
-
<tr><td rowspan="6"><strong>T=0</strong></td>
|
| 298 |
<td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
|
| 299 |
<tr> <td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
|
| 300 |
<tr><td>Qwen3-8B</td><td>2.65x</td><td>3.87</td><td>2.64x</td><td>3.82</td><td>2.86x</td><td>4.10</td><td>2.58x</td><td>3.55</td><td>2.68x</td><td>3.83</td></tr>
|
| 301 |
<tr><td>Qwen3-14B</td><td>2.42x</td><td>3.38</td><td>2.57x</td><td>3.58</td><td>2.75x</td><td>3.77</td><td>2.27x</td><td>3.11</td><td>2.50x</td><td>3.46</td></tr>
|
| 302 |
<tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
|
| 303 |
<tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
|
| 304 |
-
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
|
| 305 |
-
<tr><td rowspan="6"><strong>T=1</strong></td>
|
| 306 |
<td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
|
| 307 |
<tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
|
| 308 |
<tr><td>Qwen3-8B</td><td>1.91x</td><td>2.84</td><td>2.07x</td><td>3.05</td><td>2.34x</td><td>3.26</td><td>2.09x</td><td>2.92</td><td>2.10x</td><td>3.02</td></tr>
|
|
@@ -328,12 +381,12 @@ Benchmark results for Hunyuan series models with `Eagle3` speculative decoding a
|
|
| 328 |
</thead>
|
| 329 |
<tbody>
|
| 330 |
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
|
| 331 |
-
<tr><td rowspan="3"><strong>T=0</strong></td>
|
| 332 |
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
|
| 333 |
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
|
| 334 |
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
|
| 335 |
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
|
| 336 |
-
<tr><td rowspan="3"><strong>T=1</strong></td>
|
| 337 |
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
|
| 338 |
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
|
| 339 |
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
|
|
|
|
| 1 |
---
|
| 2 |
tags:
|
|
|
|
|
|
|
| 3 |
- eagle
|
| 4 |
+
- eagle3
|
| 5 |
+
- llama
|
| 6 |
+
- qwen3
|
| 7 |
+
- quantization
|
| 8 |
+
- speculative-decoding
|
| 9 |
+
- tequila
|
| 10 |
+
- ternary-quantization
|
| 11 |
+
pipeline_tag: text-generation
|
| 12 |
+
library_name: transformers
|
| 13 |
+
license: apache-2.0
|
| 14 |
---
|
| 15 |
|
| 16 |
+
This model repository is part of the **AngelSlim** toolkit and implements the **Tequila: Trapping-free Ternary Quantization for Large Language Models** method, as presented in the paper [Tequila: Trapping-free Ternary Quantization for Large Language Models](https://huggingface.co/papers/2509.23809).
|
| 17 |
+
|
| 18 |
+
For the Tequila quantization implementation, refer to the [AngelSlim GitHub repository](https://github.com/Tencent/AngelSlim) and specifically the [TernaryQuant branch](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant).
|
| 19 |
+
|
| 20 |
<p align="center">
|
| 21 |
<picture>
|
| 22 |
<source media="(prefers-color-scheme: dark)" srcset="https://github.com/Tencent/AngelSlim/blob/main/docs/source/assets/logos/angelslim_logo_light.png?raw=true">
|
|
|
|
| 39 |
- [Latest Updates](#latest-updates)
|
| 40 |
- [Key Features](#key-features)
|
| 41 |
- [Supported Models](#supported-models)
|
| 42 |
+
- [Sample Usage](#sample-usage)
|
| 43 |
- [How to Use](#how-to-use)
|
| 44 |
- [Install AngelSlim](#install-angelslim)
|
| 45 |
- [Quick Start](#quick-start)
|
| 46 |
+
- [Deployment and Testing](#deployment-and-testing)
|
| 47 |
- [Benchmark](#benchmark)
|
| 48 |
- [License](#license)
|
| 49 |
- [Citation](#citation)
|
|
|
|
| 51 |
|
| 52 |
## 📣Latest Updates
|
| 53 |
|
| 54 |
+
- [25/09/30] We released Tequila's implementation: *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[论文]](https://arxiv.org/abs/2509.23809) | [[代码]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant).
|
| 55 |
- [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms.
|
| 56 |
We also opensource Qwen3-8B`s Eagle3 model weight.
|
| 57 |
|
|
|
|
| 94 |
| ✅ [Qwen3-32B](https://huggingface.co/AngelSlim/Qwen3-32B_eagle3) |
|
| 95 |
| ✅ [Qwen3-30B-A3B](https://huggingface.co/AngelSlim/Qwen3-a3B_eagle3) |
|
| 96 |
|
| 97 |
+
## 🛎️Sample Usage
|
| 98 |
+
|
| 99 |
+
You can use our provided "eagenerate" for speedup generation just like using 'generate' from Hugging Face. Here is an example:
|
| 100 |
+
|
| 101 |
+
```python
|
| 102 |
+
from eagle.model.ea_model import EaModel
|
| 103 |
+
from fastchat.model import get_conversation_template
|
| 104 |
+
import torch
|
| 105 |
+
from transformers import AutoTokenizer # Assuming AutoTokenizer is available via transformers
|
| 106 |
+
|
| 107 |
+
# Placeholder paths, replace with actual model paths
|
| 108 |
+
base_model_path = "YOUR_BASE_MODEL_PATH" # e.g., "meta-llama/Llama-3.1-8B-Instruct"
|
| 109 |
+
EAGLE_model_path = "YOUR_EAGLE_MODEL_PATH" # This repository's path or a specific EAGLE checkpoint
|
| 110 |
+
|
| 111 |
+
model = EaModel.from_pretrained(
|
| 112 |
+
base_model_path=base_model_path,
|
| 113 |
+
ea_model_path=EAGLE_model_path,
|
| 114 |
+
torch_dtype=torch.float16,
|
| 115 |
+
low_cpu_mem_usage=True,
|
| 116 |
+
device_map="auto",
|
| 117 |
+
total_token=-1
|
| 118 |
+
)
|
| 119 |
+
model.eval()
|
| 120 |
+
your_message="Hello"
|
| 121 |
+
conv = get_conversation_template("vicuna")
|
| 122 |
+
conv.append_message(conv.roles[0], your_message)
|
| 123 |
+
conv.append_message(conv.roles[1], None)
|
| 124 |
+
prompt = conv.get_prompt()
|
| 125 |
+
input_ids=model.tokenizer([prompt]).input_ids
|
| 126 |
+
input_ids = torch.as_tensor(input_ids).cuda()
|
| 127 |
+
output_ids=model.eagenerate(input_ids,temperature=0.5,max_new_tokens=512)
|
| 128 |
+
output=model.tokenizer.decode(output_ids[0])
|
| 129 |
+
print(output)
|
| 130 |
+
```
|
| 131 |
+
|
| 132 |
+
**_Note: Vicuna, LLaMA2-Chat, and LLaMA3-Instruct are both chat models. You need to use the correct chat template, otherwise it will cause abnormal output from the model and affect the performance of EAGLE._**
|
| 133 |
+
|
| 134 |
+
For detailed instructions on installation, deployment, and running the AngelSlim toolkit, please refer to the [AngelSlim GitHub repository](https://github.com/Tencent/AngelSlim).
|
| 135 |
+
|
| 136 |
## 🛎️How to Use
|
| 137 |
|
| 138 |
### Install AngelSlim
|
|
|
|
| 262 |
<tr><td>INT8-Dynamic</td><td>78.01</td><td>74.84</td><td>86.96</td><td>67.07</td></tr>
|
| 263 |
<tr><td>INT4-GPTQ</td><td>77.19</td><td>73.26</td><td>86.43</td><td>62.20</td></tr>
|
| 264 |
<tr><td>INT4-AWQ</td><td>76.15</td><td>73.59</td><td>86.96</td><td>63.41</td></tr>
|
| 265 |
+
<tr><td rowspan=\"6\">Qwen3-14B</td><td>BF16</td><td>83.06</td><td>78.90</td><td>88.40</td><td>55.49</td></tr>
|
| 266 |
<tr><td>FP8-Static</td><td>82.62</td><td>78.57</td><td>89.46</td><td>57.32</td></tr>
|
| 267 |
<tr><td>FP8-Dynamic</td><td>82.24</td><td>78.92</td><td>88.32</td><td>52.44</td></tr>
|
| 268 |
<tr><td>INT8-Dynamic</td><td>81.87</td><td>78.13</td><td>86.28</td><td>56.10</td></tr>
|
| 269 |
<tr><td>INT4-GPTQ</td><td>81.05</td><td>78.02</td><td>87.34</td><td>57.93</td></tr>
|
| 270 |
<tr><td>INT4-AWQ</td><td>82.02</td><td>77.68</td><td>84.23</td><td>61.59</td></tr>
|
| 271 |
+
<tr><td rowspan=\"5\">Qwen3-32B</td><td>BF16</td><td>86.55</td><td>82.00</td><td>74.53</td><td>37.80</td></tr>
|
| 272 |
<tr><td>FP8-Static</td><td>86.92</td><td>81.78</td><td>70.20</td><td>39.63</td></tr>
|
| 273 |
<tr><td>FP8-Dynamic</td><td>86.55</td><td>81.89</td><td>70.43</td><td>38.41</td></tr>
|
| 274 |
<tr><td>INT4-GPTQ</td><td>86.18</td><td>81.01</td><td>-</td><td>43.29</td></tr>
|
| 275 |
<tr><td>INT4-AWQ</td><td>86.18</td><td>81.54</td><td>-</td><td>36.59</td></tr>
|
| 276 |
+
<tr><td rowspan=\"4\">Qwen3-30B-A3B</td><td>BF16</td><td>83.66</td><td>79.36</td><td>89.99</td><td>31.71</td></tr>
|
| 277 |
<tr><td>FP8-Static</td><td>83.95</td><td>79.47</td><td>89.01</td><td>31.10</td></tr>
|
| 278 |
<tr><td>FP8-Dynamic</td><td>84.10</td><td>79.40</td><td>89.16</td><td>32.93</td></tr>
|
| 279 |
<tr><td>INT8-Dynamic</td><td>83.36</td><td>79.48</td><td>89.16</td><td>34.15</td></tr>
|
| 280 |
+
<tr><td rowspan=\"4\">Qwen3-235B-A22B</td><td>BF16</td><td>89.60</td><td>86.28</td><td>85.29</td><td>27.44</td></tr>
|
| 281 |
<tr><td>FP8-Static</td><td>89.67</td><td>86.19</td><td>86.96</td><td>27.44</td></tr>
|
| 282 |
<tr><td>FP8-Dynamic</td><td>89.67</td><td>86.18</td><td>85.22</td><td>28.05</td></tr>
|
| 283 |
<tr><td>INT8-Dynamic</td><td>88.93</td><td>86.20</td><td>86.20</td><td>23.78</td></tr>
|
| 284 |
+
<tr><td rowspan=\"5\">QwQ-32B</td><td>BF16</td><td>85.74</td><td>82.03</td><td>73.31</td><td>42.68</td></tr>
|
| 285 |
<tr><td>FP8-Static</td><td>85.44</td><td>81.91</td><td>75.36</td><td>42.68</td></tr>
|
| 286 |
<tr><td>FP8-Dynamic</td><td>85.07</td><td>81.93</td><td>75.66</td><td>42.07</td></tr>
|
| 287 |
<tr><td>INT4-GPTQ</td><td>84.03</td><td>81.26</td><td>68.23</td><td>45.73</td></tr>
|
|
|
|
| 298 |
<tr><th>Model</th><th>Quantization</th><th>CEVAL</th><th>MMLU</th><th>GSM8K</th></tr>
|
| 299 |
</thead>
|
| 300 |
<tbody>
|
| 301 |
+
<tr><td rowspan=\"3\">Qwen2.5-1.5B-Instruct</td><td>BF16</td><td>67.01</td><td>60.05</td><td>54.28</td></tr>
|
| 302 |
<tr><td>FP8-Static</td><td>66.27</td><td>60.23</td><td>-</td></tr>
|
| 303 |
<tr><td>FP8-Dynamic</td><td>66.79</td><td>60.08</td><td>51.71</td></tr>
|
| 304 |
+
<tr><td rowspan=\"5\">Qwen2.5-7B-Instruct</td><td>BF16</td><td>81.20</td><td>74.55</td><td>79.98</td></tr>
|
| 305 |
<tr><td>FP8-Static</td><td>81.13</td><td>74.03</td><td>79.30</td></tr>
|
| 306 |
<tr><td>FP8-Dynamic</td><td>80.31</td><td>74.07</td><td>79.00</td></tr>
|
| 307 |
<tr><td>INT4-GPTQ</td><td>79.05</td><td>73.05</td><td>74.75</td></tr>
|
| 308 |
<tr><td>INT4-AWQ</td><td>79.35</td><td>73.22</td><td>79.38</td></tr>
|
| 309 |
+
<tr><td rowspan=\"5\">Qwen2.5-32B-Instruct</td><td>BF16</td><td>87.30</td><td>83.21</td><td>81.73</td></tr>
|
| 310 |
<tr><td>FP8-Static</td><td>87.59</td><td>83.08</td><td>81.58</td></tr>
|
| 311 |
<tr><td>FP8-Dynamic</td><td>87.30</td><td>83.04</td><td>81.58</td></tr>
|
| 312 |
<tr><td>INT4-GPTQ</td><td>86.70</td><td>82.45</td><td>82.03</td></tr>
|
| 313 |
<tr><td>INT4-AWQ</td><td>87.00</td><td>82.64</td><td>-</td></tr>
|
| 314 |
+
<tr><td rowspan=\"5\">DeepSeek-R1-Distill-Qwen-7B</td><td>BF16</td><td>53.49</td><td>53.80</td><td>75.74</td></tr>
|
| 315 |
<tr><td>FP8-Static</td><td>53.57</td><td>54.17</td><td>76.19</td></tr>
|
| 316 |
<tr><td>FP8-Dynamic</td><td>52.97</td><td>54.13</td><td>74.15</td></tr>
|
| 317 |
<tr><td>INT4-GPTQ</td><td>51.86</td><td>52.44</td><td>75.89</td></tr>
|
| 318 |
<tr><td>INT4-AWQ</td><td>53.49</td><td>53.70</td><td>-</td></tr>
|
| 319 |
+
<tr><td rowspan=\"5\">DeepSeek-R1-Distill-Qwen-14B</td><td>BF16</td><td>77.71</td><td>74.28</td><td>85.67</td></tr>
|
| 320 |
<tr><td>FP8-Static</td><td>77.56</td><td>74.66</td><td>86.73</td></tr>
|
| 321 |
<tr><td>FP8-Dynamic</td><td>76.82</td><td>74.63</td><td>87.11</td></tr>
|
| 322 |
<tr><td>INT4-GPTQ</td><td>74.29</td><td>72.37</td><td>84.61</td></tr>
|
| 323 |
<tr><td>INT4-AWQ</td><td>74.81</td><td>73.00</td><td>86.05</td></tr>
|
| 324 |
+
<tr><td rowspan=\"5\">DeepSeek-R1-Distill-Qwen-32B</td><td>BF16</td><td>84.18</td><td>80.89</td><td>87.41</td></tr>
|
| 325 |
<tr><td>FP8-Static</td><td>83.43</td><td>80.90</td><td>87.57</td></tr>
|
| 326 |
<tr><td>FP8-Dynamic</td><td>83.73</td><td>81.10</td><td>86.43</td></tr>
|
| 327 |
<tr><td>INT4-GPTQ</td><td>84.10</td><td>79.80</td><td>86.73</td></tr>
|
|
|
|
| 347 |
</thead>
|
| 348 |
<tbody>
|
| 349 |
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
|
| 350 |
+
<tr><td rowspan=\"6\"><strong>T=0</strong></td>
|
| 351 |
<td>Qwen3-1.7B</td><td>2.05x</td><td>2.81</td><td>2.07x</td><td>2.93</td><td>2.11x</td><td>2.98</td><td>1.93x</td><td>2.69</td><td>2.04x</td><td>2.85</td></tr>
|
| 352 |
<tr> <td>Qwen3-4B</td><td>2.21x</td><td>3.01</td><td>2.36x</td><td>3.24</td><td>2.42x</td><td>3.13</td><td>2.32x</td><td>2.75</td><td>2.33x</td><td>3.03</td></tr>
|
| 353 |
<tr><td>Qwen3-8B</td><td>2.65x</td><td>3.87</td><td>2.64x</td><td>3.82</td><td>2.86x</td><td>4.10</td><td>2.58x</td><td>3.55</td><td>2.68x</td><td>3.83</td></tr>
|
| 354 |
<tr><td>Qwen3-14B</td><td>2.42x</td><td>3.38</td><td>2.57x</td><td>3.58</td><td>2.75x</td><td>3.77</td><td>2.27x</td><td>3.11</td><td>2.50x</td><td>3.46</td></tr>
|
| 355 |
<tr><td>Qwen3-32B</td><td>2.39x</td><td>2.78</td><td>2.37x</td><td>2.81</td><td>2.47x</td><td>2.92</td><td>2.42x</td><td>2.53</td><td>2.41x</td><td>2.76</td></tr>
|
| 356 |
<tr><td>Qwen3-30B-A3B</td><td>2.84x</td><td>3.63</td><td>2.27x</td><td>3.09</td><td>2.64x</td><td>3.42</td><td>2.83x</td><td>3.56</td><td>2.64x</td><td>3.42</td></tr>
|
| 357 |
+
<!-- <tr><td colspan=\"12\" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
|
| 358 |
+
<tr><td rowspan=\"6\"><strong>T=1</strong></td>
|
| 359 |
<td>Qwen3-1.7B</td><td>1.74x</td><td>2.53</td><td>1.86x</td><td>2.70</td><td>1.82x</td><td>2.69</td><td>1.72x</td><td>2.46</td><td>1.93x</td><td>2.60</td></tr>
|
| 360 |
<tr><td>Qwen3-4B</td><td>1.93x</td><td>2.60</td><td>2.00x</td><td>2.84</td><td>2.11x</td><td>2.82</td><td>2.34x</td><td>2.50</td><td>1.75x</td><td>2.69</td></tr>
|
| 361 |
<tr><td>Qwen3-8B</td><td>1.91x</td><td>2.84</td><td>2.07x</td><td>3.05</td><td>2.34x</td><td>3.26</td><td>2.09x</td><td>2.92</td><td>2.10x</td><td>3.02</td></tr>
|
|
|
|
| 381 |
</thead>
|
| 382 |
<tbody>
|
| 383 |
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=0</strong></td></tr> -->
|
| 384 |
+
<tr><td rowspan=\"3\"><strong>T=0</strong></td>
|
| 385 |
<td>Hunyuan-1.8B-Instruct</td><td>1.97x</td><td>2.90</td><td>2.58x</td><td>3.73</td><td>2.61x</td><td>3.71</td><td>1.71x</td><td>2.43</td><td>2.22x</td><td>3.19</td></tr>
|
| 386 |
<tr> <td>Hunyuan-4B-Instruct</td><td>1.77x</td><td>2.60</td><td>2.64x</td><td>3.35</td><td>2.14x</td><td>3.17</td><td>1.72x</td><td>2.57</td><td>2.07x</td><td>2.92</td></tr>
|
| 387 |
<tr><td>Hunyuan-7B-Instruct</td><td>2.22x</td><td>3.58</td><td>3.59x</td><td>5.47</td><td>2.96x</td><td>4.68</td><td>1.64x</td><td>2.56</td><td>2.60x</td><td>4.07</td></tr>
|
| 388 |
<!-- <tr><td colspan="12" style="text-align: center; vertical-align: middle;"><strong>Temperature=1</strong></td></tr> -->
|
| 389 |
+
<tr><td rowspan=\"3\"><strong>T=1</strong></td>
|
| 390 |
<td>Hunyuan-1.8B-Instruct</td><td>1.58x</td><td>2.36</td><td>2.35x</td><td>3.56</td><td>2.23x</td><td>3.38</td><td>1.26x</td><td>1.87</td><td>1.86x</td><td>2.79</td></tr>
|
| 391 |
<tr><td>Hunyuan-4B-Instruct</td><td>1.36x</td><td>2.05</td><td>1.97x</td><td>2.86</td><td>1.72x</td><td>2.68</td><td>1.14x</td><td>1.76</td><td>1.55x</td><td>2.34</td></tr>
|
| 392 |
<tr><td>Hunyuan-7B-Instruct</td><td>1.90x</td><td>3.11</td><td>3.12x</td><td>5.09</td><td>2.74x</td><td>4.34</td><td>1.47x</td><td>2.39</td><td>2.31x</td><td>3.73</td></tr>
|