Update AngelSlim Model Card: Add Tequila Paper Details, Metadata, and Latest Updates

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +21 -2
README.md CHANGED
@@ -3,6 +3,12 @@ tags:
 - hunyuan
 - eagle3
 - eagle
+- quantization
+- tequila
+- llm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 <p align="center">
@@ -21,6 +27,15 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 <br>
 </p>
 
+---
+
+## About Tequila
+
+This repository implements **Tequila: Trapping-Free Ternary Quantization for Large Language Models** ([Paper](https://huggingface.co/papers/2509.23809)).
+
+Tequila is a ternary quantization technique that addresses the accuracy degradation caused by constraining LLM weights to {-1, 0, 1}. It resolves the "deadzone trapping" problem, in which many weights get stuck at the deadzone boundaries, by repurposing these trapped weights as dynamic biases. The trapped weights then provide continuous signals in the forward pass and receive meaningful gradients during backpropagation, improving model capacity and optimization with minimal inference overhead. Tequila significantly outperforms state-of-the-art ternary quantization methods, nearly matching full-precision performance on benchmarks such as ARC while delivering a 3.0x inference speedup for efficient LLM deployment on edge devices.
+
+---
 
 ## Table of Contents
 
@@ -38,8 +53,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 
 ## 📣Latest Updates
 
-- [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms.
-We also opensource Qwen3-8B`s Eagle3 model weight.
+- 🌟[25/09/30] We open-sourced the implementation of the SpecExit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- 🌟[25/09/30] We released the implementation of the ternary quantization method Tequila: *Tequila: Trapping-Free Ternary Quantization for Large Language Models* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
+- [25/09/24] We supported NVFP4 PTQ quantization for Qwen3 series models and open-sourced [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
+- [25/09/01] We supported FP8 quantization for the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) open-source translation model; supported Eagle3 Torch inference and the benchmark evaluation process; supported [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux) quantization and caching; and supported [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model quantization and compression.
+- [25/08/06] We supported FP8 and INT4 quantization for `Hunyuan 0.5B/1.8B/4B/7B` and `Qwen2.5VL 3B/7B/32B/72B`, as well as `FP8-Static` and `W4A8-FP8` quantization for `DeepSeek-R1/V3` and `Kimi-K2` models. We also open-sourced Eagle3 weights for `Hunyuan 1.8B/4B/7B` series models.
+- [25/07/04] We supported quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including INT8, FP8, and INT4 algorithms. We also open-sourced Eagle3 weights for `Qwen3` series models.
 
 Coming soon: