Update AngelSlim Model Card: Add Tequila Paper Details, Metadata, and Latest Updates
This PR improves the `AngelSlim` model card by:

- Adding `license: apache-2.0`, `pipeline_tag: text-generation`, and `library_name: transformers` to the YAML metadata for better discoverability and functionality.
- Adding `quantization`, `tequila`, and `llm` to the `tags` metadata to more accurately reflect the model's core contributions.
- Introducing a dedicated "About Tequila" section at the top to prominently feature the paper [Tequila: Trapping-free Ternary Quantization for Large Language Models](https://huggingface.co/papers/2509.23809) and summarize its key aspects.
- Updating the "Latest Updates" section to match the most recent entries in the GitHub README, including the Tequila release with its existing arXiv link.

These changes keep the model card up to date and give users more comprehensive information.
```diff
@@ -3,6 +3,12 @@ tags:
 - hunyuan
 - eagle3
 - eagle
+- quantization
+- tequila
+- llm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 <p align="center">
@@ -21,6 +27,15 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 <br>
 </p>
 
+---
+
+## About Tequila
+
+This repository implements **Tequila: Trapping-free Ternary Quantization for Large Language Models** ([Paper](https://huggingface.co/papers/2509.23809)).
+
+Tequila is a novel quantization technique that addresses the accuracy degradation issue in ternary weight quantization (constraining weights to {-1, 0, 1}) for LLMs. It solves the "deadzone trapping" problem, where many weights get stuck at deadzone boundaries, by repurposing these trapped weights as dynamic biases. This allows them to provide continuous signals and receive meaningful gradients during backpropagation, enhancing model capacity and optimization with minimal inference overhead. Tequila significantly outperforms state-of-the-art ternary quantization methods, achieving substantial accuracy gains and nearly matching full-precision performance on benchmarks like ARC, while offering a 3.0x inference speedup for efficient LLM deployment on edge devices.
+
+---
 
 ## Table of Contents
 
@@ -38,8 +53,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 
 ## 📣Latest Updates
 
-- [25/
-
+- 🌟[25/09/30] We open-sourced the implementation of the SpecExit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- 🌟[25/09/30] We released the implementation of ternary quantization Tequila: *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant).
+- [25/09/24] We supported NVFP4 PTQ quantization for Qwen3 series models, and open-sourced [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
+- [25/09/01] We supported FP8 quantization for the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) open-source translation model; supported Eagle3 Torch inference and the Benchmark evaluation process; supported [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux) quantization and Cache; supported [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model quantization and compression.
+- [25/08/06] We supported FP8 and INT4 quantization for `Hunyuan 0.5B/1.8B/4B/7B` and `Qwen2.5VL 3B/7B/32B/72B`; supported `FP8-Static` and `W4A8-FP8` quantization for `DeepSeek-R1/V3` and `Kimi-K2` models. We also open-sourced Eagle3 weights for `Hunyuan 1.8B/4B/7B` series models.
+- [25/07/04] We supported quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including INT8, FP8, and INT4 algorithms. We also open-sourced Eagle3 weights for `Qwen3` series models.
 
 Coming soon:
 
```
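For context on the "deadzone" the new About Tequila section refers to: conventional ternary quantizers collapse every weight inside an interval around zero to 0, and weights hovering at that interval's boundary are the ones Tequila repurposes. The sketch below shows only the plain TWN-style baseline (threshold then scale), not Tequila's method; the `delta_scale=0.7` heuristic and the function name are illustrative assumptions.

```python
def ternary_quantize(w, delta_scale=0.7):
    """Plain threshold-based ternary quantization (TWN-style baseline).
    Weights inside the deadzone (-delta, delta) collapse to 0; the rest
    map to sign(w) scaled by a shared factor alpha. Illustrative only:
    Tequila's contribution (reusing weights trapped at the deadzone
    boundary as dynamic biases) is not implemented here."""
    delta = delta_scale * sum(abs(x) for x in w) / len(w)  # deadzone half-width
    kept = [abs(x) for x in w if abs(x) > delta]           # surviving magnitudes
    alpha = sum(kept) / len(kept) if kept else 0.0         # shared scale factor
    codes = [(1 if x > 0 else -1) if abs(x) > delta else 0 for x in w]
    return [alpha * c for c in codes], codes

w = [0.9, -0.05, 0.4, -0.8, 0.02, -0.45]
w_q, codes = ternary_quantize(w)
# codes == [1, 0, 1, -1, 0, -1]: the two small weights fall in the deadzone
```

Note how the small weights (-0.05, 0.02) are zeroed outright; in training-time ternary quantization it is the weights stuck just inside this boundary that stop receiving useful gradients, which is the trapping problem the paper targets.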