nielsr (HF Staff) committed (verified)
Commit 2b6f604 · Parent(s): 4f057a8

Update AngelSlim Model Card: Add Tequila Paper Details, Metadata, and Latest Updates


This PR improves the `AngelSlim` model card by:

- Adding `license: apache-2.0`, `pipeline_tag: text-generation`, and `library_name: transformers` to the YAML metadata for better discoverability and functionality.
- Adding `quantization`, `tequila`, and `llm` to the `tags` metadata to more accurately reflect the model's core contributions.
- Introducing a dedicated "About Tequila" section at the top to prominently feature the paper [Tequila: Trapping-free Ternary Quantization for Large Language Models](https://huggingface.co/papers/2509.23809) and summarize its key aspects.
- Updating the "Latest Updates" section to reflect the most recent developments from the GitHub README, including the Tequila release entry with its existing arXiv link.

These changes ensure the model card is up-to-date and provides more comprehensive information to users.
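For reference, the resulting YAML front matter after this change would look like the following (reconstructed from the diff in this PR; field ordering in the actual file may differ):

```yaml
tags:
- hunyuan
- eagle3
- eagle
- quantization
- tequila
- llm
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
```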

Files changed (1):
  1. README.md +21 -2
README.md CHANGED
```diff
@@ -3,6 +3,12 @@ tags:
 - hunyuan
 - eagle3
 - eagle
+- quantization
+- tequila
+- llm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 <p align="center">
@@ -21,6 +27,15 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 <br>
 </p>
 
+---
+
+## About Tequila
+
+This repository implements **Tequila: Trapping-free Ternary Quantization for Large Language Models** ([Paper](https://huggingface.co/papers/2509.23809)).
+
+Tequila is a novel quantization technique that addresses the accuracy degradation issue in ternary weight quantization (constraining weights to {-1, 0, 1}) for LLMs. It solves the "deadzone trapping" problem, where many weights get stuck at deadzone boundaries, by repurposing these trapped weights as dynamic biases. This allows them to provide continuous signals and receive meaningful gradients during backpropagation, enhancing model capacity and optimization with minimal inference overhead. Tequila significantly outperforms state-of-the-art ternary quantization methods, achieving substantial accuracy gains and nearly matching full-precision performance on benchmarks like ARC, while offering a 3.0x inference speedup for efficient LLM deployment on edge devices.
+
+---
 
 ## Table of Contents
 
@@ -38,8 +53,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 
 ## 📣Latest Updates
 
-- [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms.
-We also opensource Qwen3-8B`s Eagle3 model weight.
+- 🌟[25/09/30] We open-sourced the implementation of SpecExit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- 🌟[25/09/30] We released the implementation of ternary quantization Tequila: *TEQUILA: TRAPPING-FREE TERNARY QUANTIZATION FOR LARGE LANGUAGE MODELS* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant).
+- [25/09/24] We supported NVFP4 PTQ quantization for Qwen3 series models, and also open-sourced [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4), [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
+- [25/09/01] We supported FP8 quantization for [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) open-source translation model; supported Eagle3 Torch inference and Benchmark evaluation process; supported [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux) quantization, Cache; supported [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model quantization and compression.
+- [25/08/06] We supported FP8, INT4 quantization for `Hunyuan 0.5B/1.8B/4B/7B` and `Qwen2.5VL 3B/7B/32B/72B`; supported `FP8-Static`, `W4A8-FP8` quantization for `DeepSeek-R1/V3` and `Kimi-K2` models. We also open-sourced Eagle3 weights for `Hunyuan 1.8B/4B/7B` series models.
+- [25/07/04] We supported quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including INT8, FP8, INT4 algorithms. We also open-sourced Eagle3 weights for `Qwen3` series models.
 
 Coming soon:
```
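For readers unfamiliar with the problem the "About Tequila" section describes, the sketch below illustrates plain ternary quantization, not Tequila itself: weights are snapped to {-1, 0, +1} times a scale `alpha`, and everything with magnitude below a deadzone threshold `delta` is zeroed. The `delta = 0.7 * mean(|w|)` heuristic and the function name `ternary_quantize` are illustrative assumptions; weights hovering just inside the deadzone boundary are the "trapped" weights that Tequila repurposes as dynamic biases.

```python
import numpy as np

def ternary_quantize(w, delta_ratio=0.7):
    """Plain ternary quantization (illustrative only, not Tequila's method).

    Weights with |w| <= delta fall into the 'deadzone' and are zeroed;
    the rest become +/-alpha. Deadzone-boundary weights contribute no
    forward signal -- the trapping issue the Tequila paper addresses.
    """
    delta = delta_ratio * np.mean(np.abs(w))   # deadzone threshold (assumed heuristic)
    mask = np.abs(w) > delta                   # True where the weight survives
    alpha = np.abs(w[mask]).mean() if mask.any() else 0.0  # per-tensor scale
    ternary = np.sign(w) * mask                # values in {-1.0, 0.0, 1.0}
    return alpha * ternary, ternary, delta

w = np.array([0.9, -0.05, 0.4, -0.8, 0.02, 0.3])
quantized, ternary, delta = ternary_quantize(w)
print(ternary)  # small-magnitude entries collapse to 0
```

Here the two small weights (-0.05 and 0.02) land in the deadzone and are zeroed, while the rest keep only their sign and share one scale, which is what makes ternary storage so compact and why recovering signal from the zeroed region matters.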