Update AngelSlim Model Card: Add Tequila Paper Details, Metadata, and Latest Updates

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +21 -2
README.md CHANGED
@@ -3,6 +3,12 @@ tags:
 - hunyuan
 - eagle3
 - eagle
+- quantization
+- tequila
+- llm
+license: apache-2.0
+pipeline_tag: text-generation
+library_name: transformers
 ---
 
 <p align="center">
@@ -21,6 +27,15 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 <br>
 </p>
 
+---
+
+## About Tequila
+
+This repository implements **Tequila: Trapping-Free Ternary Quantization for Large Language Models** ([Paper](https://huggingface.co/papers/2509.23809)).
+
+Tequila is a ternary quantization technique that addresses the accuracy degradation caused by constraining LLM weights to {-1, 0, 1}. It resolves the "deadzone trapping" problem, in which many weights get stuck at the deadzone boundaries, by repurposing these trapped weights as dynamic biases. The trapped weights then provide continuous signals in the forward pass and receive meaningful gradients during backpropagation, improving model capacity and optimization with minimal inference overhead. Tequila significantly outperforms state-of-the-art ternary quantization methods, nearly matching full-precision performance on benchmarks such as ARC while delivering a 3.0x inference speedup for efficient LLM deployment on edge devices.
+
+---
 
 ## Table of Contents
 
@@ -38,8 +53,12 @@ Dedicated to building a more intuitive, comprehensive, and efficient LLMs compre
 
 ## 📣Latest Updates
 
-- [25/07/04] We now support quantization for Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen and other models, including INT8/FP8/INT4 algorithms.
-We also opensource Qwen3-8B`s Eagle3 model weight.
+- 🌟[25/09/30] We open-sourced the implementation of the SpecExit algorithm: *SpecExit: Accelerating Large Reasoning Model via Speculative Exit* | [[Paper]](http://arxiv.org/abs/2509.24248) | [[Docs]](https://angelslim.readthedocs.io/zh-cn/latest/features/speculative_decoding/spec_exit.html)
+- 🌟[25/09/30] We released the implementation of the ternary quantization method Tequila: *Tequila: Trapping-Free Ternary Quantization for Large Language Models* | [[Paper]](https://arxiv.org/abs/2509.23809) | [[Code]](https://github.com/Tencent/AngelSlim/tree/tequila/TernaryQuant)
+- [25/09/24] We supported NVFP4 PTQ quantization for Qwen3 series models and open-sourced [Qwen3-32B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-32B_nvfp4) and [Qwen3-235B-A22B-NVFP4](https://huggingface.co/AngelSlim/Qwen3-235B-A22B_nvfp4) weights.
+- [25/09/01] We supported FP8 quantization for the [Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B-fp8) open-source translation model; supported Eagle3 Torch inference and the benchmark evaluation process; supported [FLUX](https://github.com/Tencent/AngelSlim/tree/main/configs/flux) quantization and caching; and supported [Seed-OSS](https://github.com/Tencent/AngelSlim/tree/main/configs/seed_oss) model quantization and compression.
+- [25/08/06] We supported FP8 and INT4 quantization for `Hunyuan 0.5B/1.8B/4B/7B` and `Qwen2.5VL 3B/7B/32B/72B`, as well as `FP8-Static` and `W4A8-FP8` quantization for `DeepSeek-R1/V3` and `Kimi-K2` models. We also open-sourced Eagle3 weights for `Hunyuan 1.8B/4B/7B` series models.
+- [25/07/04] We supported quantization for `Hunyuan/Qwen2.5/Qwen3/DeepSeek-R1-Distill-Qwen` and other models, including INT8, FP8, and INT4 algorithms. We also open-sourced Eagle3 weights for `Qwen3` series models.
 
 Coming soon: