|
|
--- |
|
|
library_name: transformers |
|
|
license: other |
|
|
license_name: youtu-llm |
|
|
license_link: https://huggingface.co/tencent/Youtu-LLM-2B-Base/blob/main/LICENSE.txt |
|
|
pipeline_tag: text-generation |
|
|
instruct_model: |
|
|
- tencent/Youtu-LLM-2B |
|
|
--- |
|
|
<div align="center"> |
|
|
|
|
|
# <img src="assets/youtu-llm-logo.png" alt="Youtu-LLM Logo" height="100px"> |
|
|
|
|
|
[📄 License](LICENSE.txt) • [💻 Code](https://github.com/TencentCloudADP/youtu-tip/tree/master/youtu-llm) • [📖 Technical Report](https://arxiv.org/abs/2512.24618) • [📊 Benchmarks](#benchmarks) |
|
|
|
|
|
</div> |
|
|
|
|
|
## 🎯 Brief Introduction |
|
|
|
|
|
**Youtu-LLM** is a new, small yet powerful LLM: it contains only 1.96B parameters, supports a 128K context, and has native agentic abilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; on agent-related benchmarks, it surpasses larger-scale leaders and can genuinely complete multiple end-to-end agent tasks. |
|
|
|
|
|
**Youtu-LLM** has the following features: |
|
|
- Type: Autoregressive causal language model with dense MLA (Multi-head Latent Attention) |
|
|
- Release versions: [Base](https://huggingface.co/tencent/Youtu-LLM-2B-Base) and [Instruct](https://huggingface.co/tencent/Youtu-LLM-2B) |
|
|
- Number of Parameters: 1.96B |
|
|
- Number of Layers: 32 |
|
|
- Number of Attention Heads (MLA): 16 for Q/K/V |
|
|
- MLA Rank: 1,536 for Q, 512 for K/V |
|
|
- MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V |
|
|
- Context Length: 131,072 |
|
|
- Vocabulary Size: 128,256 |
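The MLA configuration above implies a much smaller KV cache than standard multi-head attention. The sketch below is a back-of-envelope estimate, assuming (as in the usual MLA formulation) that only the compressed KV latent (rank 512) plus the decoupled RoPE key (dim 64) is cached per token per layer; the exact cache layout of Youtu-LLM's implementation may differ.

```python
# Back-of-envelope KV-cache sizing from the spec list above.
# Assumption: MLA caches the compressed KV latent + decoupled RoPE key;
# the MHA baseline caches full K and V for every head.

N_LAYERS = 32
N_HEADS = 16
KV_RANK = 512        # MLA compressed KV latent dimension
ROPE_DIM = 64        # decoupled RoPE key dimension
HEAD_DIM = 128       # per-head dim used for the MHA comparison
CONTEXT = 131_072    # max context length
BYTES = 2            # bf16

def mla_cache_per_token() -> int:
    """Elements cached per token per layer under MLA."""
    return KV_RANK + ROPE_DIM

def mha_cache_per_token() -> int:
    """Elements per token per layer if full K and V heads were cached."""
    return N_HEADS * HEAD_DIM * 2

def full_context_gib(per_token: int) -> float:
    """Total cache size in GiB at the full 131,072-token context."""
    return per_token * N_LAYERS * CONTEXT * BYTES / 2**30

print(mla_cache_per_token())                    # 576 elements
print(mha_cache_per_token())                    # 4096 elements
print(full_context_gib(mla_cache_per_token()))  # 4.5 GiB
print(full_context_gib(mha_cache_per_token()))  # 32.0 GiB
```

Under these assumptions, MLA shrinks the per-token cache roughly 7× (576 vs. 4096 elements), which is what makes the 128K context practical for a 2B model.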
|
|
|
|
|
## 🤗 Model Download |
|
|
| Model Name | Description | Download | |
|
|
| ----------- | ----------- | ----------- | |
|
|
| Youtu-LLM-2B-Base | Base model of Youtu-LLM-2B | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-Base) | |


| Youtu-LLM-2B | Instruct model of Youtu-LLM-2B | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B) | |


| Youtu-LLM-2B-GGUF | Instruct model of Youtu-LLM-2B, in GGUF format | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-GGUF) | |
|
|
|
|
|
## 📰 News |
|
|
- [2026.01.07] You can now fine-tune Youtu-LLM with [ModelScope](https://mp.weixin.qq.com/s/JJtQWSYEjnE7GnPkaJ7UNA). |


- [2026.01.04] You can now fine-tune Youtu-LLM with [LlamaFactory](https://github.com/hiyouga/LlamaFactory/pull/9707). |
|
|
|
|
|
<a id="benchmarks"></a> |
|
|
|
|
|
## 📊 Performance Comparisons |
|
|
### Base Model |
|
|
|
|
|
# <img src="assets/general_agentic_base.png" alt="Comparison between Youtu-LLM-2B-Base and baselines" height="260px"> |
|
|
|
|
|
#### General Benchmarks |
|
|
| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base | |
|
|
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | |
|
|
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | <u>46.1%</u> | 36.2% | **48.4%** | |
|
|
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | **47.2%** | 43.0% | <u>43.5%</u> | |
|
|
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | **45.2%** | 25.4% | <u>40.7%</u> | |
|
|
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | **80.8%** | 47.8% | <u>77.6%</u> | |
|
|
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | **69.7%** | 35.9% | <u>68.9%</u> | |
|
|
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | **44.8%** | 21.5% | <u>44.4%</u> | |
|
|
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | **70.8%** | <u>62.9%</u> | 59.8% | |
|
|
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | **37.8%** | 30.1% | <u>33.3%</u> | |
|
|
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | <u>15.0%</u> | 11.5% | **17.4%** | |
|
|
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | **67.5%** | 49.4% | <u>66.6%</u> | |
|
|
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | <u>80.8%</u> | 62.7% | **81.8%** | |
|
|
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | <u>57.6%</u> | 36.0% | **64.6%** | |
|
|
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | <u>49.9%</u> | 28.1% | **57.3%** | |
|
|
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | <u>6.9%</u> | 3.4% | **9.7%** | |
|
|
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | <u>54.8%</u> | 42.3% | **55.9%** | |
|
|
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | **25.3%** | <u>25.2%</u> | 22.7% | |
|
|
| Long Context | LongBench v2 (Acc.) | 3 | <u>28.0%</u> | **28.8%** | 26.6% | 25.8% | 27.8% | 27.2% | |
|
|
| | NIAH (Acc.) | / | 79.8% | 75.0% | <u>99.5%</u> | 83.0% | **99.8%** | 98.8% | |
|
|
|
|
|
#### Agentic Benchmarks |
|
|
We use [APTBench](https://github.com/TencentYoutuResearch/APTBench/) to evaluate the agentic capabilities of the base model. |
|
|
|
|
|
| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base | |
|
|
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | |
|
|
| Code | 25.1% | 24.3% | 32.8% | **41.9%** | 23.6% | <u>37.9%</u> | |
|
|
| Deep Research | 28.5% | 27.2% | 36.4% | **40.5%** | 30.0% | <u>38.6%</u> | |
|
|
| Math | 59.9% | 60.7% | 59.8% | **70.5%** | 60.1% | <u>68.0%</u> | |
|
|
| Tool | 56.7% | 59.1% | 61.7% | **65.8%** | 64.1% | <u>64.2%</u> | |
|
|
|
|
|
## 📖 Citation |
|
|
|
|
|
If you find our work useful in your research, please consider citing the following paper: |
|
|
|
|
|
```bibtex |
|
|
@article{youtu-llm, |
|
|
title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models}, |
|
|
author={Tencent Youtu Lab}, |
|
|
year={2025}, |
|
|
eprint={2512.24618}, |
|
|
archivePrefix={arXiv}, |
|
|
primaryClass={cs.CL}, |
|
|
url={https://arxiv.org/abs/2512.24618}, |
|
|
} |
|
|
``` |