---
library_name: transformers
license: other
license_name: youtu-llm
license_link: https://huggingface.co/tencent/Youtu-LLM-2B-Base/LICENSE.txt
pipeline_tag: text-generation
instruct_model:
- tencent/Youtu-LLM-2B
---
<div align="center">
# <img src="assets/youtu-llm-logo.png" alt="Youtu-LLM Logo" height="100px">
[📃 License](LICENSE.txt) • [💻 Code](https://github.com/TencentCloudADP/youtu-tip/tree/master/youtu-llm) • [📑 Technical Report](https://arxiv.org/abs/2512.24618) • [📊 Benchmarks](#benchmarks)
</div>
## 🎯 Brief Introduction
**Youtu-LLM** is a new, small yet powerful LLM: it has only 1.96B parameters, supports a 128k-token context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size on Commonsense, STEM, Coding, and Long Context benchmarks; on agent-related tests, it surpasses larger leading models and can complete multiple end-to-end agent tasks.
**Youtu-LLM** has the following features:
- Type: Autoregressive Causal Language Models with Dense MLA
- Release versions: [Base](https://huggingface.co/tencent/Youtu-LLM-2B-Base) and [Instruct](https://huggingface.co/tencent/Youtu-LLM-2B)
- Number of Parameters: 1.96B
- Number of Layers: 32
- Number of Attention Heads (MLA): 16 for Q/K/V
- MLA Rank: 1,536 for Q, 512 for K/V
- MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V
- Context Length: 131,072
- Vocabulary Size: 128,256
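
To give a feel for what the MLA dimensions above mean in practice, here is a rough KV-cache estimate. It is a sketch under two assumptions that are not stated in this card: that the cache stores one compressed K/V latent plus one shared RoPE key per token and layer (as in DeepSeek-V2-style MLA), and that cached values take 2 bytes each (bf16/fp16). The multi-head baseline is a hypothetical comparison point built from the same head dimensions, not a released model.

```python
# Figures copied from the feature list above.
NUM_LAYERS = 32
NUM_HEADS = 16
KV_RANK = 512          # MLA rank for K/V
QK_NOPE_DIM = 128
QK_ROPE_DIM = 64
V_DIM = 128
CONTEXT_LENGTH = 131_072

# Assumption (not from the card): 2 bytes per cached value (bf16/fp16).
BYTES_PER_VALUE = 2
GIB = 1024 ** 3


def mla_elems_per_token_layer() -> int:
    # Assumption: the cache holds the compressed K/V latent plus one
    # RoPE key shared across heads (DeepSeek-V2-style MLA caching).
    return KV_RANK + QK_ROPE_DIM


def mha_elems_per_token_layer() -> int:
    # Hypothetical baseline: full per-head keys and values are cached.
    return NUM_HEADS * (QK_NOPE_DIM + QK_ROPE_DIM + V_DIM)


def cache_bytes(elems_per_token_layer: int, tokens: int = CONTEXT_LENGTH) -> int:
    # Total cache size for one sequence of `tokens` tokens across all layers.
    return elems_per_token_layer * NUM_LAYERS * tokens * BYTES_PER_VALUE


print(f"MLA cache at full context: {cache_bytes(mla_elems_per_token_layer()) / GIB:.1f} GiB")
print(f"MHA cache at full context: {cache_bytes(mha_elems_per_token_layer()) / GIB:.1f} GiB")
```

Under these assumptions the latent cache holds 576 values per token per layer, against 5,120 for the uncompressed baseline, which is why a 131,072-token context stays within reach of a single accelerator.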
## 🤗 Model Download
| Model Name | Description | Download |
| ----------- | ----------- | ----------- |
| Youtu-LLM-2B-Base | Base model of Youtu-LLM-2B | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-Base) |
| Youtu-LLM-2B | Instruct model of Youtu-LLM-2B | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B) |
| Youtu-LLM-2B-GGUF | Instruct model of Youtu-LLM-2B, in GGUF format | 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-GGUF) |
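
As a starting point for trying the base checkpoint from the table above, the following sketch loads it through the generic `transformers` Auto classes and continues a prompt. It assumes the repository works with `AutoTokenizer` and `AutoModelForCausalLM`; `trust_remote_code=True` is a guess in case the architecture ships custom modeling code, and the helper name `complete` is hypothetical. Because this is a base model, the prompt is plain text to be continued rather than a chat template.

```python
MODEL_ID = "tencent/Youtu-LLM-2B-Base"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Continue `prompt` with the base model (hypothetical helper)."""
    # Imported lazily so that defining the helper has no heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: trust_remote_code may be required if the model class is
    # not bundled with the installed transformers version.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Keep only the newly generated tokens, dropping the echoed prompt.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Example call (downloads the weights on first use):
# print(complete("The capital of France is"))
```

The sketch loads the weights in their default precision on CPU; for GPU or reduced-precision inference, adjust the loading arguments to your environment.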
## 📰 News
- [2026.01.07] You can now fine-tune Youtu-LLM with [ModelScope](https://mp.weixin.qq.com/s/JJtQWSYEjnE7GnPkaJ7UNA).
- [2026.01.04] You can now fine-tune Youtu-LLM with [LlamaFactory](https://github.com/hiyouga/LlamaFactory/pull/9707).
<a id="benchmarks"></a>
## 📊 Performance Comparisons
### Base Model
# <img src="assets/general_agentic_base.png" alt="Comparison between Youtu-LLM-2B-Base and baselines" height="260px">
#### General Benchmarks
| Type | Benchmark (Metric) | # Shots | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| :--- | :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Commonsense | MMLU-Pro (EM) | 5 | 34.9% | 35.3% | 29.4% | <u>46.1%</u> | 36.2% | **48.4%** |
| | MLQA-Zh (EM) | 3 | 38.1% | 38.0% | 40.3% | **47.2%** | 43.0% | <u>43.5%</u> |
| | MMLU-ProX-Zh (EM) | 5 | 32.5% | 26.7% | 24.2% | **45.2%** | 25.4% | <u>40.7%</u> |
| STEM | GSM8K (EM) | 8 | 68.2% | 67.3% | 38.5% | **80.8%** | 47.8% | <u>77.6%</u> |
| | MGSM-Zh (EM) | 8 | 57.1% | 40.7% | 33.0% | **69.7%** | 35.9% | <u>68.9%</u> |
| | MATH (EM) | 4 | 28.1% | 40.8% | 24.4% | **44.8%** | 21.5% | <u>44.4%</u> |
| | BBH (EM) | 3 | 53.0% | 59.8% | 51.6% | **70.8%** | <u>62.9%</u> | 59.8% |
| | GPQA-MC (Acc. Norm) | 5 | 30.4% | 26.6% | 28.6% | **37.8%** | 30.1% | <u>33.3%</u> |
| | HLE-MC (Acc. Norm) | 3 | 10.7% | 3.1% | 8.0% | <u>15.0%</u> | 11.5% | **17.4%** |
| Coding | MBPP (Pass@1) | 3 | 55.6% | 51.0% | 45.8% | **67.5%** | 49.4% | <u>66.6%</u> |
| | MBPP+ (Pass@1) | 3 | 71.0% | 66.1% | 61.9% | <u>80.8%</u> | 62.7% | **81.8%** |
| | HumanEval (Pass@1) | 0 | 49.9% | 34.8% | 36.6% | <u>57.6%</u> | 36.0% | **64.6%** |
| | HumanEval+ (Pass@1) | 0 | 41.3% | 28.1% | 28.1% | <u>49.9%</u> | 28.1% | **57.3%** |
| | LiveCodeBench v6 (Pass@1) | 3 | 5.1% | 2.9% | 2.9% | <u>6.9%</u> | 3.4% | **9.7%** |
| | CRUXEval (Pass@1) | 1 | 40.6% | 42.1% | 39.7% | <u>54.8%</u> | 42.3% | **55.9%** |
| | RepoBench (EM) | 3 | 21.0% | 21.8% | 23.0% | **25.3%** | <u>25.2%</u> | 22.7% |
| Long Context | LongBench v2 (Acc.) | 3 | <u>28.0%</u> | **28.8%** | 26.6% | 25.8% | 27.8% | 27.2% |
| | NIAH (Acc.) | / | 79.8% | 75.0% | <u>99.5%</u> | 83.0% | **99.8%** | 98.8% |
#### Agentic Benchmarks
We use [APTBench](https://github.com/TencentYoutuResearch/APTBench/) to evaluate the agentic capabilities of the base model.
| Category | Qwen3-1.7B-Base | SmolLM3-3B-Base | Gemma3-4B-Base | Qwen3-4B-Base | Llama3.1-8B | Youtu-LLM-2B-Base |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: |
| Code | 25.1% | 24.3% | 32.8% | **41.9%** | 23.6% | <u>37.9%</u> |
| Deep Research | 28.5% | 27.2% | 36.4% | **40.5%** | 30.0% | <u>38.6%</u> |
| Math | 59.9% | 60.7% | 59.8% | **70.5%** | 60.1% | <u>68.0%</u> |
| Tool | 56.7% | 59.1% | 61.7% | **65.8%** | 64.1% | <u>64.2%</u> |
## 📚 Citation
If you find our work useful in your research, please consider citing the following paper:
```bibtex
@article{youtu-llm,
title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
author={Tencent Youtu Lab},
year={2025},
eprint={2512.24618},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.24618},
}
```