	---
	library_name: transformers
	license: other
	license_name: youtu-llm
	license_link: https://huggingface.co/tencent/Youtu-LLM-2B-Base/LICENSE.txt
	pipeline_tag: text-generation
	instruct_model:
	- tencent/Youtu-LLM-2B
	---
	<div align="center">

	# <img src="assets/youtu-llm-logo.png" alt="Youtu-LLM Logo" height="100px">

	[📃 License](LICENSE.txt) • [💻 Code](https://github.com/TencentCloudADP/youtu-tip/tree/master/youtu-llm) • [📑 Technical Report](https://arxiv.org/abs/2512.24618) • [📊 Benchmarks](#benchmarks)

	</div>

	## 🎯 Brief Introduction

	Youtu-LLM is a small yet powerful LLM: it has only 1.96B parameters, supports a 128k-token context, and has native agentic capabilities. On general evaluations, Youtu-LLM significantly outperforms SOTA LLMs of similar size in Commonsense, STEM, Coding, and Long Context capabilities; on agent-related benchmarks, it surpasses leading larger models and can complete multiple end-to-end agent tasks.

	Youtu-LLM has the following features:
	- Type: Autoregressive causal language model with dense MLA (Multi-head Latent Attention)
	- Release versions: [Base](https://huggingface.co/tencent/Youtu-LLM-2B-Base) and [Instruct](https://huggingface.co/tencent/Youtu-LLM-2B)
	- Number of Parameters: 1.96B
	- Number of Layers: 32
	- Number of Attention Heads (MLA): 16 for Q/K/V
	- MLA Rank: 1,536 for Q, 512 for K/V
	- MLA Dim: 128 for QK Nope, 64 for QK Rope, and 128 for V
	- Context Length: 131,072
	- Vocabulary Size: 128,256
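
To make the MLA figures above more concrete, here is a minimal sketch that derives the per-head query/key width and a rough KV-cache estimate at full context. It assumes a DeepSeek-style MLA cache layout (one compressed K/V latent of rank 512 plus one shared RoPE key of dimension 64 per token per layer) and 2-byte bf16 storage. Neither assumption is stated in this README, and `MLAConfig` is a hypothetical helper, not the released configuration class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MLAConfig:
    """Hypothetical container for the figures in the feature list above."""
    num_layers: int = 32
    num_heads: int = 16
    kv_rank: int = 512          # MLA rank for K/V
    qk_nope_dim: int = 128      # non-positional part of each Q/K head
    qk_rope_dim: int = 64       # rotary part of each Q/K head
    v_dim: int = 128            # per-head value width
    context_length: int = 131_072

    @property
    def qk_head_dim(self) -> int:
        # Each Q/K head concatenates the "nope" and "rope" parts.
        return self.qk_nope_dim + self.qk_rope_dim

    def latent_values_per_token(self) -> int:
        # ASSUMPTION: only the K/V latent and one shared RoPE key are cached per layer.
        return self.num_layers * (self.kv_rank + self.qk_rope_dim)

    def full_values_per_token(self) -> int:
        # Baseline for comparison: uncompressed per-head K and V in every layer.
        return self.num_layers * self.num_heads * (self.qk_head_dim + self.v_dim)

    def cache_bytes(self, values_per_token: int, bytes_per_value: int = 2) -> int:
        # ASSUMPTION: 2 bytes per value (bf16/fp16) for one full-length sequence.
        return values_per_token * self.context_length * bytes_per_value


cfg = MLAConfig()
latent = cfg.latent_values_per_token()
full = cfg.full_values_per_token()
gib = 2 ** 30
print(f"Q/K head dim: {cfg.qk_head_dim}")
print(f"latent cache: {cfg.cache_bytes(latent) / gib:.2f} GiB at {cfg.context_length} tokens")
print(f"uncompressed: {cfg.cache_bytes(full) / gib:.2f} GiB at {cfg.context_length} tokens")
```

Under these assumptions the latent cache is roughly nine times smaller than an uncompressed per-head cache, which helps explain how a 2B-parameter model can be served at a 131,072-token context.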

	## 🤗 Model Download
	\| Model Name \| Description \| Download \|
	\| ----------- \| ----------- \| ----------- \|
	\| Youtu-LLM-2B-Base \| Base model of Youtu-LLM-2B \|🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-Base)\|
	\| Youtu-LLM-2B \| Instruct model of Youtu-LLM-2B \| 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B)\|
	\| Youtu-LLM-2B-GGUF \| Instruct model of Youtu-LLM-2B, in GGUF format \| 🤗 [Model](https://huggingface.co/tencent/Youtu-LLM-2B-GGUF)\|
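
As a starting point, the base checkpoint can probably be loaded through the standard `transformers` text-generation classes, since the model card declares `library_name: transformers`. The sketch below rests on several assumptions: that `trust_remote_code=True` is needed for the MLA architecture, that bf16 weights fit on your device, and that `torch` is installed. The `complete` helper is hypothetical and not part of the official code.

```python
MODEL_ID = "tencent/Youtu-LLM-2B-Base"


def complete(prompt: str, max_new_tokens: int = 64) -> str:
    """Greedily continue `prompt` with the base model (downloads weights on first call)."""
    # Imported lazily so that defining this helper does not load the heavy dependencies.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # ASSUMPTION: trust_remote_code may be required if the architecture is not
    # built into your installed transformers version.
    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_ID,
        torch_dtype=torch.bfloat16,
        trust_remote_code=True,
    )
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    model.eval()

    inputs = tokenizer(prompt, return_tensors="pt").to(device)
    with torch.no_grad():
        output_ids = model.generate(
            **inputs, max_new_tokens=max_new_tokens, do_sample=False
        )
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0, inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

A call such as `complete("The three primary colors are")` should return a plain-text continuation. Because this is a base model, few-shot or completion-style prompts are likely to work better than chat-formatted ones; use the Instruct model for conversational use.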

	## 📰 News
	- [2026.01.07] You can now fine-tune Youtu-LLM with [ModelScope](https://mp.weixin.qq.com/s/JJtQWSYEjnE7GnPkaJ7UNA).
	- [2026.01.04] You can now fine-tune Youtu-LLM with [LlamaFactory](https://github.com/hiyouga/LlamaFactory/pull/9707).

	<a id="benchmarks"></a>

	## 📊 Performance Comparisons
	### Base Model

	# <img src="assets/general_agentic_base.png" alt="Comparison between Youtu-LLM-2B-Base and baselines" height="260px">

	#### General Benchmarks
	\| Type \| Benchmark (Metric) \| # Shots \| Qwen3-1.7B-Base \| SmolLM3-3B-Base \| Gemma3-4B-Base \| Qwen3-4B-Base \| Llama3.1-8B \| Youtu-LLM-2B-Base \|
	\| :--- \| :--- \| :---: \| :---: \| :---: \| :---: \| :---: \| :---: \| :---: \|
	\| Commonsense \| MMLU-Pro (EM) \| 5 \| 34.9% \| 35.3% \| 29.4% \| <u>46.1%</u> \| 36.2% \| 48.4% \|
	\| \| MLQA-Zh (EM) \| 3 \| 38.1% \| 38.0% \| 40.3% \| 47.2% \| 43.0% \| <u>43.5%</u> \|
	\| \| MMLU-ProX-Zh (EM) \| 5 \| 32.5% \| 26.7% \| 24.2% \| 45.2% \| 25.4% \| <u>40.7%</u> \|
	\| STEM \| GSM8K (EM) \| 8 \| 68.2% \| 67.3% \| 38.5% \| 80.8% \| 47.8% \| <u>77.6%</u> \|
	\| \| MGSM-Zh (EM) \| 8 \| 57.1% \| 40.7% \| 33.0% \| 69.7% \| 35.9% \| <u>68.9%</u> \|
	\| \| MATH (EM) \| 4 \| 28.1% \| 40.8% \| 24.4% \| 44.8% \| 21.5% \| <u>44.4%</u> \|
	\| \| BBH (EM) \| 3 \| 53.0% \| 59.8% \| 51.6% \| 70.8% \| <u>62.9%</u> \| 59.8% \|
	\| \| GPQA-MC (Acc. Norm) \| 5 \| 30.4% \| 26.6% \| 28.6% \| 37.8% \| 30.1% \| <u>33.3%</u> \|
	\| \| HLE-MC (Acc. Norm) \| 3 \| 10.7% \| 3.1% \| 8.0% \| <u>15.0%</u> \| 11.5% \| 17.4% \|
	\| Coding \| MBPP (Pass@1) \| 3 \| 55.6% \| 51.0% \| 45.8% \| 67.5% \| 49.4% \| <u>66.6%</u> \|
	\| \| MBPP+ (Pass@1) \| 3 \| 71.0% \| 66.1% \| 61.9% \| <u>80.8%</u> \| 62.7% \| 81.8% \|
	\| \| HumanEval (Pass@1) \| 0 \| 49.9% \| 34.8% \| 36.6% \| <u>57.6%</u> \| 36.0% \| 64.6% \|
	\| \| HumanEval+ (Pass@1) \| 0 \| 41.3% \| 28.1% \| 28.1% \| <u>49.9%</u> \| 28.1% \| 57.3% \|
	\| \| LiveCodeBench v6 (Pass@1) \| 3 \| 5.1% \| 2.9% \| 2.9% \| <u>6.9%</u> \| 3.4% \| 9.7% \|
	\| \| CRUXEval (Pass@1) \| 1 \| 40.6% \| 42.1% \| 39.7% \| <u>54.8%</u> \| 42.3% \| 55.9% \|
	\| \| RepoBench (EM) \| 3 \| 21.0% \| 21.8% \| 23.0% \| 25.3% \| <u>25.2%</u> \| 22.7% \|
	\| Long Context \| LongBench v2 (Acc.) \| 3 \| <u>28.0%</u> \| 28.8% \| 26.6% \| 25.8% \| 27.8% \| 27.2% \|
	\| \| NIAH (Acc.) \| / \| 79.8% \| 75.0% \| <u>99.5%</u> \| 83.0% \| 99.8% \| 98.8% \|

	#### Agentic Benchmarks
	We use [APTBench](https://github.com/TencentYoutuResearch/APTBench/) to evaluate the agentic capabilities of the base model.

	\| Category \| Qwen3-1.7B-Base \| SmolLM3-3B-Base \| Gemma3-4B-Base \| Qwen3-4B-Base \| Llama3.1-8B \| Youtu-LLM-2B-Base \|
	\| :--- \| :---: \| :---: \| :---: \| :---: \| :---: \| :---: \|
	\| Code \| 25.1% \| 24.3% \| 32.8% \| 41.9% \| 23.6% \| <u>37.9%</u> \|
	\| Deep Research \| 28.5% \| 27.2% \| 36.4% \| 40.5% \| 30.0% \| <u>38.6%</u> \|
	\| Math \| 59.9% \| 60.7% \| 59.8% \| 70.5% \| 60.1% \| <u>68.0%</u> \|
	\| Tool \| 56.7% \| 59.1% \| 61.7% \| 65.8% \| 64.1% \| <u>64.2%</u> \|
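
For a quick side-by-side reading of the APTBench table above, the sketch below averages the four category scores per model and ranks the models. The unweighted mean is an illustrative choice made here, not an official APTBench aggregate; the numbers are copied from the table.

```python
from statistics import mean

# Scores in percent, in the order: Code, Deep Research, Math, Tool.
APTBENCH = {
    "Qwen3-1.7B-Base": [25.1, 28.5, 59.9, 56.7],
    "SmolLM3-3B-Base": [24.3, 27.2, 60.7, 59.1],
    "Gemma3-4B-Base": [32.8, 36.4, 59.8, 61.7],
    "Qwen3-4B-Base": [41.9, 40.5, 70.5, 65.8],
    "Llama3.1-8B": [23.6, 30.0, 60.1, 64.1],
    "Youtu-LLM-2B-Base": [37.9, 38.6, 68.0, 64.2],
}

# ASSUMPTION: every category counts equally (simple unweighted mean).
averages = {model: mean(scores) for model, scores in APTBENCH.items()}

# Sort from highest to lowest average score.
ranking = sorted(averages.items(), key=lambda item: item[1], reverse=True)

for model, avg in ranking:
    print(f"{model:<20} {avg:5.1f}%")
```

By this simple average, Youtu-LLM-2B-Base ranks second, behind Qwen3-4B-Base and ahead of the other baselines, including the larger Gemma3-4B-Base and Llama3.1-8B.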

	## 📚 Citation

	If you find our work useful in your research, please consider citing the following paper:

	```bibtex
	@article{youtu-llm,
	title={Youtu-LLM: Unlocking the Native Agentic Potential for Lightweight Large Language Models},
	author={Tencent Youtu Lab},
	year={2025},
	eprint={2512.24618},
	archivePrefix={arXiv},
	primaryClass={cs.CL},
	url={https://arxiv.org/abs/2512.24618},
	}
	```