---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
tags:
- reasoning
- small-language-model
- efficient-training
- xmodel
- xiaoduo-ai
library_name: transformers
---
# Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM
<h5 align="center">
[Hugging Face](https://huggingface.co/XiaoduoAILab/Xmodel-2.5)
[arXiv](https://arxiv.org/abs/2511.19496)
[License](https://github.com/XiaoduoAILab/Xmodel-2.5/blob/main/LICENSE)
[GitHub](https://github.com/XiaoduoAILab/Xmodel-2.5)
</h5>
## Model Description
Xmodel-2.5 is a 1.3-billion-parameter small language model designed as a **lightweight agent core** for complex reasoning tasks. It builds on Xmodel-2 with four key upgrades:
1. **Full μP Support**: Extended Megatron-LM to support maximal update parameterization (μP) for reliable hyperparameter transfer
2. **Efficient Tokenizer**: Adopted the 129K-token DeepSeek-v3 tokenizer for a better compression rate and faster decoding
3. **FP8 Mixed Precision**: Used the E4M3 format for the forward pass and E5M2 for the backward pass to balance precision and throughput
4. **Optimizer Scheduling**: Switched from AdamW to Muon during the decay phase, which improved downstream task performance
Trained on only 1.4T tokens, Xmodel-2.5 achieves **52.49%** average accuracy across 13 reasoning benchmarks. This ranks second among 1-2B parameter models, behind only Qwen3-1.7B (56.96%), which was trained on 36T tokens, about 25.7x as many.
## Model Architecture
| Hyperparameter | Value |
|----------------|-------|
| Hidden size | 1536 |
| Intermediate size | 3840 |
| Transformer layers | 48 |
| Attention heads (Q) | 24 |
| KV heads (GQA) | 8 |
| Sequence length | 3712 |
| Max position embeddings | 131072 |
| RoPE base | 500000 |
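
The table implies a few derived quantities that are useful when sizing inference memory. The sketch below computes them under two assumptions that the table does not state: that the per-head dimension equals hidden size divided by the number of query heads, and that cached keys and values are stored in a 2-byte format such as BF16. The function name `kv_cache_bytes_per_token` is hypothetical.

```python
# Values copied from the architecture table above.
hidden_size = 1536
num_layers = 48
num_q_heads = 24
num_kv_heads = 8

# Assumption: per-head dimension is hidden_size / num_q_heads.
head_dim = hidden_size // num_q_heads

# Number of query heads that share one key/value head under GQA.
q_per_kv = num_q_heads // num_kv_heads


def kv_cache_bytes_per_token(bytes_per_value: int = 2) -> int:
    """Bytes of cached K and V per token, summed over all layers.

    bytes_per_value=2 assumes BF16/FP16 storage (an assumption, not from the card).
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value


print(head_dim)                    # 64
print(q_per_kv)                    # 3
print(kv_cache_bytes_per_token())  # 98304 bytes, i.e. 96 KiB per token
```

Under these assumptions, using 8 KV heads instead of 24 makes the cache a third of the size it would be with standard multi-head attention.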
## Intended Uses & Limitations
### Intended Uses
- Complex reasoning tasks
- Lightweight AI agent applications
- Educational and research purposes
- Resource-constrained environments
### Limitations
- Limited to 1.3B parameter capacity
- May struggle with highly specialized domains
- Performance may vary on non-English languages
## Training Details
### Training Strategy
- **Three-stage WSD (Warmup-Stable-Decay) curriculum**: 560k steps, 1.4T tokens
- **Warmup phase**: 2k steps, linear learning rate increase
- **Stable phase**: 530k steps, gradually increasing batch size
- **Decay phase**: 20k steps, mixing in 66.9% high-quality SFT data
- **Long-context adaptation**: 10k additional steps for 16K context support
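
A warmup-stable-decay schedule with the phase lengths listed above could be sketched as follows. This is only an illustration: the peak learning rate, the linear decay shape, and the final learning-rate ratio are assumptions, since the card gives none of them, and `wsd_lr` is a hypothetical helper rather than part of the training code.

```python
def wsd_lr(
    step: int,
    peak_lr: float = 1e-2,       # assumed value, not from the card
    warmup_steps: int = 2_000,   # warmup phase length from the list above
    stable_steps: int = 530_000, # stable phase length from the list above
    decay_steps: int = 20_000,   # decay phase length from the list above
    final_ratio: float = 0.1,    # assumed: decay ends at 10% of the peak
) -> float:
    """Return the learning rate at a given step of a WSD schedule."""
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / warmup_steps
    decay_start = warmup_steps + stable_steps
    if step < decay_start:
        # Stable phase: hold the peak learning rate.
        return peak_lr
    # Decay phase: linear decay (assumed shape) down to final_ratio * peak_lr.
    progress = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr * (1.0 - (1.0 - final_ratio) * progress)


for s in (0, 1_000, 2_000, 300_000, 532_000, 542_000, 552_000):
    print(s, wsd_lr(s))
```

The long-context adaptation steps are not modeled here, because the card does not describe their learning-rate behavior.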
### Key Innovations
- **μP hyperparameter transfer**: Hyperparameters tuned on a 20M-parameter proxy model transfer directly to the full model
- **Optimizer switching**: AdamW → Muon during the decay phase for improved reasoning performance
- **FP8 mixed precision**: E4M3 in the forward pass and E5M2 in the backward pass improve training throughput while preserving precision
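
To illustrate the precision versus range trade-off between the two FP8 formats, the sketch below derives the largest finite value and the relative spacing of each from its bit layout. It assumes the widely used OCP FP8 definitions, in which E5M2 reserves its top exponent code for infinities and NaN while E4M3 reserves only a single NaN bit pattern. The card does not say which FP8 variant was used, and `fp8_max_normal` is a hypothetical helper.

```python
def fp8_max_normal(exp_bits: int, man_bits: int, ieee_like: bool) -> float:
    """Largest finite value of an FP8 format with the given bit layout."""
    bias = 2 ** (exp_bits - 1) - 1
    if ieee_like:
        # E5M2: the all-ones exponent is reserved for inf/NaN.
        max_exp_field = 2 ** exp_bits - 2
        max_mantissa = 2 ** man_bits - 1
    else:
        # E4M3 (assumed OCP variant): only all-ones exponent plus
        # all-ones mantissa is NaN, so the top exponent is still usable.
        max_exp_field = 2 ** exp_bits - 1
        max_mantissa = 2 ** man_bits - 2
    return 2.0 ** (max_exp_field - bias) * (1 + max_mantissa / 2 ** man_bits)


e4m3_max = fp8_max_normal(exp_bits=4, man_bits=3, ieee_like=False)
e5m2_max = fp8_max_normal(exp_bits=5, man_bits=2, ieee_like=True)

# Relative spacing between adjacent normal values is 2 ** -man_bits.
e4m3_eps = 2.0 ** -3
e5m2_eps = 2.0 ** -2

print(e4m3_max, e4m3_eps)  # 448.0 0.125
print(e5m2_max, e5m2_eps)  # 57344.0 0.25
```

E4M3 has finer spacing, which suits activations and weights in the forward pass, while E5M2 has a much wider range, which suits gradients whose magnitudes vary more.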
## Performance
### Comprehensive Reasoning Performance
| Model | Parameters | Training Tokens | 13-Task Average |
|-------|------------|-----------------|------------------|
| Qwen3-1.7B | 1.7B | 36T | 56.96% |
| **Xmodel-2.5** | **1.3B** | **1.4T** | **52.49%** |
| Xmodel-2-1.2B | 1.2B | 1.5T | 50.34% |
| InternLM2.5-1.8B | 1.8B | - | 50.19% |
| MiniCPM-1B | 1B | - | 48.95% |
| SmolLM2-1.7B | 1.7B | 11T | 46.88% |
| Llama-3.2-1B | 1B | 9T | 44.72% |
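
The data-efficiency comparison can be reproduced from the training-token column. The short sketch below computes how many times more tokens each listed model used relative to Xmodel-2.5, skipping the rows where the table gives no token count.

```python
# Training tokens in trillions, copied from the table above
# (models with "-" in that column are omitted).
training_tokens_t = {
    "Qwen3-1.7B": 36.0,
    "Xmodel-2.5": 1.4,
    "Xmodel-2-1.2B": 1.5,
    "SmolLM2-1.7B": 11.0,
    "Llama-3.2-1B": 9.0,
}

baseline = training_tokens_t["Xmodel-2.5"]

# Ratio of each model's training tokens to Xmodel-2.5's, to one decimal place.
ratios = {name: round(tokens / baseline, 1) for name, tokens in training_tokens_t.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}x")
```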
### Detailed Task Performance
| Task | Xmodel-2.5 | Xmodel-2 | Improvement |
|------|------------|----------|-------------|
| ARC-Challenge | 48.89 | 46.16 | +2.73 |
| ARC-Easy | 76.94 | 76.22 | +0.72 |
| PIQA | 75.95 | 75.14 | +0.81 |
| HellaSwag | 67.24 | 64.05 | +3.19 |
| WinoGrande | 64.64 | 64.25 | +0.39 |
| BBH | 54.58 | 48.90 | +5.68 |
| MMLU | 51.81 | 49.98 | +1.83 |
| GSM8k | 58.98 | 56.56 | +2.42 |
| MATH | 28.94 | 25.64 | +3.30 |
| HumanEval | 28.66 | 29.27 | -0.61 |
| MBPP | 33.00 | 30.80 | +2.20 |
| CMMLU | 47.16 | 44.29 | +2.87 |
| C-Eval | 45.54 | 43.16 | +2.38 |
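
As a consistency check, the 13-task averages reported earlier can be recomputed from the per-task scores. The sketch below assumes the average is an unweighted mean over the 13 tasks, which the card does not state explicitly.

```python
# (Xmodel-2.5, Xmodel-2) scores copied from the table above.
scores = {
    "ARC-Challenge": (48.89, 46.16),
    "ARC-Easy": (76.94, 76.22),
    "PIQA": (75.95, 75.14),
    "HellaSwag": (67.24, 64.05),
    "WinoGrande": (64.64, 64.25),
    "BBH": (54.58, 48.90),
    "MMLU": (51.81, 49.98),
    "GSM8k": (58.98, 56.56),
    "MATH": (28.94, 25.64),
    "HumanEval": (28.66, 29.27),
    "MBPP": (33.00, 30.80),
    "CMMLU": (47.16, 44.29),
    "C-Eval": (45.54, 43.16),
}

# Unweighted means over all tasks (assumed averaging method).
avg_xmodel_25 = sum(new for new, _ in scores.values()) / len(scores)
avg_xmodel_2 = sum(old for _, old in scores.values()) / len(scores)

# Tasks where Xmodel-2.5 scores lower than Xmodel-2.
regressions = [task for task, (new, old) in scores.items() if new < old]

print(round(avg_xmodel_25, 2))  # 52.49
print(round(avg_xmodel_2, 2))   # 50.34
print(regressions)              # ['HumanEval']
```

Both recomputed means match the figures in the comprehensive table, and HumanEval is the only task on which Xmodel-2.5 falls behind its predecessor.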
## How to Use
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"

# Load the model and tokenizer; trust_remote_code lets the repository's custom code run.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True,
)

# Build a chat-formatted prompt from a single user message.
prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generation configuration
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True,
)
print("Generated Response:")
print(output)
```
## Citation
If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:
```bibtex
@misc{liu2025xmodel25,
title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
author={Yang Liu and Xiaolong Zhong and Ling Jiang},
year={2025},
eprint={2511.19496},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2511.19496},
}
```
## Contact
For questions or suggestions, please contact us through:
- GitHub Issues: [Xmodel-2.5 Issues](https://github.com/XiaoduoAILab/Xmodel-2.5/issues)
- Email: foamilu@yeah.net
## License
This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.