---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
tags:
- reasoning
- small-language-model
- efficient-training
- xmodel
- xiaoduo-ai
library_name: transformers
---

# Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

<h5 align="center">

[![hf_space](https://img.shields.io/badge/🤗-Xiaoduo%20HuggingFace-blue.svg)](https://huggingface.co/XiaoduoAILab/Xmodel-2.5)
[![arXiv](https://img.shields.io/badge/Arxiv-2511.19496-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2511.19496) 
[![Code License](https://img.shields.io/badge/Code%20License-Apache_2.0-green.svg)](https://github.com/XiaoduoAILab/Xmodel-2.5/blob/main/LICENSE)
[![github](https://img.shields.io/badge/-Github-black?logo=github)](https://github.com/XiaoduoAILab/Xmodel-2.5)
[![github](https://img.shields.io/github/stars/XiaoduoAILab/Xmodel-2.5.svg?style=social)](https://github.com/XiaoduoAILab/Xmodel-2.5)  

</h5>

## Model Description

Xmodel-2.5 is a 1.3-billion-parameter small language model designed as a **lightweight agent core** for complex reasoning tasks. It builds on Xmodel-2 with four key upgrades:

1. **Full μP Support**: Extends Megatron-LM to support maximal update parameterization (μP) for reliable hyperparameter transfer
2. **Efficient Tokenizer**: Adopts the 129K-token DeepSeek-v3 tokenizer for a better compression rate and faster decoding
3. **FP8 Mixed Precision**: Uses the E4M3 format in the forward pass and the E5M2 format in the backward pass to balance precision and throughput
4. **Optimizer Scheduling**: Switches from AdamW to Muon during the decay phase, which significantly improves downstream task performance

Trained on only 1.4T tokens, Xmodel-2.5 achieves **52.49%** average accuracy across 13 reasoning benchmarks. This ranks second among 1-2B-parameter models, behind only Qwen3-1.7B (56.96%), which was trained on about 25.7x more tokens (36T).

## Model Architecture

| Hyperparameter | Value |
|----------------|-------|
| Hidden size | 1536 |
| Intermediate size | 3840 |
| Transformer layers | 48 |
| Attention heads (Q) | 24 |
| KV heads (GQA) | 8 |
| Sequence length | 3712 |
| Max position embeddings | 131072 |
| RoPE base | 500000 |
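
As a rough sanity check on the 1.3B figure, the dimensions in the table can be turned into a parameter estimate. The sketch below rests on assumptions this card does not state: a LLaMA-style block with a gated three-matrix MLP, a head dimension equal to hidden size divided by query heads, no biases, norm weights ignored, shared input/output embeddings, and a vocabulary of roughly 129K entries.

```python
# Rough parameter estimate from the architecture table.
# Assumptions (not confirmed by this card): LLaMA-style block, gated MLP with
# three weight matrices, no biases, norm weights ignored, tied embeddings,
# and a vocabulary of about 129K tokens.
HIDDEN = 1536
INTERMEDIATE = 3840
LAYERS = 48
Q_HEADS = 24
KV_HEADS = 8
VOCAB = 129_000  # approximate; the card only says "129K"

head_dim = HIDDEN // Q_HEADS      # 64
kv_width = KV_HEADS * head_dim    # 512, shared by groups of 3 query heads (GQA)

# Q and O projections are HIDDEN x HIDDEN; K and V are HIDDEN x kv_width.
attn_params = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_width
# Gate, up, and down projections of the assumed gated MLP.
mlp_params = 3 * HIDDEN * INTERMEDIATE
per_layer = attn_params + mlp_params
embedding_params = VOCAB * HIDDEN

total = LAYERS * per_layer + embedding_params
print(f"per layer: {per_layer:,}")
print(f"total:     {total / 1e9:.2f}B")
```

Under these assumptions the estimate lands between 1.3B and 1.4B; untied embeddings would add roughly another 0.2B, so the real layout may differ in detail.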

## Intended Uses & Limitations

### Intended Uses
- Complex reasoning tasks
- Lightweight AI agent applications
- Educational and research purposes
- Resource-constrained environments

### Limitations
- Capacity is limited to 1.3B parameters
- May struggle with highly specialized domains
- Performance may vary in non-English languages

## Training Details

### Training Strategy
- **Three-stage WSD (Warmup-Stable-Decay) curriculum**: 560k steps, 1.4T tokens
- **Warmup phase**: 2k steps with a linear learning-rate increase
- **Stable phase**: 530k steps with a gradually increasing batch size
- **Decay phase**: 20k steps with 66.9% high-quality SFT data mixed in
- **Long-context adaptation**: 10k additional steps for 16K context support
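
The warmup, stable, and decay phases above can be sketched as a learning-rate schedule. Only the step counts come from this card; the peak and minimum learning rates and the linear decay shape are illustrative assumptions, and the long-context adaptation stage is not modeled.

```python
# Warmup-Stable-Decay (WSD) learning-rate schedule using the step counts above.
# PEAK_LR, MIN_LR, and the linear decay shape are illustrative assumptions.
WARMUP_STEPS = 2_000
STABLE_STEPS = 530_000
DECAY_STEPS = 20_000
PEAK_LR = 1e-2   # hypothetical value
MIN_LR = 1e-4    # hypothetical value


def wsd_lr(step: int) -> float:
    """Return the learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to PEAK_LR.
        return PEAK_LR * step / WARMUP_STEPS
    if step < WARMUP_STEPS + STABLE_STEPS:
        # Constant plateau.
        return PEAK_LR
    # Linear decay to MIN_LR, clamped once the decay phase is over.
    progress = min(1.0, (step - WARMUP_STEPS - STABLE_STEPS) / DECAY_STEPS)
    return PEAK_LR + (MIN_LR - PEAK_LR) * progress


for s in (0, 1_000, 2_000, 300_000, 542_000, 552_000):
    print(s, wsd_lr(s))
```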

### Key Innovations
- **μP hyperparameter transfer**: Hyperparameters are transferred directly from a 20M-parameter proxy model to the full model
- **Optimizer switching**: AdamW → Muon during the decay phase for improved reasoning performance
- **FP8 mixed precision**: E4M3 (forward) and E5M2 (backward) FP8 formats significantly improve training efficiency
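
To make the E4M3 versus E5M2 trade-off concrete, the sketch below derives the largest finite value and the relative mantissa step of each format. It assumes the widely used OCP FP8 definitions (E4M3 in its "FN" variant with bias 7 and no infinities; E5M2 as an IEEE-style format with bias 15); this card does not say which variant the training stack uses.

```python
# Largest finite value and relative mantissa step for the two FP8 formats.
# Assumes the common OCP FP8 definitions; the card does not name the variant.

def e4m3_max() -> float:
    # Bias 7. The top exponent field (15) is usable; only mantissa 0b111 is
    # reserved for NaN, so the largest mantissa is 0b110 -> 1 + 6/8.
    return 2.0 ** (15 - 7) * (1 + 6 / 8)


def e5m2_max() -> float:
    # Bias 15. The top exponent field (31) is reserved for inf/NaN, so the
    # largest usable field is 30, with a full mantissa 0b11 -> 1 + 3/4.
    return 2.0 ** (30 - 15) * (1 + 3 / 4)


print("E4M3 max:", e4m3_max(), "relative step:", 2.0 ** -3)
print("E5M2 max:", e5m2_max(), "relative step:", 2.0 ** -2)
```

E4M3 trades range for one extra mantissa bit, which plausibly suits forward activations and weights, while E5M2's wider range is a common choice for gradients.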

## Performance

### Comprehensive Reasoning Performance

| Model | Parameters | Training Tokens | 13-Task Average |
|-------|------------|-----------------|------------------|
| Qwen3-1.7B | 1.7B | 36T | 56.96% |
| **Xmodel-2.5** | **1.3B** | **1.4T** | **52.49%** |
| Xmodel-2-1.2B | 1.2B | 1.5T | 50.34% |
| InternLM2.5-1.8B | 1.8B | - | 50.19% |
| MiniCPM-1B | 1B | - | 48.95% |
| SmolLM2-1.7B | 1.7B | 11T | 46.88% |
| Llama-3.2-1B | 1B | 9T | 44.72% |
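
The token-efficiency comparison can be checked with a few lines of arithmetic. This is only a sketch over the token counts listed in the table; models without a reported token count are left out.

```python
# Training-token budgets from the table above, in trillions of tokens.
tokens_t = {
    "Qwen3-1.7B": 36.0,
    "SmolLM2-1.7B": 11.0,
    "Llama-3.2-1B": 9.0,
    "Xmodel-2-1.2B": 1.5,
}
XMODEL_25_TOKENS_T = 1.4

# Ratio of each model's token budget to that of Xmodel-2.5.
ratios = {name: t / XMODEL_25_TOKENS_T for name, t in tokens_t.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the Xmodel-2.5 token budget")
```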

### Detailed Task Performance

| Task | Xmodel-2.5 | Xmodel-2 | Improvement |
|------|------------|----------|-------------|
| ARC-Challenge | 48.89 | 46.16 | +2.73 |
| ARC-Easy | 76.94 | 76.22 | +0.72 |
| PIQA | 75.95 | 75.14 | +0.81 |
| HellaSwag | 67.24 | 64.05 | +3.19 |
| WinoGrande | 64.64 | 64.25 | +0.39 |
| BBH | 54.58 | 48.90 | +5.68 |
| MMLU | 51.81 | 49.98 | +1.83 |
| GSM8k | 58.98 | 56.56 | +2.42 |
| MATH | 28.94 | 25.64 | +3.30 |
| HumanEval | 28.66 | 29.27 | -0.61 |
| MBPP | 33.00 | 30.80 | +2.20 |
| CMMLU | 47.16 | 44.29 | +2.87 |
| C-Eval | 45.54 | 43.16 | +2.38 |
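
The 13-task averages reported earlier can be reproduced from this table, assuming the average is an unweighted mean over the tasks (an assumption that matches the published figures of 52.49 and 50.34).

```python
# (Xmodel-2.5, Xmodel-2) scores copied from the table above.
scores = {
    "ARC-Challenge": (48.89, 46.16),
    "ARC-Easy": (76.94, 76.22),
    "PIQA": (75.95, 75.14),
    "HellaSwag": (67.24, 64.05),
    "WinoGrande": (64.64, 64.25),
    "BBH": (54.58, 48.90),
    "MMLU": (51.81, 49.98),
    "GSM8k": (58.98, 56.56),
    "MATH": (28.94, 25.64),
    "HumanEval": (28.66, 29.27),
    "MBPP": (33.00, 30.80),
    "CMMLU": (47.16, 44.29),
    "C-Eval": (45.54, 43.16),
}

# Unweighted mean over all 13 tasks (assumed aggregation).
n = len(scores)
avg_25 = sum(new for new, _ in scores.values()) / n
avg_2 = sum(old for _, old in scores.values()) / n
print(f"Xmodel-2.5: {avg_25:.2f}  Xmodel-2: {avg_2:.2f}  gain: {avg_25 - avg_2:+.2f}")
```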

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"

# trust_remote_code=True lets transformers run custom code from the model repo.
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    model_path,
    trust_remote_code=True
)

# Wrap the prompt in the model's chat template.
prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Sample a response of up to 512 new tokens.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id
)

# Drop the prompt tokens so that only the newly generated text is decoded.
prompt_length = model_inputs.input_ids.shape[1]
output = tokenizer.decode(
    generated_ids[0][prompt_length:],
    skip_special_tokens=True
)
print("Generated Response:")
print(output)
```

## Citation

If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:

```bibtex
@misc{liu2025xmodel25,
      title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM}, 
      author={Yang Liu and Xiaolong Zhong and Ling Jiang},
      year={2025},
      eprint={2511.19496},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2511.19496}, 
}
```

## Contact

For questions or suggestions, please contact us through:
- GitHub Issues: [Xmodel-2.5 Issues](https://github.com/XiaoduoAILab/Xmodel-2.5/issues)
- Email: foamilu@yeah.net

## License

This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.