---
language:
- en
- zh
license: apache-2.0
pipeline_tag: text-generation
tags:
- reasoning
- small-language-model
- efficient-training
- xmodel
- xiaoduo-ai
library_name: transformers
---

# Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

<h5 align="center">

[🤗 Hugging Face Model](https://huggingface.co/XiaoduoAILab/Xmodel-2.5) ·
[arXiv Paper](https://arxiv.org/abs/2511.19496) ·
[License](https://github.com/XiaoduoAILab/Xmodel-2.5/blob/main/LICENSE) ·
[GitHub Repository](https://github.com/XiaoduoAILab/Xmodel-2.5)

</h5>

## Model Description

Xmodel-2.5 is a 1.3-billion-parameter small language model designed as a **lightweight agent core** for complex reasoning tasks. It builds on Xmodel-2 with four key upgrades:

1. **Full μP support**: Megatron-LM extended with maximal update parameterization (μP) for reliable hyperparameter transfer
2. **Efficient tokenizer**: the 129K-token DeepSeek-v3 tokenizer, adopted for a better compression rate and faster decoding
3. **FP8 mixed precision**: E4M3 forward and E5M2 backward FP8 formats to balance precision and throughput (see the sketch after this list)
4. **Optimizer scheduling**: a switch from AdamW to Muon during the decay phase, which significantly improves downstream task performance
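
For a concrete picture of item 3, here is a minimal sketch of a hybrid E4M3-forward / E5M2-backward setup using NVIDIA Transformer Engine. Xmodel-2.5 is trained with Megatron-LM, so this snippet is an illustrative stand-in rather than the project's actual training code:

```python
# Sketch only: hybrid FP8 forward/backward with Transformer Engine.
# Format.HYBRID selects E4M3 for the forward pass and E5M2 for gradients.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID, amax_history_len=16)

# Layer sizes chosen to match the model card's hidden/intermediate dims.
layer = te.Linear(1536, 3840).cuda()
x = torch.randn(16, 1536, device="cuda", requires_grad=True)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)      # forward computed in E4M3
y.sum().backward()    # gradients propagated in E5M2
```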

Trained on only 1.4T tokens, Xmodel-2.5 achieves **52.49%** average accuracy across 13 reasoning benchmarks, ranking second among 1-2B-parameter models, behind only Qwen3-1.7B (56.96%) while using 25.7× fewer training tokens.

## Model Architecture

| Hyperparameter | Value |
|----------------|-------|
| Hidden size | 1536 |
| Intermediate size | 3840 |
| Transformer layers | 48 |
| Attention heads (Q) | 24 |
| KV heads (GQA) | 8 |
| Sequence length | 3712 |
| Max position embeddings | 131072 |
| RoPE base | 500000 |
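
To sanity-check these values against the released checkpoint, you can inspect its configuration without loading weights. The attribute names below follow the common Llama-style convention and are an assumption, since the checkpoint ships a custom config class:

```python
from transformers import AutoConfig

# Load only the configuration (no weights). Attribute names are assumed
# to follow the usual Llama-style convention and may differ here.
config = AutoConfig.from_pretrained("XiaoduoAILab/Xmodel-2.5", trust_remote_code=True)
print(config.hidden_size)           # expected: 1536
print(config.num_hidden_layers)     # expected: 48
print(config.num_attention_heads)   # expected: 24
print(config.num_key_value_heads)   # expected: 8 (GQA)
```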

## Intended Uses & Limitations

### Intended Uses
- Complex reasoning tasks
- Lightweight AI agent applications
- Educational and research purposes
- Deployment in resource-constrained environments

### Limitations
- Capacity is bounded by the 1.3B-parameter scale
- May struggle in highly specialized domains
- Performance may vary on non-English languages
|
| | ## Training Details |
| |
|
| | ### Training Strategy |
| | - **Three-stage WSD curriculum**: 560k steps, 1.4T tokens |
| | - **Warmup phase**: 2k steps, linear learning rate increase |
| | - **Stable phase**: 530k steps, gradually increasing batch size |
| | - **Decay phase**: 20k steps, mixing 66.9% high-quality SFT data |
| | - **Long-context adaptation**: 10k additional steps for 16K context support |
| |
|
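
The step counts above translate into a learning-rate schedule along these lines. The peak rate and the exact ramp and decay shapes are illustrative assumptions, not values from the paper:

```python
# Hedged sketch of a WSD (warmup-stable-decay) learning-rate schedule.
# peak_lr and the linear ramp/decay shapes are assumptions for illustration.
def wsd_lr(step: int, peak_lr: float = 1e-2,
           warmup: int = 2_000, stable: int = 530_000,
           decay: int = 20_000) -> float:
    if step < warmup:                        # linear warmup over 2k steps
        return peak_lr * step / warmup
    if step < warmup + stable:               # constant plateau for 530k steps
        return peak_lr
    t = min(step - warmup - stable, decay)   # 20k-step decay to zero
    return peak_lr * (1.0 - t / decay)
```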

### Key Innovations
- **μP hyperparameter transfer**: hyperparameters tuned on a 20M-parameter proxy model transfer directly to the full model (see the scaling sketch after this list)
- **Optimizer switching**: AdamW → Muon during the decay phase for improved reasoning performance
- **FP8 mixed precision**: the hybrid FP8 format significantly improves training efficiency
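
Under μP with an Adam-family optimizer, the standard recipe scales the learning rate of matrix-like hidden weights inversely with model width, while embeddings keep the base rate. A rough illustration, with the proxy width as an assumed value:

```python
# Sketch of the muP width-scaling rule for Adam-style optimizers:
# hidden (matrix-like) weights use base_lr / width_multiplier, while
# embeddings and biases keep base_lr. The proxy width is an assumption.
PROXY_WIDTH = 256     # assumed hidden size of the 20M proxy model
TARGET_WIDTH = 1536   # hidden size of Xmodel-2.5

def mup_lr(base_lr: float, matrix_like: bool) -> float:
    width_mult = TARGET_WIDTH / PROXY_WIDTH
    return base_lr / width_mult if matrix_like else base_lr

print(mup_lr(1e-2, matrix_like=True))   # hidden weights: 1e-2 / 6
print(mup_lr(1e-2, matrix_like=False))  # embeddings/biases: 1e-2
```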

## Performance

### Comprehensive Reasoning Performance

| Model | Parameters | Training Tokens | 13-Task Average |
|-------|------------|-----------------|-----------------|
| Qwen3-1.7B | 1.7B | 36T | 56.96% |
| **Xmodel-2.5** | **1.3B** | **1.4T** | **52.49%** |
| Xmodel-2-1.2B | 1.2B | 1.5T | 50.34% |
| InternLM2.5-1.8B | 1.8B | - | 50.19% |
| MiniCPM-1B | 1B | - | 48.95% |
| SmolLM2-1.7B | 1.7B | 11T | 46.88% |
| Llama-3.2-1B | 1B | 9T | 44.72% |

### Detailed Task Performance

| Task | Xmodel-2.5 | Xmodel-2 | Improvement |
|------|------------|----------|-------------|
| ARC-Challenge | 48.89 | 46.16 | +2.73 |
| ARC-Easy | 76.94 | 76.22 | +0.72 |
| PIQA | 75.95 | 75.14 | +0.81 |
| HellaSwag | 67.24 | 64.05 | +3.19 |
| WinoGrande | 64.64 | 64.25 | +0.39 |
| BBH | 54.58 | 48.90 | +5.68 |
| MMLU | 51.81 | 49.98 | +1.83 |
| GSM8k | 58.98 | 56.56 | +2.42 |
| MATH | 28.94 | 25.64 | +3.30 |
| HumanEval | 28.66 | 29.27 | -0.61 |
| MBPP | 33.00 | 30.80 | +2.20 |
| CMMLU | 47.16 | 44.29 | +2.87 |
| C-Eval | 45.54 | 43.16 | +2.38 |

## How to Use

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "XiaoduoAILab/Xmodel-2.5"

# Load the model and tokenizer (trust_remote_code is required for the
# custom model/tokenizer classes shipped with the checkpoint).
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Build a chat-formatted prompt.
prompt = "Explain the concept of transfer learning in machine learning."
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer(text, return_tensors="pt").to(model.device)

# Generate a response with nucleus sampling.
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,
)

# Decode only the newly generated tokens, skipping the prompt.
output = tokenizer.decode(
    generated_ids[0][len(model_inputs.input_ids[0]):],
    skip_special_tokens=True,
)
print("Generated Response:")
print(output)
```

## Citation

If you find Xmodel-2.5 useful for your research or applications, please consider citing our work:

```bibtex
@misc{liu2025xmodel25,
  title={Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM},
  author={Yang Liu and Xiaolong Zhong and Ling Jiang},
  year={2025},
  eprint={2511.19496},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.19496},
}
```

## Contact

For questions or suggestions, please contact us through:
- GitHub Issues: [Xmodel-2.5 Issues](https://github.com/XiaoduoAILab/Xmodel-2.5/issues)
- Email: foamilu@yeah.net

## License

This project is licensed under the Apache-2.0 License. See the [LICENSE](LICENSE) file for details.