---
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- looped-language-model
- reasoning
- recurrent-depth
- thinking
- chain-of-thought
---
# Ouro-2.6B-Thinking

## Model Description
**⚠️ IMPORTANT: This model is intended for research purposes only. It is provided as-is, without warranties for production use.**

**Ouro-2.6B-Thinking** is a reasoning-specialized variant of the Ouro-2.6B base model, enhanced through supervised fine-tuning on high-quality reasoning data. Please use `transformers==4.54.1` for compatibility.

## Key Features
- **Advanced Reasoning**: Specifically optimized for mathematical and scientific reasoning tasks
- **Compact Size**: Competitive with 4B models despite having only 2.6B parameters
- **Cross-Step Consistency**: Intermediate recurrent outputs can serve as reliable proxies for final answers
- **Explicit Thinking Process**: Trained to generate detailed reasoning steps
## Configuration
### Recurrent Steps and Adaptive Exit
The model's computational behavior can be configured through the `config.json` file:
```json
{
  "total_ut_steps": 4,
  "early_exit_threshold": 1.0
}
```
- **`total_ut_steps`**: Controls the number of recurrent steps (default: 4). You can adjust this value to trade off between performance and computation time.
- **`early_exit_threshold`**: Controls the adaptive exit mechanism (default: 1.0). Lower values encourage earlier exit, while 1.0 means always use all steps.

**Example: Modify recurrent steps**
```python
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("ByteDance/Ouro-2.6B-Thinking")
config.total_ut_steps = 3  # use 3 recurrent steps instead of the default 4

model = AutoModelForCausalLM.from_pretrained(
    "ByteDance/Ouro-2.6B-Thinking",
    config=config,
    device_map="auto",
)
```
> **Note**: vLLM does not currently support the adaptive exit feature due to its inference optimization characteristics. When using vLLM, the model will always execute the full number of `total_ut_steps`.
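The same two fields can also be edited directly in a local copy of `config.json` before loading the model. The sketch below uses the field names from the snippet above; the value `0.9` is an arbitrary illustration of a threshold that permits earlier exit.

```python
import json

# Start from the defaults shown in the config snippet above.
cfg = {"total_ut_steps": 4, "early_exit_threshold": 1.0}

# Values below 1.0 encourage the adaptive exit mechanism to stop the
# recurrent loop early; 0.9 here is an arbitrary example value.
cfg["early_exit_threshold"] = 0.9

with open("config.json", "w") as f:
    json.dump(cfg, f, indent=2)
```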
## Model Architecture
Based on Ouro-2.6B with additional reasoning fine-tuning:
| Configuration | Value |
|:---|:---|
| **Parameters** | 2.6B |
| **Layers** | 24 |
| **Recurrent Steps** | 4 |
| **Hidden Size** | 2048 |
| **Attention** | Multi-Head Attention (MHA) |
| **FFN Activation** | SwiGLU |
| **Position Embedding** | RoPE |
| **Vocabulary Size** | 49,152 |
| **Context Length** | 32K (SFT) |
| **Normalization** | Sandwich RMSNorm |
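
Because this is a looped (recurrent-depth) architecture, the effective sequential depth of one forward pass is larger than the layer count suggests. The arithmetic below is a sketch based on the table above, assuming each recurrent step re-applies the shared 24-layer stack; it is not an official figure from the paper.

```python
# Effective sequential depth per forward pass, assuming each of the
# recurrent steps re-applies the shared layer stack (values from the
# architecture table above).
layers = 24
total_ut_steps = 4
effective_depth = layers * total_ut_steps
print(effective_depth)  # 96
```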
## Training Details
### Pre-training
- **Training Tokens**: 7.7T tokens across 4 stages
- **Base Architecture**: Ouro-2.6B
### Supervised Fine-Tuning
- **Data Size**: ~8.3M examples
- **Data Composition**:
- Mathematics: 3.5M examples (OpenThoughts3, AceReason-1.1-SFT)
- Code: 3.2M examples (AceReason, OpenCodeReasoning, Llama-Nemotron, OpenThoughts3)
- Science: 808K examples (OpenThoughts3, Llama-Nemotron)
- Chat: 767K examples (DeepWriting-20K)
- **Training**: 2 epochs, max sequence length 32K
- **Optimizer**: Adam (lr=2×10⁻⁵, β=(0.9, 0.95))
- **Scheduler**: Cosine decay
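
The schedule above can be sketched as a plain cosine decay from the peak learning rate. Warmup steps and a minimum LR floor are not specified in this card, so the function below assumes no warmup and a final LR of 0.

```python
import math

PEAK_LR = 2e-5  # peak learning rate from the Optimizer row above


def cosine_decay_lr(step: int, total_steps: int, peak_lr: float = PEAK_LR) -> float:
    """Cosine decay from peak_lr to 0 over total_steps.

    Sketch only: any warmup phase or minimum-LR floor used in the actual
    training run is not documented in this card.
    """
    progress = min(step / total_steps, 1.0)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


print(cosine_decay_lr(0, 1000))     # starts at the peak, ~2e-05
print(cosine_decay_lr(500, 1000))   # halfway, ~1e-05
print(cosine_decay_lr(1000, 1000))  # decays to ~0
```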
## Quick Start
**⚠️ IMPORTANT**: Please use `transformers<4.56.0` to avoid compatibility issues. We recommend `transformers==4.54.1` or earlier versions.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "ByteDance/Ouro-2.6B-Thinking"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype="auto",
)

# Generate with reasoning
messages = [
    {"role": "user", "content": "Solve: If 2x + 3 = 11, what is x?"}
]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# do_sample=True is required for temperature/top_p to take effect
outputs = model.generate(inputs, max_new_tokens=512, do_sample=True, temperature=1.0, top_p=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Acknowledgments
We thank [@Antizana](https://github.com/Antizana) for the KV cache fix merged from [ouro-cache-fix](https://github.com/Antizana/ouro-cache-fix), which resolved a critical compatibility issue with transformers>=4.56.0.
## Citation
```bibtex
@article{zhu2025scaling,
title={Scaling Latent Reasoning via Looped Language Models},
author={Zhu, Rui-Jie and Wang, Zixuan and Hua, Kai and Zhang, Tianyu and Li, Ziniu and Que, Haoran and Wei, Boyi and Wen, Zixin and Yin, Fan and Xing, He and others},
journal={arXiv preprint arXiv:2510.25741},
year={2025}
}
```
## License
This model is licensed under Apache-2.0. See the LICENSE file for details.
## Project Links
- **Paper**: [Scaling Latent Reasoning via Looped Language Models](https://huggingface.co/papers/2510.25741)
- **Project Page**: [https://ouro-llm.github.io](https://ouro-llm.github.io)
- **Code**: [https://github.com/ByteDance/Ouro](https://github.com/ByteDance/Ouro)