Text Generation
Transformers
Safetensors
ouro
looped-language-model
reasoning
recurrent-depth
thinking
chain-of-thought
conversational
custom_code
Instructions to use ByteDance/Ouro-1.4B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use ByteDance/Ouro-1.4B-Thinking with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteDance/Ouro-1.4B-Thinking", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("ByteDance/Ouro-1.4B-Thinking", trust_remote_code=True, dtype="auto")
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use ByteDance/Ouro-1.4B-Thinking with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "ByteDance/Ouro-1.4B-Thinking"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ByteDance/Ouro-1.4B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker
```shell
docker model run hf.co/ByteDance/Ouro-1.4B-Thinking
```
- SGLang
How to use ByteDance/Ouro-1.4B-Thinking with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "ByteDance/Ouro-1.4B-Thinking" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ByteDance/Ouro-1.4B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
Use Docker images
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "ByteDance/Ouro-1.4B-Thinking" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "ByteDance/Ouro-1.4B-Thinking",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
```
- Docker Model Runner
How to use ByteDance/Ouro-1.4B-Thinking with Docker Model Runner:
```shell
docker model run hf.co/ByteDance/Ouro-1.4B-Thinking
```
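The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library; it assumes a vLLM or SGLang server is already running locally on the ports shown above. The helper names `build_chat_request` and `send_chat_request` are hypothetical and are not part of either server's tooling.

```python
import json
import urllib.request

MODEL_ID = "ByteDance/Ouro-1.4B-Thinking"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions.

    Hypothetical helper: it mirrors the JSON body of the curl examples.
    The default base_url matches the vLLM example; pass
    "http://localhost:30000" for the SGLang example.
    """
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the parsed JSON response body.

    Assumes the server is reachable; raises urllib.error.URLError otherwise.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send_chat_request(build_chat_request("What is the capital of France?"))` returns the decoded JSON response. Building the request separately from sending it keeps the payload easy to inspect without a live server.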
Improve model card: Update paper/code links and BibTeX citation
#1
by nielsr HF Staff - opened
README.md CHANGED
@@ -1,7 +1,7 @@
 ---
+library_name: transformers
 license: apache-2.0
 pipeline_tag: text-generation
-library_name: transformers
 tags:
 - looped-language-model
 - reasoning
@@ -130,11 +130,12 @@ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
 ## Citation
 
 ```bibtex
-@article{
+@article{zhu2025scaling,
 title={Scaling Latent Reasoning via Looped Language Models},
-author={Zhu, Rui-Jie and Wang, Zixuan and Hua, Kai and Zhang, Tianyu and Li, Ziniu and Que, Haoran and Wei
-journal={arXiv preprint},
-year={2025}
+author={Zhu, Rui-Jie and Wang, Zixuan and Hua, Kai and Zhang, Tianyu and Li, Ziniu and Que, Haoran and Boyi Wei and Zixin Wen and Fan Yin and He Xing and Lu Li and Jiajun Shi and Kaijing Ma and Shanda Li and Taylor Kergan and Andrew Smith and Xingwei Qu and Mude Hui and Bohong Wu and Qiyang Min and Hongzhi Huang and Xun Zhou and Wei Ye and Jiaheng Liu and Jian Yang and Yunfeng Shi and Chenghua Lin and Enduo Zhao and Tianle Cai and Ge Zhang and Wenhao Huang and Yoshua Bengio and Jason Eshraghian},
+journal={arXiv preprint arXiv:2510.25741},
+year={2025},
+url={https://arxiv.org/abs/2510.25741},
 }
 ```
 
@@ -144,9 +145,8 @@ This model is licensed under Apache-2.0. See the LICENSE file for details.
 
 ## Project Links
 
-- **Paper**: [Scaling Latent Reasoning via Looped Language Models](https://
+- **Paper**: [Scaling Latent Reasoning via Looped Language Models](https://huggingface.co/papers/2510.25741)
+- **Code**: [https://github.com/Ouro-LLM/Ouro](https://github.com/Ouro-LLM/Ouro)
 - **Project Page**: [https://ouro-llm.github.io](https://ouro-llm.github.io)
 
----
-
-
+---