Instructions for using ByteDance/Ouro-1.4B-Thinking with libraries, inference providers, notebooks, and local apps.

Libraries

How to use ByteDance/Ouro-1.4B-Thinking with Transformers:

```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="ByteDance/Ouro-1.4B-Thinking", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so it is visible when run as a script, not only in a notebook
print(pipe(messages))
```
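When the pipeline receives a list of chat messages, recent Transformers versions return the whole conversation with the model's answer appended as the last assistant message. The sketch below assumes that output shape, which is `[{"generated_text": [..., {"role": "assistant", "content": ...}]}]`. The helper name `last_reply` is hypothetical. The usage line passes `max_new_tokens` because a thinking model may need more room for its reasoning than the default generation length allows.

```python
from typing import Any


def last_reply(outputs: list[dict[str, Any]]) -> str:
    """Return the final assistant message from a chat-style
    text-generation pipeline result (assumed shape, see above)."""
    conversation = outputs[0]["generated_text"]
    # The pipeline echoes the input messages; the new turn is appended last.
    last = conversation[-1]
    if last.get("role") != "assistant":
        raise ValueError("expected the last message to come from the assistant")
    return last["content"]


# Usage with the `pipe` and `messages` defined above (downloads the model on first run):
# outputs = pipe(messages, max_new_tokens=1024)
# print(last_reply(outputs))
```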

```
# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("ByteDance/Ouro-1.4B-Thinking", trust_remote_code=True, dtype="auto")
```
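Loading the model directly also means handling the tokenizer and chat template yourself. The sketch below shows one chat turn, under these assumptions:

- The tokenizer ships a chat template, which the pipeline above relies on.
- `apply_chat_template` accepts `return_dict=True`.

`generate_reply` is a hypothetical helper written against duck-typed `model` and `tokenizer` objects, so its logic can be checked without downloading weights.

```python
def generate_reply(model, tokenizer, messages, max_new_tokens=1024):
    """Run one chat turn and return only the newly generated text."""
    # Render the conversation with the model's chat template and tokenize it.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,  # append the assistant-turn prefix
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    # If the model was moved to a GPU, move the inputs too, e.g. inputs.to(model.device).
    # generate() returns the prompt tokens followed by the new tokens.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_len = len(inputs["input_ids"][0])
    new_tokens = output_ids[0][prompt_len:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Usage with the `model` loaded above:
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("ByteDance/Ouro-1.4B-Thinking", trust_remote_code=True)
# print(generate_reply(model, tokenizer, [{"role": "user", "content": "Who are you?"}]))
```

Slicing off the first `prompt_len` tokens matters because `generate` returns the prompt together with the continuation. Decoding the full sequence would repeat the templated conversation.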

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use ByteDance/Ouro-1.4B-Thinking with vLLM:

Install from pip and serve model

```
# Install vLLM from pip:
pip install vllm
# Start the vLLM server (it keeps running in the foreground):
vllm serve "ByteDance/Ouro-1.4B-Thinking"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance/Ouro-1.4B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
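The same request can be sent from Python without extra dependencies. The sketch below uses only the standard library and makes two assumptions:

- The server started above is listening on `localhost:8000`.
- It returns the standard OpenAI chat-completions response shape (`choices[0].message.content`).

`build_chat_request`, `extract_reply` and `ask` are hypothetical helper names. Pointing `base_url` at `http://localhost:30000` should work the same way for the SGLang server described below.

```python
import json
import urllib.request
from typing import Optional

MODEL_ID = "ByteDance/Ouro-1.4B-Thinking"


def build_chat_request(base_url: str, prompt: str, model: str = MODEL_ID,
                       max_tokens: Optional[int] = None) -> urllib.request.Request:
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    if max_tokens is not None:
        payload["max_tokens"] = max_tokens  # optional cap on generated tokens
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Pull the assistant text out of a chat-completions response body."""
    return response["choices"][0]["message"]["content"]


def ask(base_url: str, prompt: str) -> str:
    """Send the request and return the reply (requires a running server)."""
    request = build_chat_request(base_url, prompt)
    with urllib.request.urlopen(request) as resp:
        return extract_reply(json.load(resp))


# Usage, with `vllm serve` running in another terminal:
# print(ask("http://localhost:8000", "What is the capital of France?"))
```

Building the request separately from sending it keeps the payload easy to inspect or log before any network call is made.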

Use Docker

```
docker model run hf.co/ByteDance/Ouro-1.4B-Thinking
```

SGLang

How to use ByteDance/Ouro-1.4B-Thinking with SGLang:

Install from pip and serve model

```
# Install SGLang from pip:
pip install sglang
# Start the SGLang server (it keeps running in the foreground):
python3 -m sglang.launch_server \
    --model-path "ByteDance/Ouro-1.4B-Thinking" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance/Ouro-1.4B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```

Use Docker images

```
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "ByteDance/Ouro-1.4B-Thinking" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "ByteDance/Ouro-1.4B-Thinking",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
```
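The container, or a bare `launch_server` process, has to download the weights and load the model before it accepts requests. A `curl` sent immediately after startup may therefore fail with a connection error. The sketch below polls until the server answers. It assumes the server exposes the OpenAI-compatible `GET /v1/models` listing route once it is ready. `wait_until_ready` is a hypothetical helper, not part of SGLang.

```python
import http.client
import time
import urllib.request


def wait_until_ready(base_url: str, timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/v1/models until it answers 200 or `timeout` seconds pass."""
    url = base_url.rstrip("/") + "/v1/models"
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, reset, timed out, or an HTTP error status
            # (urllib.error.HTTPError is an OSError subclass): not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage, before sending the first request to the container:
# if not wait_until_ready("http://localhost:30000"):
#     raise SystemExit("SGLang server did not become ready in time")
```

Using `time.monotonic()` for the deadline keeps the timeout correct even if the system clock is adjusted while the model is loading.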

Docker Model Runner
How to use ByteDance/Ouro-1.4B-Thinking with Docker Model Runner:
```
docker model run hf.co/ByteDance/Ouro-1.4B-Thinking
```

Improve model card: Update paper/code links and BibTeX citation

by nielsr HF Staff - opened Oct 31, 2025

base: refs/heads/main ← from: refs/pr/1

Files changed: -9

nielsr

Oct 31, 2025

This PR enhances the model card for ByteDance/Ouro-1.4B-Thinking by:

- Updating the "Paper" link under "Project Links" to point directly to the Hugging Face paper page (Scaling Latent Reasoning via Looped Language Models) for improved discoverability on the Hub.
- Adding an explicit "Code" link under "Project Links" to the official GitHub repository (https://github.com/Ouro-LLM/Ouro), which is linked from the project page.
- Updating the BibTeX citation to include the full list of authors and a direct link to the arXiv paper for better attribution and accuracy.

These changes make it easier for users to access the paper, code, and correctly cite the work.

Improve model card: Update paper/code links and BibTeX citation (commit 2a53da4b)

ridger

ByteDance org Oct 31, 2025

Thank you for the PR!

ridger changed pull request status to merged Oct 31, 2025
