Instructions for using dustarrr/reasoning-rob with libraries and local serving apps (Transformers, vLLM, SGLang, Docker).

Libraries

How to use dustarrr/reasoning-rob with Transformers:

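```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="dustarrr/reasoning-rob")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Leave enough room for the reasoning trace that precedes the answer
print(pipe(messages, max_new_tokens=1024))
```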

```python
# Load model directly (Qwen2.5 is a text-only causal LM)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("dustarrr/reasoning-rob")
model = AutoModelForCausalLM.from_pretrained("dustarrr/reasoning-rob")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# Reasoning traces are long; a very small budget (e.g. 40 tokens) stops mid-thought
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```


vLLM

How to use dustarrr/reasoning-rob with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "dustarrr/reasoning-rob"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dustarrr/reasoning-rob",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
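The same OpenAI-compatible request can be sent from Python using only the standard library. This is a sketch: `build_chat_request` and `chat` are hypothetical helper names, and the `max_tokens` and `temperature` values are illustrative choices, not settings taken from this card.

```python
import json
import urllib.request


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = "dustarrr/reasoning-rob",
    max_tokens: int = 1024,
) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        # Illustrative values: a generous budget, since the reasoning trace comes first
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the assistant message text."""
    req = build_chat_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=300) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

For example, `chat("What is the capital of France?")` queries the vLLM server on its default port 8000; pass `base_url="http://localhost:30000"` to target the SGLang server described in this card.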


SGLang

How to use dustarrr/reasoning-rob with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "dustarrr/reasoning-rob" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dustarrr/reasoning-rob",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "dustarrr/reasoning-rob" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "dustarrr/reasoning-rob",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```

Docker Model Runner
How to use dustarrr/reasoning-rob with Docker Model Runner:
```
docker model run hf.co/dustarrr/reasoning-rob
```

Reasoning Rob

A Qwen2.5-1.5B base model fine-tuned to reason with chain-of-thought traces from s1K + LIMO.

Summary


Base model	`Qwen/Qwen2.5-1.5B`
Parameters	~1.5B (LoRA r=16, merged)
Context length	2048 tokens
Training data	s1K (1,000 traces) + LIMO (817 traces) = ~1,800 CoT samples
Method	s1-style distillation + budget forcing via QLoRA SFT
Compute	Google Colab T4 GPU, ~16 min
Special tokens	`<think>` `</think>` for reasoning trace delimiters
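"Merged" means the LoRA update was folded back into the base weights. Assuming the standard LoRA formulation as implemented in PEFT (not the rsLoRA variant), each adapted weight matrix becomes the following, with the scaling factor taken from the LoRA alpha of 32 and rank of 16 listed under Training Details:

```latex
W_{\text{merged}} = W_0 + \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\quad A \in \mathbb{R}^{r \times k},
\qquad \frac{\alpha}{r} = \frac{32}{16} = 2
```

Because only $W_{\text{merged}}$ is stored, the checkpoint has the same shape and parameter count as the base model and loads as a plain causal LM. The card does not say how the NF4-quantized base was handled during merging; merging into a dequantized FP16 copy is a common choice, but that is an assumption here.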

Evaluation Results

| Benchmark | Reasoning Rob (accuracy) |
|---|---|
| GSM8K (50 samples) | 10.00% (5/50) |
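The card does not describe how the 50 GSM8K samples were scored. As a hedged sketch of one common convention (not necessarily the one used for the result above): GSM8K reference solutions end with `#### <number>`, and a response counts as correct when the last number after the `</think>` delimiter equals that reference number. The helper names are hypothetical.

```python
import re
from typing import Optional, Sequence

# Integers or decimals, optionally negative, with optional thousands separators
NUMBER_RE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def last_number(text: str) -> Optional[float]:
    """Return the last number in `text` (commas removed), or None if there is none."""
    matches = NUMBER_RE.findall(text)
    if not matches:
        return None
    return float(matches[-1].replace(",", ""))


def is_correct(response: str, reference_solution: str) -> bool:
    # Score only the text after the reasoning trace, if one is present
    pred = last_number(response.split("</think>")[-1])
    # GSM8K reference solutions end with "#### <final answer>"
    gold = last_number(reference_solution.split("####")[-1])
    return pred is not None and gold is not None and pred == gold


def accuracy(responses: Sequence[str], references: Sequence[str]) -> float:
    if len(responses) != len(references) or not responses:
        raise ValueError("need equal-length, non-empty responses and references")
    return sum(is_correct(r, g) for r, g in zip(responses, references)) / len(responses)
```

Under this scheme, 10.00% on 50 samples corresponds to 5 correct answers.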

Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "dustarrr/reasoning-rob",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("dustarrr/reasoning-rob")
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant that thinks step by step."},
    {"role": "user", "content": "If a train travels 60 km in 1.5 hours, what is its speed?"},
]
# Render the chat template as text, then tokenize and move tensors to the model's device
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Greedy decoding with room for the reasoning trace plus the final answer
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Decode only the new tokens; keep special tokens so the reasoning delimiters are not stripped
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(response)
```
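Because the example decodes with `skip_special_tokens=False`, the printed response includes the `<think>`/`</think>` delimiters. A small helper like the following can separate the reasoning from the final answer. It is a sketch that assumes the trace is wrapped in exactly those delimiters and that Qwen2.5's `<|im_end|>` / `<|endoftext|>` end markers may trail the output; `split_reasoning` is a hypothetical name.

```python
from typing import Tuple

# End-of-turn markers used by Qwen2.5 tokenizers; they survive decoding
# when skip_special_tokens=False.
QWEN_END_MARKERS = ("<|im_end|>", "<|endoftext|>")


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a decoded response into (reasoning, answer) around </think>."""
    text = response
    for marker in QWEN_END_MARKERS:
        text = text.replace(marker, "")
    head, sep, tail = text.partition("</think>")
    reasoning = head.replace("<think>", "", 1).strip()
    if not sep:
        # No closing delimiter: generation probably hit max_new_tokens mid-trace
        return reasoning, ""
    return reasoning, tail.strip()
```

An empty answer string is a useful signal that `max_new_tokens` was too small for the trace.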

Budget Forcing (s1-style)

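To extend the model's thinking phase, stop generation when the model emits `</think>`, discard that delimiter, append "Wait" to the trace, and let the model continue reasoning. Repeating this forces longer reasoning before the final answer. This is the test-time scaling technique ("budget forcing") from the s1 paper, which can also cap thinking by inserting the end-of-thinking delimiter early.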
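A minimal sketch of the extension loop, written against a hypothetical `generate` callable so it stays independent of any serving stack. It assumes the model wraps its reasoning in `<think>` and `</think>`, and that `generate` returns the continuation of the text it is given, stopping right after it emits `</think>` (or when its own token budget runs out). `num_waits` sets how many times the delimiter is suppressed.

```python
from typing import Callable

END_THINK = "</think>"


def budget_force(
    prompt: str,
    generate: Callable[[str], str],
    num_waits: int = 1,
    wait_str: str = "Wait",
) -> str:
    """Return the generated text after suppressing END_THINK `num_waits` times."""
    trace = ""
    for _ in range(num_waits):
        chunk = generate(prompt + trace)
        if END_THINK not in chunk:
            # Budget exhausted without closing the trace: nothing to suppress
            return trace + chunk
        # Drop the delimiter (and anything after it) and nudge the model to keep thinking
        trace += chunk.split(END_THINK, 1)[0] + wait_str
    # Let the model close its reasoning this time...
    closing = generate(prompt + trace)
    trace += closing
    if END_THINK not in closing:
        return trace
    # ...then generate the answer; no further END_THINK is expected, so it runs to EOS
    return trace + generate(prompt + trace)
```

To wire this to Transformers, `generate` could call `model.generate(..., stop_strings=["</think>"], tokenizer=tokenizer)` (supported in recent Transformers releases) and decode only the newly generated tokens. Each extra "Wait" costs another round of generation, so latency grows with `num_waits`.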

Training Details

| Hyperparameter | Value |
|---|---|
| LoRA rank | 16 |
| LoRA alpha | 32 |
| LoRA dropout | 0.05 |
| Learning rate | 0.0001 |
| LR scheduler | cosine |
| Warmup ratio | 0.03 |
| Weight decay | 0.01 |
| Batch size | 2 |
| Gradient accumulation | 8 |
| Max sequence length | 2048 |
| Epochs | 1 |
| Quantization | NF4 (4-bit, double quant) |
| Optimizer | `adamw_torch` |
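The table implies an effective batch of 2 × 8 = 16 sequences per optimizer step. The sketch below works through the resulting step counts and the cosine-with-warmup learning-rate curve. It assumes all 1,817 traces were used (the card does not say whether any were filtered) and only approximates how Hugging Face `Trainer` counts optimizer steps, which can differ slightly across versions; `schedule_lengths` and `lr_at` are hypothetical names.

```python
import math
from typing import Tuple

# Values from the Training Details table. DATASET_SIZE assumes no filtering.
DATASET_SIZE = 1_000 + 817
PER_DEVICE_BATCH = 2
GRAD_ACCUM = 8
EPOCHS = 1
PEAK_LR = 1e-4
WARMUP_RATIO = 0.03


def schedule_lengths(n: int = DATASET_SIZE) -> Tuple[int, int, int]:
    """Return (effective_batch, total_optimizer_steps, warmup_steps)."""
    effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM
    micro_batches = math.ceil(n / PER_DEVICE_BATCH)  # forward/backward passes per epoch
    total_steps = max(micro_batches // GRAD_ACCUM, 1) * EPOCHS  # optimizer updates
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return effective_batch, total_steps, warmup_steps


def lr_at(step: int, total_steps: int, warmup_steps: int, peak: float = PEAK_LR) -> float:
    """Linear warmup to `peak`, then a half-cosine decay to 0."""
    if step < warmup_steps:
        return peak * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    eff, total, warmup = schedule_lengths()
    print(f"effective batch={eff}, optimizer steps={total}, warmup steps={warmup}")
    for s in (0, warmup, total // 2, total):
        print(s, f"{lr_at(s, total, warmup):.2e}")
```

Under these assumptions one epoch is only about 113 optimizer updates, so the 3% warmup lasts 4 steps before the cosine decay begins.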

Attribution

Reasoning Rob is a QLoRA fine-tune of Qwen/Qwen2.5-1.5B (base, not instruct) trained on:

- s1K: 1,000 curated reasoning traces
- LIMO: 817 "Less Is More" reasoning traces

It follows the s1 distillation and budget-forcing method and the LIMO "less is more" approach to reasoning transfer.

All credit to:

- The Qwen Team (Alibaba) for the base model
- The s1 authors (Stanford) for the training methodology and dataset
- The LIMO authors (GAIR) for the reasoning dataset

This model would not exist without their work.

Limitations

- Small model: at 1.5B parameters, Reasoning Rob has limited capacity.
- Hallucination: the model may still produce incorrect reasoning or fabricate facts.
- Short context: the maximum sequence length is 2048 tokens.
- English-centric: the training data is predominantly English.

License

Apache 2.0 (inherited from Qwen2.5 base model).

Generated on 2026-06-23


References

- LIMO: Less is More for Reasoning. arXiv:2502.03387, published Feb 5, 2025.
- s1: Simple test-time scaling. arXiv:2501.19393, published Jan 31, 2025.