Reasoning Rob


A Qwen2.5-1.5B base model fine-tuned to reason with chain-of-thought traces from s1K + LIMO.



Summary

Base model          Qwen/Qwen2.5-1.5B
Parameters          ~1.5B (LoRA r=16, merged into the base weights)
Context length      2048 tokens (maximum sequence length used in training)
Training data       s1K (1,000 traces) + LIMO (817 traces) = 1,817 CoT samples
Method              s1-style distillation via QLoRA SFT; budget forcing at inference time
Compute             Google Colab T4 GPU, ~16 min
Special tokens      <think> and </think> delimit the reasoning trace

Evaluation Results

Benchmark             Reasoning Rob
GSM8K (50 samples)    10.00%
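The card does not state how answers were extracted or matched. At 10.00% on 50 samples, the score corresponds to 5 correct answers. As a hedged illustration only, one common way to score GSM8K is to take the last number in the model's final answer (here, the text after `</think>`) and compare it with the number after `####` in the reference solution. The helper names below are hypothetical, and this is not necessarily the procedure behind the number above.

```python
import re
from typing import List, Optional

# Integers or decimals, optionally negative, with thousands separators ("1,234").
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_last_number(text: str) -> Optional[str]:
    """Return the last number in `text` with commas removed, or None."""
    matches = NUMBER.findall(text)
    return matches[-1].replace(",", "") if matches else None


def reference_answer(solution: str) -> str:
    """GSM8K reference solutions end with '#### <answer>'."""
    return solution.split("####")[-1].strip().replace(",", "")


def is_correct(model_output: str, solution: str) -> bool:
    # Only look at the text after the reasoning trace, if the model closed it.
    answer_text = model_output.split("</think>")[-1]
    predicted = extract_last_number(answer_text)
    if predicted is None:
        return False
    return float(predicted) == float(reference_answer(solution))


def accuracy(outputs: List[str], solutions: List[str]) -> float:
    """Fraction of outputs whose final number matches the reference answer."""
    correct = sum(is_correct(o, s) for o, s in zip(outputs, solutions))
    return correct / len(solutions)
```

Comparing as floats treats "40" and "40.0" as equal; stricter or looser matching rules will shift small-sample scores noticeably.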

Usage

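import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "dustarrr/reasoning-rob"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # load in half precision
    device_map="auto",          # place layers on the available GPU(s), else CPU
)
tokenizer = AutoTokenizer.from_pretrained(model_id)
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful assistant that thinks step by step."},
    {"role": "user", "content": "If a train travels 60 km in 1.5 hours, what is its speed?"},
]
# Render the conversation with the tokenizer's chat template and open the assistant turn.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# Move the inputs to the model's device; device_map="auto" may have placed the model on a GPU.
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    # Greedy decoding; leave room for the reasoning trace plus the final answer.
    outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=False)

# Decode only the new tokens; keep special tokens so the <think>...</think> delimiters stay visible.
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=False)
print(response)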
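The Usage example decodes with `skip_special_tokens=False`, so the `<think>` and `</think>` delimiters (and Qwen2.5's ChatML end-of-turn markers `<|im_end|>` / `<|endoftext|>`) remain in `response`. A minimal sketch for separating the reasoning trace from the final answer, assuming the model emits at most one `</think>` and may omit the opening tag; `split_reasoning` is a hypothetical helper, not part of the model repository.

```python
from typing import Tuple

THINK_OPEN = "<think>"
THINK_CLOSE = "</think>"
END_OF_TURN = ("<|im_end|>", "<|endoftext|>")  # Qwen2.5 ChatML markers


def _clean(text: str) -> str:
    """Remove end-of-turn markers and surrounding whitespace."""
    for marker in END_OF_TURN:
        text = text.replace(marker, "")
    return text.strip()


def split_reasoning(response: str) -> Tuple[str, str]:
    """Split a decoded response into (reasoning, answer)."""
    head, sep, tail = response.partition(THINK_CLOSE)
    if sep:
        return _clean(head.replace(THINK_OPEN, "")), _clean(tail)
    if THINK_OPEN in head:
        # Thinking started but never closed, e.g. max_new_tokens ran out.
        return _clean(head.replace(THINK_OPEN, "")), ""
    # No delimiters at all: treat the whole response as the answer.
    return "", _clean(head)
```

Usage: `reasoning, answer = split_reasoning(response)`. An empty `answer` with non-empty `reasoning` is a sign that `max_new_tokens` was too small for the trace.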

Budget Forcing (s1-style)

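Budget forcing controls how long the model thinks at inference time, without retraining. To extend the thinking phase, suppress the </think> delimiter when the model tries to emit it and append "Wait" in its place; this pushes the model to keep reasoning (often re-checking its earlier steps) before giving the final answer. This is the test-time scaling technique from the s1 paper, which also uses the reverse move, forcing </think> once a token budget is reached, to cap thinking length.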
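A minimal sketch of the "extend" direction, assuming a text-in/text-out `generate` callable; `budget_force` and `make_hf_generate` are hypothetical names, not an API shipped with the model. Each time the model closes its reasoning with `</think>` while forced extensions remain, the sketch cuts the output at the delimiter, appends "Wait", and lets the model continue; once the budget is spent, the model may close its thinking and answer. It works on decoded text and re-tokenizes between rounds, which is simpler than, but not identical to, editing token IDs directly.

```python
from typing import Callable

END_THINK = "</think>"


def budget_force(
    prompt: str,
    generate: Callable[[str], str],
    num_waits: int = 1,
    wait_token: str = "Wait",
) -> str:
    """Extend thinking s1-style; return everything generated after `prompt`.

    `generate(text)` must return only the new text the model produces after `text`.
    """
    completion = ""
    for _ in range(num_waits):
        chunk = generate(prompt + completion)
        cut = chunk.find(END_THINK)
        if cut == -1:
            # The model never closed its thinking (e.g. hit max_new_tokens): stop forcing.
            return completion + chunk
        # Discard </think> and anything after it, then nudge the model to keep reasoning.
        completion += chunk[:cut] + wait_token
    # Budget spent: let the model close its thinking and write the final answer.
    return completion + generate(prompt + completion)


def make_hf_generate(model, tokenizer, max_new_tokens: int = 512) -> Callable[[str], str]:
    """Wrap a transformers causal LM as a greedy text-in/text-out callable."""

    def generate(text: str) -> str:
        inputs = tokenizer(text, return_tensors="pt").to(model.device)
        out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
        new_tokens = out[0][inputs["input_ids"].shape[1]:]
        return tokenizer.decode(new_tokens, skip_special_tokens=False)

    return generate
```

With `model`, `tokenizer` and the chat-templated `text` from the Usage section, a call might look like `budget_force(text, make_hf_generate(model, tokenizer), num_waits=2)`. Intermediate rounds may generate an answer that is then discarded, which wastes tokens; stopping each intermediate round at `</think>` would be more efficient.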


Training Details

Hyperparameter           Value
LoRA rank                16
LoRA alpha               32
LoRA dropout             0.05
Learning rate            0.0001
LR scheduler             cosine
Warmup ratio             0.03
Weight decay             0.01
Batch size               2
Gradient accumulation    8
Max sequence length      2048
Epochs                   1
Quantization             NF4 (4-bit, double quant)
Optimizer                adamw_torch
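The table above maps onto keyword arguments of peft's `LoraConfig`, transformers' `BitsAndBytesConfig` and `TrainingArguments`. The sketch below is an assumption-laden reconstruction, not the author's training script: `task_type="CAUSAL_LM"` is assumed, and LoRA target modules and the 4-bit compute dtype are not given in the card, so they are left out. It also works out the effective batch size (2 x 8 = 16) and a rough optimizer-step count for 1,817 samples; the trainer's exact count can differ slightly with dataloader rounding or filtering.

```python
import math
from typing import Dict, Tuple

# Keyword names follow peft.LoraConfig; task_type is an assumption.
LORA_KWARGS = dict(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

# Keyword names follow transformers.BitsAndBytesConfig (NF4, double quantization).
BNB_KWARGS = dict(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

# Keyword names follow transformers.TrainingArguments.
TRAINING_KWARGS = dict(
    learning_rate=1e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    weight_decay=0.01,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    optim="adamw_torch",
)

# Applied when tokenizing/packing the SFT data, not a TrainingArguments field.
MAX_SEQ_LENGTH = 2048


def approx_optimizer_steps(num_samples: int, cfg: Dict) -> Tuple[int, int]:
    """Return (effective batch size, approximate optimizer steps) on one GPU."""
    effective_batch = cfg["per_device_train_batch_size"] * cfg["gradient_accumulation_steps"]
    steps = math.ceil(num_samples / effective_batch) * cfg["num_train_epochs"]
    return effective_batch, steps
```

For 1,817 traces this gives an effective batch of 16 and roughly 114 optimizer steps. With peft and transformers installed, the dictionaries would be unpacked as `LoraConfig(**LORA_KWARGS)`, `BitsAndBytesConfig(**BNB_KWARGS)` and `TrainingArguments(output_dir="outputs", **TRAINING_KWARGS)` (the output directory is a placeholder). The T4 lacks native bf16 support, so a float16 compute dtype would be the usual choice there.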

Attribution

Reasoning Rob is a QLoRA fine-tune of Qwen/Qwen2.5-1.5B (base, not instruct) trained on:

  • s1K - 1,000 curated reasoning traces
  • LIMO - 817 "Less Is More" reasoning traces

It follows the s1 recipe (distillation on a small set of curated reasoning traces, plus budget forcing at inference) and the LIMO "less is more" approach to reasoning transfer.

All credit to:

  • The Qwen Team (Alibaba) for the base model
  • The s1 authors (Stanford) for the training methodology and dataset
  • The LIMO authors (GAIR) for the reasoning dataset

This model would not exist without their work.


Limitations

  • Small model: At 1.5B parameters, Reasoning Rob has limited capacity.
  • Hallucination: The model may still produce incorrect reasoning or fabricate facts.
  • Short context: Training sequences were capped at 2048 tokens, so longer prompts and reasoning traces may degrade output quality.
  • English only: Training data is predominantly English.

License

Apache 2.0 (inherited from Qwen2.5 base model).


Generated on 2026-06-23
