Qwen3-0.6b-thinking

A Chain-of-Thought (CoT) Supervised Fine-Tuned (SFT) version of Qwen/Qwen3-0.6B, trained to reason step-by-step using explicit <think> / </think> reasoning traces before producing a final answer.


Training Details

| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Fine-tuning type | Chain-of-Thought SFT (full fine-tune) |
| Training dataset | PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts |
| Hardware | 1 × NVIDIA H100 (Hopper) |
| Precision | bfloat16 |
| Attention | PyTorch SDPA (attn_implementation="sdpa") |
| Context length | 4096 tokens |
| Epochs | 1 |
| Effective batch size | 128 (batch_size × gradient_accumulation) |
| Learning rate | 2e-5 (cosine decay, 100 warm-up steps) |
| Optimizer | AdamW (fused) |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Gradient checkpointing | ✓ (use_reentrant=False) |
| torch.compile | ✓ (Hopper optimisation) |

Training Data

The model was trained on PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts, a high-quality mixture-of-thoughts dataset that contains rich step-by-step reasoning traces wrapped in <think>…</think> tags.

Each training example follows a multi-turn conversational format with explicit chain-of-thought reasoning in the assistant turn.
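As an illustration, a single record might look like the following. This is a hypothetical example of the conversational shape described above; the exact field names and schema of the dataset may differ.

```python
# Hypothetical training record in the multi-turn conversational format.
# Field names ("messages", "role", "content") are assumptions, not the
# verified schema of MiniMax-M2.1-Mixture-of-Thoughts.
example = {
    "messages": [
        {"role": "user", "content": "What is 7 * 8?"},
        {
            "role": "assistant",
            "content": (
                "<think>\n"
                "7 * 8 means seven groups of eight: 7 * 8 = 56.\n"
                "</think>\n"
                "7 * 8 = 56."
            ),
        },
    ]
}

# The assistant turn carries the chain-of-thought inside <think>...</think>,
# followed by the final answer outside the tags.
assert example["messages"][1]["content"].startswith("<think>")
```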


Prompt / Chat Template

This model uses a simple plain-text role-prefixed format:

user: <your question here>
assistant: <think>
<step-by-step reasoning>
</think>
<final answer>

Example

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/Qwen3-0.6b-thinking"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

question = "If a train travels 120 miles in 2 hours, how fast is it going in mph?"

# Build the prompt exactly as used during training
prompt = f"user: {question}\nassistant: <think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

The model will first reason inside <think>…</think> tags and then produce the final answer. For GSM8K-style problems the final answer is typically marked with #### <number>.
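When post-processing the output, the reasoning and final answer can be separated on the closing tag. A minimal sketch (split_cot is a hypothetical helper, not part of the model's tooling):

```python
def split_cot(response: str):
    """Split a generated response into (reasoning, final_answer).

    Assumes the prompt already opened the <think> tag (as in the
    example above), so the generation closes the reasoning with
    </think> and then emits the final answer.
    """
    reasoning, sep, answer = response.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", response.strip()
    return reasoning.strip(), answer.strip()

reasoning, answer = split_cot(
    "120 miles / 2 hours = 60 mph.\n</think>\nThe train is going 60 mph."
)
```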


Label Masking

Only the assistant turn (everything after assistant:) is included in the loss computation. The user turn and system context are masked (-100) so the model learns to generate responses, not to predict the input.
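The masking can be sketched as follows. This is illustrative only: whitespace splitting stands in for the real tokenizer, and the actual training code may locate the assistant span differently.

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_labels(tokens):
    """Mask everything up to and including the 'assistant:' marker.

    Masked positions get IGNORE_INDEX; the remaining (assistant)
    tokens keep their labels, so only the response contributes
    to the loss.
    """
    try:
        boundary = tokens.index("assistant:")
    except ValueError:
        boundary = -1  # no marker found: mask nothing
    labels = list(tokens)
    for i in range(boundary + 1):
        labels[i] = IGNORE_INDEX
    return labels

tokens = "user: what is 2+2? assistant: <think> 2+2=4 </think> 4".split()
labels = build_labels(tokens)
```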


GSM8K Benchmark (Pass@1)

Evaluated on the full GSM8K test set (1,319 problems) using vLLM with:

  • Temperature: 0.7
  • Top-p: 0.9
  • Max new tokens: 8192
  • Prompt format: user: {question}\nassistant: <think>\n

An answer is counted correct only if the model produces a structured numerical answer in one of the following formats that matches the gold answer:

  • #### X
  • The answer is X / Answer: X / Final answer: X
  • \boxed{X}
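A matcher along these lines can reproduce that scoring. This is a sketch, not the exact evaluation code; the real script may normalise numbers or whitespace differently.

```python
import re

# Patterns for the three accepted answer formats, in priority order.
ANSWER_PATTERNS = [
    r"####\s*(-?[\d,]+(?:\.\d+)?)",                                      # #### X
    r"\\boxed\{\s*(-?[\d,]+(?:\.\d+)?)\s*\}",                            # \boxed{X}
    r"(?:the answer is|answer:|final answer:)\s*(-?[\d,]+(?:\.\d+)?)",   # Answer: X
]

def extract_answer(text):
    """Return the first structured numerical answer found, else None."""
    for pattern in ANSWER_PATTERNS:
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(1).replace(",", "")
    return None
```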

Results

| Steps (checkpoint) | Correct | Total | Accuracy |
|---|---|---|---|
| 0 (base model, Qwen/Qwen3-0.6B) | 379 | 1319 | 28.73% |
| 500 | 478 | 1319 | 36.24% |
| 1000 | 528 | 1319 | 40.03% |
| 1500 | 524 | 1319 | 39.73% |
| 2000 | 579 | 1319 | 43.90% |
| 2500 | 548 | 1319 | 41.55% |
| Final model | 562 | 1319 | 42.61% |

The fine-tuned model consistently outperforms the base model, achieving a +13.88 percentage-point improvement (28.73% → 42.61%) on GSM8K Pass@1.


Limitations

  • Trained for only 1 epoch on a single H100; additional training may further improve accuracy.
  • The model is 0.6B parameters — reasoning depth is inherently limited compared to larger models.
  • Evaluation used strict structured-answer matching; the model may reach correct answers via slightly different phrasing that is not credited.
  • Not evaluated on non-English benchmarks.

Citation

If you use this model, please cite it along with the base model and dataset:

@misc{qwen3-0.6b-thinking,
  author       = {PursuitOfDataScience},
  title        = {Qwen3-0.6b-thinking: Chain-of-Thought SFT on Qwen3-0.6B},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/PursuitOfDataScience/Qwen3-0.6b-thinking}},
  note         = {Fine-tuned from Qwen/Qwen3-0.6B on PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts}
}