# Qwen3-0.6b-thinking

A Chain-of-Thought (CoT) Supervised Fine-Tuned (SFT) version of Qwen/Qwen3-0.6B, trained to reason step by step using explicit `<think>` / `</think>` reasoning traces before producing a final answer.
## Training Details
| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Fine-tuning type | Chain-of-Thought SFT (full fine-tune) |
| Training dataset | PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts |
| Hardware | 1 × NVIDIA H100 (Hopper) |
| Precision | bfloat16 |
| Attention | PyTorch SDPA (`attn_implementation="sdpa"`) |
| Context length | 4096 tokens |
| Epochs | 1 |
| Effective batch size | 128 (batch_size × gradient_accumulation) |
| Learning rate | 2e-5 (cosine decay, 100 warm-up steps) |
| Optimizer | AdamW (fused) |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Gradient checkpointing | ✓ (`use_reentrant=False`) |
| `torch.compile` | ✓ (Hopper optimisation) |
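For reference, the hyperparameters above roughly correspond to a Hugging Face `TrainingArguments` configuration like the following. This is a sketch, not the actual training script; in particular, the per-device batch size and gradient-accumulation split (8 × 16) are assumptions, since only their product of 128 is stated.

```python
from transformers import TrainingArguments

# Sketch of a config matching the table above. The 8 x 16 split is an
# assumption; only the effective batch size of 128 is documented.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-thinking",
    num_train_epochs=1,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=16,   # effective batch size: 8 * 16 = 128
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    optim="adamw_torch_fused",
    weight_decay=0.01,
    max_grad_norm=1.0,
    bf16=True,
    gradient_checkpointing=True,
    gradient_checkpointing_kwargs={"use_reentrant": False},
    torch_compile=True,
)
```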
## Training Data

The model was trained on PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts, a high-quality mixture-of-thoughts dataset containing rich step-by-step reasoning traces wrapped in `<think>…</think>` tags. Each training example follows a multi-turn conversational format with explicit chain-of-thought reasoning in the assistant turn.
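As a sketch, assuming a standard chat schema with `role`/`content` fields (the exact dataset schema is an assumption), each conversation can be flattened into the plain-text training string like so:

```python
def build_training_text(messages):
    """Flatten a chat-format example into the role-prefixed training string."""
    return "\n".join(f"{msg['role']}: {msg['content']}" for msg in messages)

# Toy conversation in the assumed schema
example = [
    {"role": "user", "content": "What is 2 + 2?"},
    {"role": "assistant", "content": "<think>\n2 + 2 = 4\n</think>\n4"},
]

print(build_training_text(example))
```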
## Prompt / Chat Template

This model uses a simple plain-text, role-prefixed format:

```
user: <your question here>
assistant: <think>
<step-by-step reasoning>
</think>
<final answer>
```
## Example

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/Qwen3-0.6b-thinking"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

question = "If a train travels 120 miles in 2 hours, how fast is it going in mph?"

# Build the prompt exactly as used during training
prompt = f"user: {question}\nassistant: <think>\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

# Decode only the newly generated tokens
response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)
```
The model first reasons inside `<think>…</think>` tags and then produces the final answer. For GSM8K-style problems, the final answer is typically marked with `#### <number>`.
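Since the prompt already opens the `<think>` block, the generated text contains the reasoning followed by `</think>` and the answer. A minimal parser could split the two and pull out the GSM8K-style number (a sketch; the model card does not prescribe a parser):

```python
import re

def split_response(text):
    """Separate the <think> reasoning trace from the final answer."""
    match = re.search(r"(.*?)</think>\s*(.*)", text, re.DOTALL)
    if match:
        return match.group(1).strip(), match.group(2).strip()
    return "", text.strip()

def extract_gsm8k_answer(final):
    """Pull the number after a GSM8K-style '#### ' marker, if present."""
    m = re.search(r"####\s*(-?[\d,\.]+)", final)
    return m.group(1).replace(",", "") if m else None

# Example generation (reasoning first, since the prompt opened <think>)
raw = "Speed is distance over time.\n120 / 2 = 60\n</think>\nThe train travels at 60 mph.\n#### 60"
reasoning, answer = split_response(raw)
print(extract_gsm8k_answer(answer))  # → 60
```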
## Label Masking

Only the assistant turn (everything after `assistant:`) is included in the loss computation. The user turn and system context are masked (label `-100`) so the model learns to generate responses, not to predict the input.
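The masking can be sketched as follows, using toy token-id lists rather than the actual tokenizer:

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def mask_labels(input_ids, prompt_len):
    """Copy input_ids to labels, masking the prompt portion so only the
    assistant completion contributes to the loss."""
    labels = list(input_ids)
    for i in range(prompt_len):
        labels[i] = IGNORE_INDEX
    return labels

# Toy example: the first 4 "tokens" are the user turn + "assistant:" prefix
input_ids = [101, 102, 103, 104, 201, 202, 203]
labels = mask_labels(input_ids, prompt_len=4)
print(labels)  # → [-100, -100, -100, -100, 201, 202, 203]
```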
## GSM8K Benchmark (Pass@1)

Evaluated on the full GSM8K test set (1,319 problems) using vLLM with:

- Temperature: 0.7
- Top-p: 0.9
- Max new tokens: 8192
- Prompt format: `user: {question}\nassistant: <think>\n`
An answer is counted correct only if the model produces a structured numerical answer matching the gold answer in one of the following formats:

- `#### X`
- `The answer is X`
- `Answer: X`
- `Final answer: X`
- `\boxed{X}`
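A matching check covering those formats could look like this (a sketch; the exact evaluation script is not reproduced here, so the regexes and tie-breaking are assumptions):

```python
import re

# One regex per accepted answer format (assumed patterns, not the eval script)
ANSWER_PATTERNS = [
    r"####\s*(-?[\d,\.]+)",
    r"[Tt]he answer is\s*\$?(-?[\d,\.]+)",
    r"[Aa]nswer:\s*\$?(-?[\d,\.]+)",
    r"[Ff]inal answer:\s*\$?(-?[\d,\.]+)",
    r"\\boxed\{(-?[\d,\.]+)\}",
]

def extract_answer(text):
    """Return the last structured numerical answer found, else None."""
    for pattern in ANSWER_PATTERNS:
        matches = re.findall(pattern, text)
        if matches:
            return matches[-1].replace(",", "")
    return None

def is_correct(prediction, gold):
    pred = extract_answer(prediction)
    return pred is not None and float(pred) == float(gold)

print(is_correct("... so the total is 42.\n#### 42", "42"))  # → True
```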
### Results
| Steps (checkpoint) | Correct | Total | Accuracy |
|---|---|---|---|
| 0 (base model — Qwen/Qwen3-0.6B) | 379 | 1319 | 28.73% |
| 500 | 478 | 1319 | 36.24% |
| 1000 | 528 | 1319 | 40.03% |
| 1500 | 524 | 1319 | 39.73% |
| 2000 | 579 | 1319 | 43.90% |
| 2500 | 548 | 1319 | 41.55% |
| Final model | 562 | 1319 | 42.61% |
The fine-tuned model consistently outperforms the base model, achieving a +13.88 percentage-point improvement (28.73% → 42.61%) on GSM8K Pass@1.
## Limitations
- Trained for only 1 epoch on a single H100; additional training may further improve accuracy.
- The model is 0.6B parameters — reasoning depth is inherently limited compared to larger models.
- Evaluation used strict structured-answer matching; the model may reach correct answers via slightly different phrasing that is not credited.
- Not evaluated on non-English benchmarks.
## Citation

If you use this model, please cite the base model and dataset:

```bibtex
@misc{qwen3-0.6b-thinking,
  author       = {PursuitOfDataScience},
  title        = {Qwen3-0.6b-thinking: Chain-of-Thought SFT on Qwen3-0.6B},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/PursuitOfDataScience/Qwen3-0.6b-thinking}},
  note         = {Fine-tuned from Qwen/Qwen3-0.6B on PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts}
}
```