Qwen3-0.6b-thinking

A Chain-of-Thought (CoT) Supervised Fine-Tuned (SFT) version of Qwen/Qwen3-0.6B, trained to reason step-by-step using explicit <think> / </think> reasoning traces before producing a final answer.


Training Details

| Field | Value |
|---|---|
| Base model | Qwen/Qwen3-0.6B |
| Fine-tuning type | Chain-of-Thought SFT (full fine-tune) |
| Training dataset | PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts |
| Hardware | 1 × NVIDIA H100 (Hopper) |
| Precision | bfloat16 |
| Attention | PyTorch SDPA (attn_implementation="sdpa") |
| Context length | 4096 tokens |
| Epochs | 1 |
| Effective batch size | 128 (batch_size × gradient_accumulation) |
| Learning rate | 2e-5 (cosine decay, 100 warm-up steps) |
| Optimizer | AdamW (fused) |
| Weight decay | 0.01 |
| Max grad norm | 1.0 |
| Gradient checkpointing | ✓ (use_reentrant=False) |
| torch.compile | ✓ (Hopper optimisation) |

Training Data

The model was trained on PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts, a high-quality mixture-of-thoughts dataset that contains rich step-by-step reasoning traces wrapped in <think>…</think> tags.

Each training example follows a multi-turn conversational format with explicit chain-of-thought reasoning in the assistant turn.
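As an illustration, a single record might look like the following. This is a hypothetical example of the conversational shape described above; the exact field names and schema of the dataset may differ.

```python
# Hypothetical training record in the multi-turn conversational format.
# Field names ("messages", "role", "content") are assumptions, not the
# verified schema of MiniMax-M2.1-Mixture-of-Thoughts.
example = {
    "messages": [
        {"role": "user", "content": "What is 7 * 8?"},
        {
            "role": "assistant",
            "content": (
                "<think>\n"
                "7 * 8 means seven groups of eight: 7 * 8 = 56.\n"
                "</think>\n"
                "7 * 8 = 56."
            ),
        },
    ]
}

# The assistant turn carries the chain-of-thought inside <think>...</think>,
# followed by the final answer outside the tags.
assert example["messages"][1]["content"].startswith("<think>")
```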


Prompt / Chat Template

This model uses a simple plain-text role-prefixed format:

user: <your question here>
assistant: <think>
<step-by-step reasoning>
</think>
<final answer>

Example

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/Qwen3-0.6b-thinking"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

question = "If a train travels 120 miles in 2 hours, how fast is it going in mph?"

# Build the prompt exactly as used during training
prompt = f"user: {question}\nassistant: <think>\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.7,
        top_p=0.9,
        do_sample=True,
    )

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)

The model will first reason inside <think>…</think> tags and then produce the final answer. For GSM8K-style problems the final answer is typically marked with #### <number>.
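When post-processing the output, the reasoning and final answer can be separated on the closing tag. A minimal sketch (split_cot is a hypothetical helper, not part of the model's tooling):

```python
def split_cot(response: str):
    """Split a generated response into (reasoning, final_answer).

    Assumes the prompt already opened the <think> tag (as in the
    example above), so the generation closes the reasoning with
    </think> and then emits the final answer.
    """
    reasoning, sep, answer = response.partition("</think>")
    if not sep:
        # No closing tag found: treat the whole output as the answer.
        return "", response.strip()
    return reasoning.strip(), answer.strip()

reasoning, answer = split_cot(
    "120 miles / 2 hours = 60 mph.\n</think>\nThe train is going 60 mph."
)
```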


Label Masking

Only the assistant turn (everything after assistant:) is included in the loss computation. The user turn and system context are masked (-100) so the model learns to generate responses, not to predict the input.
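The masking can be sketched as follows. This is illustrative only: whitespace splitting stands in for the real tokenizer, and the actual training code may locate the assistant span differently.

```python
IGNORE_INDEX = -100  # positions with this label are excluded from the loss

def build_labels(tokens):
    """Mask everything up to and including the 'assistant:' marker.

    Masked positions get IGNORE_INDEX; the remaining (assistant)
    tokens keep their labels, so only the response contributes
    to the loss.
    """
    try:
        boundary = tokens.index("assistant:")
    except ValueError:
        boundary = -1  # no marker found: mask nothing
    labels = list(tokens)
    for i in range(boundary + 1):
        labels[i] = IGNORE_INDEX
    return labels

tokens = "user: what is 2+2? assistant: <think> 2+2=4 </think> 4".split()
labels = build_labels(tokens)
```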


GSM8K Benchmark (Pass@1)

Evaluated on the full GSM8K test set (1,319 problems) using vLLM with:

  • Temperature: 0.7
  • Top-p: 0.9
  • Max new tokens: 8192
  • Prompt format: user: {question}\nassistant: <think>\n

An answer is counted correct only if the model produces a structured numerical answer in one of the following formats that matches the gold answer:

  • #### X
  • The answer is X / Answer: X / Final answer: X
  • \boxed{X}
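A matcher along these lines can reproduce that scoring. This is a sketch, not the exact evaluation code; the real script may normalise numbers or whitespace differently.

```python
import re

# Patterns for the three accepted answer formats, in priority order.
ANSWER_PATTERNS = [
    r"####\s*(-?[\d,]+(?:\.\d+)?)",                                      # #### X
    r"\\boxed\{\s*(-?[\d,]+(?:\.\d+)?)\s*\}",                            # \boxed{X}
    r"(?:the answer is|answer:|final answer:)\s*(-?[\d,]+(?:\.\d+)?)",   # Answer: X
]

def extract_answer(text):
    """Return the first structured numerical answer found, else None."""
    for pattern in ANSWER_PATTERNS:
        match = re.search(pattern, text, flags=re.IGNORECASE)
        if match:
            return match.group(1).replace(",", "")
    return None
```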

Results

| Steps (checkpoint) | Correct | Total | Accuracy |
|---|---|---|---|
| 0 (base model, Qwen/Qwen3-0.6B) | 379 | 1319 | 28.73% |
| 500 | 478 | 1319 | 36.24% |
| 1000 | 528 | 1319 | 40.03% |
| 1500 | 524 | 1319 | 39.73% |
| 2000 | 579 | 1319 | 43.90% |
| 2500 | 548 | 1319 | 41.55% |
| Final model | 562 | 1319 | 42.61% |

The fine-tuned model consistently outperforms the base model, achieving a +13.88 percentage-point improvement (28.73% → 42.61%) on GSM8K Pass@1.


Limitations

  • Trained for only 1 epoch on a single H100; additional training may further improve accuracy.
  • The model is 0.6B parameters — reasoning depth is inherently limited compared to larger models.
  • Evaluation used strict structured-answer matching; the model may reach correct answers via slightly different phrasing that is not credited.
  • Not evaluated on non-English benchmarks.

Citation

If you use this model, please cite it along with the base model and dataset:

@misc{qwen3-0.6b-thinking,
  author       = {PursuitOfDataScience},
  title        = {Qwen3-0.6b-thinking: Chain-of-Thought SFT on Qwen3-0.6B},
  year         = {2026},
  howpublished = {\url{https://huggingface.co/PursuitOfDataScience/Qwen3-0.6b-thinking}},
  note         = {Fine-tuned from Qwen/Qwen3-0.6B on PursuitOfDataScience/MiniMax-M2.1-Mixture-of-Thoughts}
}