# Qwen3.5-0.8B-Opus-4.6-thinking

A two-stage Chain-of-Thought (CoT) fine-tune of Qwen/Qwen3.5-0.8B-Base. The model is trained to reason step by step inside `<think>` tags before producing a final answer.


## Training Lineage

```
Qwen/Qwen3.5-0.8B-Base
        │
        │  Stage 1 – CoT SFT on 0.5M-thinking
        │  244,997 examples · 1 epoch · 7,657 steps
        ▼
PursuitOfDataScience/Qwen3.5-0.8B-thinking   (GSM8K: 62.40%)
        │
        │  Stage 2 – Continued CoT SFT on Opus-4.6-Reasoning
        │  2,326 examples · 3 epochs · 219 steps
        ▼
PursuitOfDataScience/Qwen3.5-0.8B-Opus-4.6-thinking   (this model)
```

## Model Details

| Attribute | Value |
|---|---|
| Architecture | Qwen3_5ForCausalLM |
| Parameters | ~0.8B |
| Hidden size | 1,024 |
| Layers | 24 |
| Attention heads | 8 (2 KV heads, GQA) |
| Vocabulary | 248,320 tokens |
| Max position embeddings | 262,144 |
| Context window (training) | 4,096 tokens |
| Precision | bfloat16 |
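The GQA configuration (2 KV heads shared by 8 query heads) keeps the KV cache small. A back-of-the-envelope sizing, illustrative only and derived from the table's values rather than the model config:

```python
# Illustrative KV-cache sizing from the Model Details table (not read from the config).
layers = 24
kv_heads = 2           # GQA: 8 query heads share 2 KV heads
head_dim = 1024 // 8   # hidden size / attention heads = 128
bytes_per_elem = 2     # bfloat16

# Per cached token: one K and one V tensor of shape (kv_heads, head_dim) per layer.
kv_bytes_per_token = 2 * kv_heads * head_dim * bytes_per_elem * layers
print(kv_bytes_per_token)  # 24576, i.e. 24 KiB per token
```

With 8 KV heads instead of 2, the same cache would be 4x larger, which is the main practical benefit of GQA at this scale.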

## Stage 1 – CoT SFT on 0.5M-thinking

Base model: Qwen/Qwen3.5-0.8B-Base
Dataset: PursuitOfDataScience/0.5M-thinking

A broad CoT fine-tuning pass over ~500K examples covering general reasoning, mathematics, and commonsense problems. After filtering examples that exceeded the 4,096-token context window, 244,997 examples were used.
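The length filter above can be sketched as follows. This is an assumed implementation (the card only states the 4,096-token cutoff), the field names are hypothetical, and a whitespace splitter stands in for the real tokenizer, which in practice would be loaded with `AutoTokenizer.from_pretrained(...)`:

```python
# Sketch of the pre-training length filter (assumed implementation).
# Field names and the whitespace "tokenizer" are illustrative stand-ins.
MAX_LEN = 4096

def token_count(text: str) -> int:
    return len(text.split())  # stand-in for len(tokenizer(text)["input_ids"])

def keep(example: dict) -> bool:
    # The full formatted sequence (prompt + CoT + answer) must fit the window.
    full = (
        f"user: {example['problem']}\n"
        f"assistant: <think>\n{example['thinking']}\n</think>\n{example['solution']}"
    )
    return token_count(full) <= MAX_LEN

dataset = [
    {"problem": "2+2?", "thinking": "Add 2 and 2.", "solution": "4"},
    {"problem": "long", "thinking": "x " * 5000, "solution": "y"},
]
kept = [ex for ex in dataset if keep(ex)]
print(len(kept))  # 1 -- the 5000-token example is dropped
```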

| Hyperparameter | Value |
|---|---|
| Epochs | 1 |
| Per-device batch size | 4 |
| Gradient accumulation | 8 |
| Effective batch size | 32 |
| Learning rate | 2e-5 |
| LR schedule | Cosine with warmup |
| Warmup steps | 100 |
| Total optimizer steps | 7,657 |
| Hardware | 1× H100 GPU |
| Precision | bfloat16 |
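The step count follows directly from the example count and the effective batch size:

```python
import math

examples = 244_997
effective_batch = 4 * 8   # per-device batch x gradient accumulation = 32
epochs = 1

steps = math.ceil(examples / effective_batch) * epochs
print(steps)  # 7657 -- matches the table
```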

GSM8K result after Stage 1: 62.40% (vs. 58.23% for the base model with <think> prompting).


## Stage 2 – Continued CoT SFT on Opus-4.6-Reasoning

Base model: PursuitOfDataScience/Qwen3.5-0.8B-thinking
Dataset: nohurry/Opus-4.6-Reasoning-3000x-filtered

A focused continued fine-tuning pass on 2,326 high-quality examples distilled from Claude Opus 4.6, covering challenging multi-step reasoning problems. All examples were pre-filtered so that the complete sequence (prompt + full chain-of-thought + answer) fits within the 4,096-token window, so no truncation occurs during training.

The dataset provides three flat fields per example:

- `problem` – the question / task
- `thinking` – the full chain-of-thought reasoning
- `solution` – the concise final answer

Prompt format (same as Stage 1):

```
user: <problem>
assistant: <think>
<thinking content>
</think>
<solution content>
```

The `<think>` tag is hardcoded into the prompt prefix, so the model always learns to emit structured reasoning first. Only the assistant response (tokens after `assistant: <think>\n`) contributes to the cross-entropy loss.
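One common way to implement this masking (a sketch, not the card's actual training code) is to set the label ids of the prompt prefix to -100, which PyTorch's cross-entropy loss and the Hugging Face Trainer ignore:

```python
# Sketch of prompt-masked labels for the format above (assumed implementation).
IGNORE_INDEX = -100  # ignored by CrossEntropyLoss(ignore_index=-100)

def build_labels(prompt_ids: list[int], response_ids: list[int]) -> tuple[list[int], list[int]]:
    # Prompt tokens are masked out; only the response is supervised.
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids
    return input_ids, labels

# Toy ids standing in for the tokenized "user: ... assistant: <think>\n" prefix
prompt_ids = [101, 102, 103]
response_ids = [201, 202]  # thinking + </think> + solution tokens
input_ids, labels = build_labels(prompt_ids, response_ids)
print(labels)  # [-100, -100, -100, 201, 202]
```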

| Hyperparameter | Value |
|---|---|
| Epochs | 3 |
| Per-device batch size | 6 |
| Gradient accumulation | 5 |
| Effective batch size | 30 |
| Learning rate | 1e-5 |
| LR schedule | Cosine with warmup |
| Warmup steps | 50 |
| Total optimizer steps | 219 |
| Hardware | 1× H100 GPU |
| Precision | bfloat16 |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "PursuitOfDataScience/Qwen3.5-0.8B-Opus-4.6-thinking"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

question = "If Alice has 3 apples and buys 5 more, how many apples does she have?"

# Same prompt format as training: the <think> tag is part of the prefix.
prompt = (
    f"user: {question}\n"
    f"assistant: <think>\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=1024,
    temperature=0.6,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
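Because the reasoning is wrapped in `<think>...</think>`, the final answer can be split off after decoding. A minimal helper, assuming the model closes the tag as trained (the function name is illustrative):

```python
def split_thinking(generated: str) -> tuple[str, str]:
    """Separate the chain-of-thought from the final answer."""
    reasoning, sep, answer = generated.partition("</think>")
    if not sep:  # model never closed the tag; treat everything as reasoning
        return generated.strip(), ""
    reasoning = reasoning.split("<think>")[-1]  # drop the echoed prompt, keep the CoT
    return reasoning.strip(), answer.strip()

text = "user: ...\nassistant: <think>\n3 + 5 = 8\n</think>\n8 apples"
thinking, answer = split_thinking(text)
print(answer)  # 8 apples
```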

## License

Apache 2.0, the same as the base model.
