Qwen3-4B — Stage 2 (Reasoning Injection)

A 4B-parameter model built on top of Stage 1, injected with high-quality Claude Opus reasoning traces and curated chain-of-thought data. This is Stage 2 of a multi-stage training pipeline — focused on deep reasoning, structured thinking, and multi-step problem solving.

Model Details

  • Base Model: unsloth/Qwen3-4B
  • Parameters: 4B
  • Architecture: Qwen3 (dense, pure text)
  • Training: Stage 1 LoRA → merge → Stage 2 LoRA → merge
  • License: Apache 2.0
  • Language: English (multilingual via base model)

Files

| File | Description |
|---|---|
| adapter_model.safetensors | Stage 2 LoRA adapter weights (unmerged, applied on top of Stage 1) |
| adapter_config.json | LoRA configuration |
| qwen4b-s2-Q4_K_M.gguf | Quantized GGUF (Q4_K_M, ~2.4 GB) |
| tokenizer.json | Tokenizer |

Training Details

Stage 1 → Stage 2 Pipeline

This model is trained in two sequential stages:

Stage 1 — General Foundation (~308k samples, 2 epochs) General chat, instruction following, math, coding, factual QA. See Qwen3-4B-Stage1 for details.

Stage 2 — Reasoning Injection (~5,576 samples, 2 epochs) High-quality Claude Opus reasoning traces and curated chain-of-thought data injected on top of Stage 1.

Stage 2 Dataset Mix

| Dataset | Samples | Purpose |
|---|---|---|
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 high reasoning traces |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude 4.6 Opus reasoning traces |
| bespokelabs/Bespoke-Stratos-17k | 3,000 | Curated chain-of-thought |

Total Stage 2: ~5,576 samples, 2 epochs
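The mix above is a simple weighted subsample of the three sources. A minimal sketch of that bookkeeping is below; the per-source counts come from the table, while the `subsample` helper, its name, and the seed are illustrative assumptions, not the actual training script:

```python
import random

# Per-source sample counts, taken from the Stage 2 dataset mix table.
STAGE2_MIX = {
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "bespokelabs/Bespoke-Stratos-17k": 3000,
}

def subsample(rows, n, seed=42):
    """Take a fixed-size random subsample of a source (seed is an assumption)."""
    rng = random.Random(seed)
    return rng.sample(rows, n) if len(rows) > n else list(rows)

total = sum(STAGE2_MIX.values())
print(total)  # 5576 — matches the stated Stage 2 total
```

The three counts sum exactly to the 5,576 samples quoted above, so the "~" in the total is conservative.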

Stage 2 Hyperparameters

| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Learning rate | 5e-5 (cosine schedule) |
| Epochs | 2 |
| Sequence length | 4096 |
| Batch size | 2 (effective 32) |
| Optimizer | AdamW |
| Final loss | 0.5117 |
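The "Batch size 2 (effective 32)" row implies gradient accumulation. Assuming a single GPU (consistent with the single MI300X under Hardware), the accumulation factor works out as:

```python
per_device_batch = 2   # micro-batch per step, from the table
effective_batch = 32   # effective batch, from the table
num_gpus = 1           # assumption: one MI300X (see Hardware)

# Effective batch = per-device batch x GPUs x accumulation steps
grad_accum_steps = effective_batch // (per_device_batch * num_gpus)
print(grad_accum_steps)  # 16
```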

Hardware

Trained on AMD Instinct MI300X (192GB VRAM), ROCm 6.2.4, Unsloth 2026.3.3, PyTorch 2.7.1+rocm6.2.4. Stage 2 runtime: ~4.4 hours.

Usage

Ollama (GGUF)

ollama run hf.co/lqfdjbf32n/Qwen3-4B-Stage2:Q4_K_M

llama.cpp

llama-cli -m qwen4b-s2-Q4_K_M.gguf \
    -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nYour question here<|im_end|>\n<|im_start|>assistant\n" \
    -n 512
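The prompt string above follows Qwen3's ChatML template. A small helper to assemble it programmatically (the function name is illustrative, not part of any library):

```python
def build_chatml_prompt(user_msg: str,
                        system_msg: str = "You are a helpful assistant.") -> str:
    """Build a ChatML prompt matching the llama-cli example above."""
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("Your question here")
```

Note that the shell example relies on llama-cli expanding the literal `\n` escapes; building the string in code sidesteps that concern entirely.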

Python (LoRA adapter — requires Stage 1 first)

from unsloth import FastLanguageModel
from peft import PeftModel
import torch

# Load base
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=False,
)

# Apply Stage 1
model = PeftModel.from_pretrained(model, "lqfdjbf32n/Qwen3-4B-Stage1")
model = model.merge_and_unload()

# Apply Stage 2
model = PeftModel.from_pretrained(model, "lqfdjbf32n/Qwen3-4B-Stage2")
model = model.merge_and_unload()

Strengths

  • Structured <think>...</think> reasoning blocks for complex problems
  • Multi-step math and science problem solving
  • Claude Opus reasoning style distilled into 4B parameters
  • Natural casual conversation (inherited from Stage 1)
  • Auto language detection (English/Indonesian)
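Since the model emits `<think>...</think>` blocks, downstream code typically separates the reasoning from the final answer. A minimal regex-based splitter (function and variable names are illustrative):

```python
import re

# Non-greedy match so multiple <think> blocks are handled separately.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) from a response containing <think> blocks."""
    blocks = THINK_RE.findall(text)
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(b.strip() for b in blocks), answer

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think>The answer is 4.")
print(answer)  # The answer is 4.
```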

Limitations

  • Stage 2 only — requires Stage 1 as foundation
  • English primary (multilingual via base model)
  • Not suitable for production without validation
  • Complex reasoning may still fail on hardest problems

Part of a Series

| Model | Description |
|---|---|
| Qwen2.5-0.5B-ReasonChat | 0.5B edge model, reasoning + chat merged |
| Qwen3-4B-Stage1 | 4B general foundation |
| Qwen3-4B-Stage2 | 4B + Claude reasoning injection (this model) |
| Qwen3-4B-Stage3 | 4B + alignment (coming soon) |