# Qwen3-4B — Stage 2 (Reasoning Injection)
A 4B parameter model built on top of Stage 1, injected with high-quality Claude Opus reasoning traces and curated chain-of-thought data. This is Stage 2 of a multi-stage training pipeline — focused on deep reasoning, structured thinking, and multi-step problem solving.
## Model Details
- Base Model: unsloth/Qwen3-4B
- Parameters: 4B
- Architecture: Qwen3 (dense, pure text)
- Training: Stage 1 LoRA → merge → Stage 2 LoRA → merge
- License: Apache 2.0
- Language: English (multilingual via base model)
## Files

| File | Description |
|---|---|
| `adapter_model.safetensors` | Stage 2 LoRA adapter weights (unmerged, applied on top of Stage 1) |
| `adapter_config.json` | LoRA configuration |
| `qwen4b-s2-Q4_K_M.gguf` | Quantized GGUF (Q4_K_M, ~2.4 GB) |
| `tokenizer.json` | Tokenizer |
## Training Details

### Stage 1 → Stage 2 Pipeline

This model is trained in two sequential stages:

**Stage 1 — General Foundation** (~308k samples, 2 epochs): general chat, instruction following, math, coding, and factual QA. See Qwen3-4B-Stage1 for details.

**Stage 2 — Reasoning Injection** (~5,576 samples, 2 epochs): high-quality Claude Opus reasoning traces and curated chain-of-thought data injected on top of Stage 1.
### Stage 2 Dataset Mix
| Dataset | Samples | Purpose |
|---|---|---|
| TeichAI/claude-4.5-opus-high-reasoning-250x | 250 | Claude Opus 4.5 high reasoning traces |
| nohurry/Opus-4.6-Reasoning-3000x-filtered | 2,326 | Claude 4.6 Opus reasoning traces |
| bespokelabs/Bespoke-Stratos-17k | 3,000 | Curated chain-of-thought |
**Total Stage 2:** ~5,576 samples, 2 epochs
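As a quick sanity check, the per-dataset counts in the table above sum to the stated total; a few lines of Python also show the relative weighting of each source in the mix:

```python
# Stage 2 dataset mix, taken from the table above
mix = {
    "TeichAI/claude-4.5-opus-high-reasoning-250x": 250,
    "nohurry/Opus-4.6-Reasoning-3000x-filtered": 2326,
    "bespokelabs/Bespoke-Stratos-17k": 3000,
}

total = sum(mix.values())
print(total)  # 5576 — matches the stated total

# Share of each source in the Stage 2 mix
for name, n in mix.items():
    print(f"{name}: {n / total:.1%}")
```

The curated chain-of-thought data (Bespoke-Stratos) makes up just over half the mix, with the Opus traces supplying the rest.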
### Stage 2 Hyperparameters
| Parameter | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 32 |
| Learning rate | 5e-5 (cosine schedule) |
| Epochs | 2 |
| Sequence length | 4096 |
| Batch size | 2 per device (effective 32) |
| Optimizer | AdamW |
| Final loss | 0.5117 |
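For reference, a minimal sketch of how the table above might map onto a PEFT/TRL training configuration. This is an illustrative fragment, not the exact training script: the target modules and the gradient-accumulation split (2 × 16 = effective 32) are assumptions not stated on this card.

```python
from peft import LoraConfig
from trl import SFTConfig

lora_cfg = LoraConfig(
    r=32,                 # LoRA rank (from the table)
    lora_alpha=32,        # LoRA alpha (from the table)
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
)

train_cfg = SFTConfig(
    output_dir="outputs",
    learning_rate=5e-5,
    lr_scheduler_type="cosine",
    num_train_epochs=2,
    max_seq_length=4096,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=16,  # assumed: 2 x 16 = effective batch 32
    optim="adamw_torch",
)
```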
## Hardware
Trained on AMD Instinct MI300X (192GB VRAM), ROCm 6.2.4, Unsloth 2026.3.3, PyTorch 2.7.1+rocm6.2.4. Stage 2 runtime: ~4.4 hours.
## Usage

### Ollama (GGUF)

```shell
ollama run hf.co/lqfdjbf32n/Qwen3-4B-Stage2:Q4_K_M
```
### llama.cpp

```shell
llama-cli -m qwen4b-s2-Q4_K_M.gguf \
  -p "<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\nYour question here<|im_end|>\n<|im_start|>assistant\n" \
  -n 512
```
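The prompt string passed to `llama-cli` follows Qwen's ChatML template. If you are assembling prompts programmatically, a small helper keeps the special tokens in the right places (the helper name here is illustrative, not part of any API):

```python
def build_chatml_prompt(system: str, user: str) -> str:
    """Assemble a single-turn Qwen ChatML prompt, matching the llama-cli example."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_chatml_prompt("You are a helpful assistant.", "Your question here")
print(prompt)
```

The prompt deliberately ends after the opening `<|im_start|>assistant\n` so the model generates the assistant turn.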
### Python (LoRA adapter — requires Stage 1 first)

```python
from unsloth import FastLanguageModel
from peft import PeftModel
import torch

# Load the base model
model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Qwen3-4B",
    max_seq_length=4096,
    dtype=torch.bfloat16,
    load_in_4bit=False,
)

# Apply and merge the Stage 1 adapter
model = PeftModel.from_pretrained(model, "lqfdjbf32n/Qwen3-4B-Stage1")
model = model.merge_and_unload()

# Apply and merge the Stage 2 adapter on top
model = PeftModel.from_pretrained(model, "lqfdjbf32n/Qwen3-4B-Stage2")
model = model.merge_and_unload()
```
## Strengths

- Structured `<think>...</think>` reasoning blocks for complex problems
- Multi-step math and science problem solving
- Claude Opus reasoning style distilled into 4B parameters
- Natural casual conversation (inherited from Stage 1)
- Auto language detection (English/Indonesian)
## Limitations
- Stage 2 only — requires Stage 1 as foundation
- English primary (multilingual via base model)
- Not suitable for production without validation
- Complex reasoning may still fail on hardest problems
## Part of a Series
| Model | Description |
|---|---|
| Qwen2.5-0.5B-ReasonChat | 0.5B edge model, reasoning + chat merged |
| Qwen3-4B-Stage1 | 4B general foundation |
| Qwen3-4B-Stage2 | 4B + Claude reasoning injection (this model) |
| Qwen3-4B-Stage3 | 4B + alignment (coming soon) |