# Arcade-3B — SmolReasoner
Arcade-3B is a 3B instruction-following and reasoning model built on SmolLM3-3B. It is the first public release from the ARCADE project at NoesisLab, which investigates the State–Constraint Orthogonality Hypothesis: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.
## Method: SC-Orthogonal Training
Standard Transformer hidden states conflate two distinct functions:
| Half | Symbol | Role |
|---|---|---|
| `H[..., :D/2]` | S (State) | What the model knows — factual content |
| `H[..., D/2:]` | C (Constraint) | How to retrieve it — reasoning structure |
ARCADE's `SCOrthoTrainer` injects an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators. The penalty is weighted by λ = 0.1. This soft regularization reduces divergence errors at inference time at zero architectural cost.
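The card does not spell out the loss term itself. Below is a minimal pure-Python sketch, assuming the penalty is λ times the mean squared cosine similarity between the S and C halves of each final-layer hidden state; the function name and exact form are illustrative, not the released trainer.

```python
import math

LAMBDA = 0.1  # λ, matching the value in the training table

def sc_orth_penalty(hidden_states, lam=LAMBDA):
    """Hypothetical SC-orthogonality penalty (illustrative form).

    hidden_states: list of per-token final-layer hidden vectors (length-D
    lists). Each vector is split into a state half S and a constraint
    half C; the penalty is lam times the mean squared cosine similarity
    between the halves, so it vanishes when S ⟂ C for every token.
    """
    total = 0.0
    for h in hidden_states:
        d = len(h) // 2
        s, c = h[:d], h[d:]
        dot = sum(a * b for a, b in zip(s, c))
        norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in c))
        cos = dot / norm if norm > 0 else 0.0
        total += cos * cos
    return lam * total / len(hidden_states)

# Orthogonal halves incur no penalty; parallel halves incur the maximum lam.
print(sc_orth_penalty([[1.0, 0.0, 0.0, 1.0]]))  # S=(1,0) ⟂ C=(0,1) → 0.0
print(sc_orth_penalty([[1.0, 1.0, 1.0, 1.0]]))  # S ∥ C → 0.1
```

In an actual training loop this scalar would simply be added to the language-modeling loss, which is what makes the regularization architecture-free.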
## Training Details
| Setting | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM3-3B |
| λ (orth penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine) |
| Steps | 10 000 |
| Effective batch | 16 sequences/step |
| Hardware | 1 × A100-80 GB |
| Precision | bfloat16 |
## Training Data
| Dataset | Split | Sampling weight |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | train (2.3 K) | 10 % |
| HuggingFaceTB/smol-smoltalk | train (460 K) | 45 % |
| OpenDataArena/ODA-Mixture-500k | train (500 K) | 45 % |
Reasoning samples are wrapped with `<think>…</think>` tags and upsampled 10× to compensate for the small dataset size.
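A minimal sketch of that preprocessing step; the helper names are illustrative and the released pipeline may differ:

```python
def wrap_reasoning_sample(question, reasoning, answer):
    """Wrap a reasoning trace in <think>…</think> tags ahead of the answer."""
    return f"{question}\n<think>\n{reasoning}\n</think>\n{answer}"

def upsample(samples, factor=10):
    """Repeat each sample `factor` times (the 10× upsampling above)."""
    return [s for s in samples for _ in range(factor)]

wrapped = wrap_reasoning_sample("2 + 2?", "Add the two numbers.", "4")
pool = upsample([wrapped])
print(len(pool))  # → 10
```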
## Evaluation
Results from `lm-evaluation-harness`:
### Comparison with Peer Models
Scores below 10 % are displayed as `< 10%` in the table.
| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|---|---|---|---|---|---|
| MMLU | 52.9% | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | 62.9% | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | 41.5% | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | 53.1% | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | 75.9% | 75.3% | 66.8% | 68.1% |
### Arcade-3B Detailed Scores
| Benchmark | Few-shot | Metric | Score | ± |
|---|---|---|---|---|
| GSM8K | 5 | flexible-extract / exact_match | 0.6293 | 0.0133 |
| HumanEval | 0 | pass@1 | 0.4146 | 0.0386 |
| ARC-Challenge | 25 | acc_norm | 0.5256 | 0.0146 |
| ARC-Easy | 0 | acc | 0.7437 | 0.0090 |
| MMLU | 0 | acc | 0.5293 | 0.0040 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?"}]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.
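If you only want the final answer, the think block can be stripped after generation. A small helper for that, not part of the model repo:

```python
import re

def split_think(text):
    """Separate an optional <think>…</think> block from the final answer.

    Returns (reasoning, answer); reasoning is None when no block is present.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()
    return m.group(1).strip(), text[m.end():].strip()

r, a = split_think("<think>120 / 1.5 = 80</think>The average speed is 80 km/h.")
print(a)  # → The average speed is 80 km/h.
```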
## Citation
```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```
## License
Apache 2.0 — inherited from SmolLM3-3B.