# Arcade-3B — SmolReasoner
Arcade-3B is a 3B instruction-following and reasoning model built on SmolLM3-3B. It is the first public release from the ARCADE project at NoesisLab, which investigates the State–Constraint Orthogonality Hypothesis: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.
## Method: SC-Orthogonal Training
Standard Transformer hidden states conflate two distinct functions:
| Half | Symbol | Role |
|---|---|---|
| `H[..., :D/2]` | S (State) | What the model knows — factual content |
| `H[..., D/2:]` | C (Constraint) | How to retrieve it — reasoning structure |
ARCADE's `SCOrthoTrainer` injects an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators. The penalty is weighted by λ = 0.1. This soft regularization reduces divergence errors at inference time at zero architectural cost.
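The card does not spell out the loss term itself. Below is a minimal pure-Python sketch, assuming the penalty is λ times the mean squared cosine similarity between the S and C halves of each final-layer hidden state; the function name and exact form are illustrative, not the released trainer.

```python
import math

LAMBDA = 0.1  # λ, matching the value in the training table

def sc_orth_penalty(hidden_states, lam=LAMBDA):
    """Hypothetical SC-orthogonality penalty (illustrative form).

    hidden_states: list of per-token final-layer hidden vectors (length-D
    lists). Each vector is split into a state half S and a constraint
    half C; the penalty is lam times the mean squared cosine similarity
    between the halves, so it vanishes when S ⟂ C for every token.
    """
    total = 0.0
    for h in hidden_states:
        d = len(h) // 2
        s, c = h[:d], h[d:]
        dot = sum(a * b for a, b in zip(s, c))
        norm = math.sqrt(sum(a * a for a in s)) * math.sqrt(sum(b * b for b in c))
        cos = dot / norm if norm > 0 else 0.0
        total += cos * cos
    return lam * total / len(hidden_states)

# Orthogonal halves incur no penalty; parallel halves incur the maximum lam.
print(sc_orth_penalty([[1.0, 0.0, 0.0, 1.0]]))  # S=(1,0) ⟂ C=(0,1) → 0.0
print(sc_orth_penalty([[1.0, 1.0, 1.0, 1.0]]))  # S ∥ C → 0.1
```

In an actual training loop this scalar would simply be added to the language-modeling loss, which is what makes the regularization architecture-free.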
## Training Details
| Setting | Value |
|---|---|
| Base model | HuggingFaceTB/SmolLM3-3B |
| λ (orth penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine) |
| Steps | 10 000 |
| Effective batch | 16 sequences/step |
| Hardware | 1 × A100-80 GB |
| Precision | bfloat16 |
## Training Data
| Dataset | Split | Sampling weight |
|---|---|---|
| nohurry/Opus-4.6-Reasoning-3000x-filtered | train (2.3 K) | 10 % |
| HuggingFaceTB/smol-smoltalk | train (460 K) | 45 % |
| OpenDataArena/ODA-Mixture-500k | train (500 K) | 45 % |
Reasoning samples are wrapped with `<think>…</think>` tags and upsampled 10× to compensate for the small dataset size.
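A minimal sketch of that preprocessing step; the helper names are illustrative and the released pipeline may differ:

```python
def wrap_reasoning_sample(question, reasoning, answer):
    """Wrap a reasoning trace in <think>…</think> tags ahead of the answer."""
    return f"{question}\n<think>\n{reasoning}\n</think>\n{answer}"

def upsample(samples, factor=10):
    """Repeat each sample `factor` times (the 10× upsampling above)."""
    return [s for s in samples for _ in range(factor)]

wrapped = wrap_reasoning_sample("2 + 2?", "Add the two numbers.", "4")
pool = upsample([wrapped])
print(len(pool))  # → 10
```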
## Evaluation
Results from `lm-evaluation-harness`:
### Comparison with Peer Models
Scores below 10 % are displayed as `< 10%` in the table.
| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|---|---|---|---|---|---|
| MMLU | 52.9% | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | 62.9% | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | 41.5% | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | 53.1% | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | 75.9% | 75.3% | 66.8% | 68.1% |
### Arcade-3B Detailed Scores
| Benchmark | Few-shot | Metric | Score | ± |
|---|---|---|---|---|
| GSM8K | 5 | flexible-extract / exact_match | 0.6293 | 0.0133 |
| HumanEval | 0 | pass@1 | 0.4146 | 0.0386 |
| ARC-Challenge | 25 | acc_norm | 0.5256 | 0.0146 |
| ARC-Easy | 0 | acc | 0.7437 | 0.0090 |
| MMLU | 0 | acc | 0.5293 | 0.0040 |
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?"}]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.
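If you only want the final answer, the think block can be stripped after generation. A small helper for that, not part of the model repo:

```python
import re

def split_think(text):
    """Separate an optional <think>…</think> block from the final answer.

    Returns (reasoning, answer); reasoning is None when no block is present.
    """
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()
    return m.group(1).strip(), text[m.end():].strip()

r, a = split_think("<think>120 / 1.5 = 80</think>The average speed is 80 km/h.")
print(a)  # → The average speed is 80 km/h.
```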
## Citation
```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```
## License
Apache 2.0 — inherited from SmolLM3-3B.