---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm
- smolreasoner
- reasoning
- instruction-tuned
- arcade
- sc-orthogonal
pipeline_tag: text-generation
---
# Arcade-3B — SmolReasoner
[DOI: 10.5281/zenodo.19029063](https://doi.org/10.5281/zenodo.19029063) · [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Base model: SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) · [NoesisLab](https://huggingface.co/NoesisLab) · [Model: Arcade-3B](https://huggingface.co/NoesisLab/Arcade-3B)
**Arcade-3B** is a 3B instruction-following and reasoning model built on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B).
It is the first public release from the **ARCADE** project at [NoesisLab](https://huggingface.co/NoesisLab), which investigates the *State–Constraint Orthogonality Hypothesis*: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.
---
## Method: SC-Orthogonal Training
Standard Transformer hidden states conflate two distinct functions:
| Half | Symbol | Role |
|------|--------|------|
| `H[..., :D/2]` | **S** (State) | *What* the model knows — factual content |
| `H[..., D/2:]` | **C** (Constraint) | *How* to retrieve it — reasoning structure |
ARCADE's **SCOrthoTrainer** injects an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot L} \sum_{b,l} \left( \mathbf{S}_{b,l} \cdot \mathbf{C}_{b,l} \right)^2$$
with **λ = 0.1**. This soft regularization reduces divergence errors at inference time while adding zero architectural cost.
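The penalty can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual SCOrthoTrainer implementation; the function name and tensor shapes are assumptions:

```python
import torch

def sc_orth_penalty(hidden: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Squared S·C penalty on final-layer hidden states of shape (B, L, D).

    S = hidden[..., :D/2] (state half), C = hidden[..., D/2:] (constraint
    half); the squared per-token dot product is averaged over B·L and
    scaled by λ, matching the loss term above.
    """
    d = hidden.shape[-1]
    s, c = hidden[..., : d // 2], hidden[..., d // 2 :]
    dots = (s * c).sum(dim=-1)        # per-token S·C, shape (B, L)
    return lam * dots.pow(2).mean()   # λ/(B·L) · Σ_{b,l} (S·C)²
```

The term is added to the standard cross-entropy loss during fine-tuning; when S and C are exactly orthogonal at every position, the penalty vanishes.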

---
## Training Details
| Setting | Value |
|---------|-------|
| Base model | `HuggingFaceTB/SmolLM3-3B` |
| λ (orth penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine) |
| Steps | 10 000 |
| Effective batch | 16 sequences/step |
| Hardware | 1 × A100-80 GB |
| Precision | bfloat16 |
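The learning-rate schedule in the table can be sketched as follows, assuming cosine decay from the 2e-4 peak to zero over the 10 000 steps with no warmup (the card specifies neither a warmup phase nor a floor):

```python
import math

def cosine_lr(step: int, total_steps: int = 10_000, peak: float = 2e-4) -> float:
    """Cosine decay from `peak` at step 0 down to 0 at `total_steps`."""
    progress = min(step, total_steps) / total_steps
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```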
### Training Data
| Dataset | Split | Sampling weight |
|---------|-------|-----------------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | train (2.3 K) | 10 % |
| [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) | train (460 K) | 45 % |
| [OpenDataArena/ODA-Mixture-500k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k) | train (500 K) | 45 % |
Reasoning samples are wrapped with `<think>…</think>` tags and upsampled 10× to compensate for the small dataset size.
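The wrapping and upsampling step can be sketched as below. The field names `reasoning` and `answer` are illustrative, not the dataset's actual schema:

```python
def wrap_reasoning(example: dict) -> dict:
    """Wrap the chain of thought in <think> tags ahead of the final answer."""
    return {"text": f"<think>{example['reasoning']}</think>\n{example['answer']}"}

def upsample(samples: list, factor: int = 10) -> list:
    """Repeat the small reasoning set so it is seen ~10x more often."""
    return samples * factor
```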
---
## Evaluation
Results from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
### Comparison with Peer Models

> Scores below 10% are reported as `< 10%` in the table below.
| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|-----------|-----------|------------|------------|--------------|-----------------|
| MMLU | **52.9%** | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | **62.9%** | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | **41.5%** | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | **53.1%** | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | **75.9%** | 75.3% | 66.8% | 68.1% |
### Arcade-3B Detailed Scores
| Benchmark | Few-shot | Metric | Score | ± |
|-----------|----------|--------|-------|---|
| GSM8K | 5 | flexible-extract / exact_match | **0.6293** | 0.0133 |
| HumanEval | 0 | pass@1 | **0.4146** | 0.0386 |
| ARC-Challenge | 25 | acc_norm | **0.5256** | 0.0146 |
| ARC-Easy | 0 | acc | **0.7437** | 0.0090 |
| MMLU | 0 | acc | **0.5293** | 0.0040 |
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?",
    }
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.
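If you want the final answer without the reasoning, the `<think>` block can be split off like this (a small helper sketch, not part of the model's API):

```python
import re

def split_think(text: str):
    """Return (reasoning, answer); reasoning is None if no <think> block."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```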
---
## Citation
```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```
---
## License
Apache 2.0 — inherited from SmolLM3-3B.