---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm
- smolreasoner
- reasoning
- instruction-tuned
- arcade
- sc-orthogonal
pipeline_tag: text-generation
---
# Arcade-3B — SmolReasoner
[DOI: 10.5281/zenodo.19029063](https://doi.org/10.5281/zenodo.19029063) · [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Base model: SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) · [NoesisLab](https://huggingface.co/NoesisLab) · [Model: Arcade-3B](https://huggingface.co/NoesisLab/Arcade-3B)
**Arcade-3B** is a 3B instruction-following and reasoning model built on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B).
It is the first public release from the **ARCADE** project at [NoesisLab](https://huggingface.co/NoesisLab), which investigates the *State–Constraint Orthogonality Hypothesis*: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.
---
## Method: SC-Orthogonal Training
Standard Transformer hidden states conflate two distinct functions:
| Half | Symbol | Role |
|------|--------|------|
| `H[..., :D/2]` | **S** (State) | *What* the model knows — factual content |
| `H[..., D/2:]` | **C** (Constraint) | *How* to retrieve it — reasoning structure |
ARCADE's **SCOrthoTrainer** injects an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators:
$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot L} \sum_{b,l} \left( \mathbf{S}_{b,l} \cdot \mathbf{C}_{b,l} \right)^2$$
with **λ = 0.1**. This soft regularization reduces divergence errors at inference time while adding zero architectural cost.
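The penalty can be sketched in a few lines of PyTorch. This is a minimal illustration, not the actual SCOrthoTrainer implementation; the function name and tensor shapes are assumptions:

```python
import torch

def sc_orth_penalty(hidden: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Squared S·C penalty on final-layer hidden states of shape (B, L, D).

    S = hidden[..., :D/2] (state half), C = hidden[..., D/2:] (constraint
    half); the squared per-token dot product is averaged over B·L and
    scaled by λ, matching the loss term above.
    """
    d = hidden.shape[-1]
    s, c = hidden[..., : d // 2], hidden[..., d // 2 :]
    dots = (s * c).sum(dim=-1)        # per-token S·C, shape (B, L)
    return lam * dots.pow(2).mean()   # λ/(B·L) · Σ_{b,l} (S·C)²
```

The term is added to the standard cross-entropy loss during fine-tuning; when S and C are exactly orthogonal at every position, the penalty vanishes.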

---
## Training Details
| Setting | Value |
|---------|-------|
| Base model | `HuggingFaceTB/SmolLM3-3B` |
| λ (orth penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine) |
| Steps | 10 000 |
| Effective batch | 16 sequences/step |
| Hardware | 1 × A100-80 GB |
| Precision | bfloat16 |
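The learning-rate schedule in the table can be sketched as follows, assuming cosine decay from the 2e-4 peak to zero over the 10 000 steps with no warmup (the card specifies neither a warmup phase nor a floor):

```python
import math

def cosine_lr(step: int, total_steps: int = 10_000, peak: float = 2e-4) -> float:
    """Cosine decay from `peak` at step 0 down to 0 at `total_steps`."""
    progress = min(step, total_steps) / total_steps
    return peak * 0.5 * (1.0 + math.cos(math.pi * progress))
```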
### Training Data
| Dataset | Split | Sampling weight |
|---------|-------|-----------------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | train (2.3 K) | 10 % |
| [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) | train (460 K) | 45 % |
| [OpenDataArena/ODA-Mixture-500k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k) | train (500 K) | 45 % |
Reasoning samples are wrapped with `<think>…</think>` tags and upsampled 10× to compensate for the small dataset size.
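The wrapping and upsampling step can be sketched as below. The field names `reasoning` and `answer` are illustrative, not the dataset's actual schema:

```python
def wrap_reasoning(example: dict) -> dict:
    """Wrap the chain of thought in <think> tags ahead of the final answer."""
    return {"text": f"<think>{example['reasoning']}</think>\n{example['answer']}"}

def upsample(samples: list, factor: int = 10) -> list:
    """Repeat the small reasoning set so it is seen ~10x more often."""
    return samples * factor
```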
---
## Evaluation
Results from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):
### Comparison with Peer Models

> Scores below 10% are reported as `< 10%` in the table below.
| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|-----------|-----------|------------|------------|--------------|-----------------|
| MMLU | **52.9%** | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | **62.9%** | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | **41.5%** | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | **53.1%** | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | **75.9%** | 75.3% | 66.8% | 68.1% |
### Arcade-3B Detailed Scores
| Benchmark | Few-shot | Metric | Score | ± |
|-----------|----------|--------|-------|---|
| GSM8K | 5 | flexible-extract / exact_match | **0.6293** | 0.0133 |
| HumanEval | 0 | pass@1 | **0.4146** | 0.0386 |
| ARC-Challenge | 25 | acc_norm | **0.5256** | 0.0146 |
| ARC-Easy | 0 | acc | **0.7437** | 0.0090 |
| MMLU | 0 | acc | **0.5293** | 0.0040 |
---
## Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {
        "role": "user",
        "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?",
    }
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.
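If you want the final answer without the reasoning, the `<think>` block can be split off like this (a small helper sketch, not part of the model's API):

```python
import re

def split_think(text: str):
    """Return (reasoning, answer); reasoning is None if no <think> block."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return None, text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```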
---
## Citation
```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```
---
## License
Apache 2.0 — inherited from SmolLM3-3B.