---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm
- smolreasoner
- reasoning
- instruction-tuned
- arcade
- sc-orthogonal
pipeline_tag: text-generation
---

# Arcade-3B — SmolReasoner

[DOI](https://doi.org/10.5281/zenodo.19029063) · [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Base model: SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) · [NoesisLab](https://huggingface.co/NoesisLab) · [Arcade-3B](https://huggingface.co/NoesisLab/Arcade-3B)

|
**Arcade-3B** is a 3B-parameter instruction-following and reasoning model built on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B).
It is the first public release from the **ARCADE** project at [NoesisLab](https://huggingface.co/NoesisLab), which investigates the *State–Constraint Orthogonality Hypothesis*: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.

---

## Method: SC-Orthogonal Training

Standard Transformer hidden states conflate two distinct functions:

| Half | Symbol | Role |
|------|--------|------|
| `H[..., :D/2]` | **S** (State) | *What* the model knows — factual content |
| `H[..., D/2:]` | **C** (Constraint) | *How* to retrieve it — reasoning structure |

ARCADE's **SCOrthoTrainer** injects an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot L} \sum_{b,l} \left( \mathbf{S}_{b,l} \cdot \mathbf{C}_{b,l} \right)^2$$

with **λ = 0.1**. This soft regularization reduces divergence errors at inference time at no architectural cost.
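The regularizer can be sketched in a few lines. The following is an illustrative NumPy implementation of the penalty term, not the project's actual `SCOrthoTrainer` code; the function name and interface are assumptions.

```python
import numpy as np

def sc_ortho_penalty(hidden: np.ndarray, lam: float = 0.1) -> float:
    """Orthogonality penalty between the State and Constraint halves.

    hidden: final-layer hidden states of shape (B, L, D), D even.
    Returns lam / (B * L) * sum over (b, l) of (S_{b,l} . C_{b,l})^2.
    """
    B, L, D = hidden.shape
    S = hidden[..., : D // 2]              # State half: factual content
    C = hidden[..., D // 2 :]              # Constraint half: reasoning structure
    dots = np.einsum("bld,bld->bl", S, C)  # per-token inner products S · C
    return float(lam / (B * L) * np.sum(dots ** 2))
```

During training this scalar is added to the cross-entropy loss, so gradients push the two halves toward mutual orthogonality while the attention operators stay untouched.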
---

## Training Details

| Setting | Value |
|---------|-------|
| Base model | `HuggingFaceTB/SmolLM3-3B` |
| λ (orthogonality penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine schedule) |
| Steps | 10,000 |
| Effective batch size | 16 sequences/step |
| Hardware | 1 × A100 80 GB |
| Precision | bfloat16 |

### Training Data

| Dataset | Split | Sampling weight |
|---------|-------|-----------------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | train (2.3 K) | 10 % |
| [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) | train (460 K) | 45 % |
| [OpenDataArena/ODA-Mixture-500k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k) | train (500 K) | 45 % |

Reasoning samples are wrapped in `<think>…</think>` tags and upsampled 10× to compensate for the small size of the reasoning dataset.

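The wrapping step might look like the sketch below; the helper name and sample fields are hypothetical, not the actual preprocessing code.

```python
def wrap_reasoning_sample(reasoning: str, answer: str, upsample: int = 10) -> list:
    """Wrap a chain-of-thought in <think> tags and repeat the sample
    `upsample` times to rebalance the small reasoning dataset."""
    text = "<think>" + reasoning + "</think>\n" + answer
    return [text] * upsample
```

The 10× repetition is what lifts the 2.3 K reasoning samples to the 10 % sampling weight shown in the table.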
---

## Evaluation

Results were obtained with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):

### Comparison with Peer Models

Scores below 10 % are reported as `< 10%`.

| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|-----------|-----------|------------|------------|--------------|-----------------|
| MMLU | **52.9%** | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | **62.9%** | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | **41.5%** | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | **53.1%** | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | **75.9%** | 75.3% | 66.8% | 68.1% |

### Arcade-3B Detailed Scores

| Benchmark | Few-shot | Metric | Score | ± |
|-----------|----------|--------|-------|---|
| GSM8K | 5 | flexible-extract / exact_match | **0.6293** | 0.0133 |
| HumanEval | 0 | pass@1 | **0.4146** | 0.0386 |
| ARC-Challenge | 25 | acc_norm | **0.5256** | 0.0146 |
| ARC-Easy | 0 | acc | **0.7437** | 0.0090 |
| MMLU | 0 | acc | **0.5293** | 0.0040 |

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?"}
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.

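If you only want the final answer, the `<think>` block can be dropped in post-processing. The helper below is a minimal sketch (an assumption, not part of the model's API):

```python
import re

def strip_think(text: str) -> str:
    """Remove an optional leading <think>...</think> block and return the answer."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()
```

This works whether or not the model chose to emit a reasoning block.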
---

## Citation

```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```

---

## License

Apache 2.0 — inherited from SmolLM3-3B.