---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm
- smolreasoner
- reasoning
- instruction-tuned
- arcade
- sc-orthogonal
pipeline_tag: text-generation
---

# Arcade-3B — SmolReasoner

[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.19029063.svg)](https://doi.org/10.5281/zenodo.19029063) [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![Base Model](https://img.shields.io/badge/Base-SmolLM3--3B-orange)](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) [![NoesisLab](https://img.shields.io/badge/Lab-NoesisLab-purple)](https://huggingface.co/NoesisLab) [![GSM8K](https://img.shields.io/badge/GSM8K-62.9%25-brightgreen)](https://huggingface.co/NoesisLab/Arcade-3B) [![ARC-Easy](https://img.shields.io/badge/ARC--Easy-74.4%25-brightgreen)](https://huggingface.co/NoesisLab/Arcade-3B)

**Arcade-3B** is a 3B-parameter instruction-following and reasoning model built on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B). It is the first public release from the **ARCADE** project at [NoesisLab](https://huggingface.co/NoesisLab), which investigates the *State–Constraint Orthogonality Hypothesis*: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.
---

## Method: SC-Orthogonal Training

Standard Transformer hidden states conflate two distinct functions:

| Half | Symbol | Role |
|------|--------|------|
| `H[..., :D/2]` | **S** (State) | *What* the model knows — factual content |
| `H[..., D/2:]` | **C** (Constraint) | *How* to retrieve it — reasoning structure |

ARCADE's **SCOrthoTrainer** adds an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot L} \sum_{b,l} \left( \mathbf{S}_{b,l} \cdot \mathbf{C}_{b,l} \right)^2$$

with **λ = 0.1**. This soft regularization reduces divergence errors at inference time at zero architectural cost.

![SC-Orthogonal Optimization Loop](dia.jpg)

---

## Training Details

| Setting | Value |
|---------|-------|
| Base model | `HuggingFaceTB/SmolLM3-3B` |
| λ (orth penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine) |
| Steps | 10 000 |
| Effective batch | 16 sequences/step |
| Hardware | 1 × A100-80 GB |
| Precision | bfloat16 |

### Training Data

| Dataset | Split | Sampling weight |
|---------|-------|-----------------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | train (2.3 K) | 10 % |
| [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) | train (460 K) | 45 % |
| [OpenDataArena/ODA-Mixture-500k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k) | train (500 K) | 45 % |

Reasoning samples are wrapped in `<think>` tags and upsampled 10× to compensate for the small dataset size.

---

## Evaluation

Results from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):

### Comparison with Peer Models

![Benchmark Comparison](benchmark_comparison.png)

> Scores below 10% are displayed as `<10%` in the chart.
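For concreteness, the SC-orthogonality penalty from the Method section can be sketched in a few lines of PyTorch. This is an illustrative reconstruction from the loss formula, not the released `SCOrthoTrainer` code; the function name, tensor shapes, and toy dimensions below are assumptions.

```python
import torch

def sc_orthogonality_penalty(hidden: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Illustrative SC-orthogonality penalty (not the actual SCOrthoTrainer).

    hidden: (B, L, D) final-layer hidden states. The first D/2 dims are
    treated as the State half S, the last D/2 as the Constraint half C.
    Returns lam / (B*L) * sum over tokens of (S_{b,l} . C_{b,l})^2.
    """
    d = hidden.size(-1)
    s, c = hidden[..., : d // 2], hidden[..., d // 2 :]
    dots = (s * c).sum(dim=-1)        # per-token dot product, shape (B, L)
    return lam * dots.pow(2).mean()   # mean over B*L implements the 1/(B*L) factor

# Toy example: add the penalty to a standard cross-entropy loss.
B, L, D, V = 2, 8, 16, 100
hidden = torch.randn(B, L, D)
logits = torch.randn(B, L, V)
labels = torch.randint(0, V, (B, L))
ce = torch.nn.functional.cross_entropy(logits.view(-1, V), labels.view(-1))
loss = ce + sc_orthogonality_penalty(hidden, lam=0.1)
```

Because the penalty only reads the final hidden layer, it drops into any existing training loop as an extra loss term, consistent with the "zero architectural cost" claim above.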
| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|-----------|-----------|------------|------------|--------------|-----------------|
| MMLU | **52.9%** | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | **62.9%** | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | **41.5%** | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | **53.1%** | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | **75.9%** | 75.3% | 66.8% | 68.1% |

### Arcade-3B Detailed Scores

| Benchmark | Few-shot | Metric | Score | ± |
|-----------|----------|--------|-------|---|
| GSM8K | 5 | exact_match (flexible-extract) | **0.6293** | 0.0133 |
| HumanEval | 0 | pass@1 | **0.4146** | 0.0386 |
| ARC-Challenge | 25 | acc_norm | **0.5256** | 0.0146 |
| ARC-Easy | 0 | acc | **0.7437** | 0.0090 |
| MMLU | 0 | acc | **0.5293** | 0.0040 |

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?"}
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For step-by-step reasoning, the model may emit a `<think>` block before the final answer.

---

## Citation

```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```

---

## License

Apache 2.0 — inherited from SmolLM3-3B.