---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm
- smolreasoner
- reasoning
- instruction-tuned
- arcade
- sc-orthogonal
pipeline_tag: text-generation
---

# Arcade-3B — SmolReasoner

[DOI](https://doi.org/10.5281/zenodo.19029063) · [License: Apache-2.0](https://opensource.org/licenses/Apache-2.0) · [Base model: SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B) · [NoesisLab](https://huggingface.co/NoesisLab) · [Arcade-3B](https://huggingface.co/NoesisLab/Arcade-3B)

|
**Arcade-3B** is a 3B-parameter instruction-following and reasoning model built on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B).
It is the first public release from the **ARCADE** project at [NoesisLab](https://huggingface.co/NoesisLab), which investigates the *State–Constraint Orthogonality Hypothesis*: standard Transformer hidden states conflate factual content and reasoning structure in the same subspace, and explicitly decoupling them improves generalization.

---

## Method: SC-Orthogonal Training

Standard Transformer hidden states conflate two distinct functions:

| Half | Symbol | Role |
|------|--------|------|
| `H[..., :D/2]` | **S** (State) | *What* the model knows — factual content |
| `H[..., D/2:]` | **C** (Constraint) | *How* to retrieve it — reasoning structure |

ARCADE's **SCOrthoTrainer** injects an orthogonality penalty on the final hidden layer, encouraging S and C to decouple in representation space without modifying any attention operators:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot L} \sum_{b,l} \left( \mathbf{S}_{b,l} \cdot \mathbf{C}_{b,l} \right)^2$$

with **λ = 0.1**. This soft regularization reduces divergence errors at inference time at no architectural cost.
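The regularizer can be sketched in a few lines. The following is an illustrative NumPy implementation of the penalty term, not the project's actual `SCOrthoTrainer` code; the function name and interface are assumptions.

```python
import numpy as np

def sc_ortho_penalty(hidden: np.ndarray, lam: float = 0.1) -> float:
    """Orthogonality penalty between the State and Constraint halves.

    hidden: final-layer hidden states of shape (B, L, D), D even.
    Returns lam / (B * L) * sum over (b, l) of (S_{b,l} . C_{b,l})^2.
    """
    B, L, D = hidden.shape
    S = hidden[..., : D // 2]              # State half: factual content
    C = hidden[..., D // 2 :]              # Constraint half: reasoning structure
    dots = np.einsum("bld,bld->bl", S, C)  # per-token inner products S · C
    return float(lam / (B * L) * np.sum(dots ** 2))
```

During training this scalar is added to the cross-entropy loss, so gradients push the two halves toward mutual orthogonality while the attention operators stay untouched.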
---

## Training Details

| Setting | Value |
|---------|-------|
| Base model | `HuggingFaceTB/SmolLM3-3B` |
| λ (orthogonality penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine schedule) |
| Steps | 10,000 |
| Effective batch size | 16 sequences/step |
| Hardware | 1 × A100 80 GB |
| Precision | bfloat16 |

### Training Data

| Dataset | Split | Sampling weight |
|---------|-------|-----------------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | train (2.3 K) | 10 % |
| [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) | train (460 K) | 45 % |
| [OpenDataArena/ODA-Mixture-500k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k) | train (500 K) | 45 % |

Reasoning samples are wrapped in `<think>…</think>` tags and upsampled 10× to compensate for the small size of the reasoning dataset.

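The wrapping step might look like the sketch below; the helper name and sample fields are hypothetical, not the actual preprocessing code.

```python
def wrap_reasoning_sample(reasoning: str, answer: str, upsample: int = 10) -> list:
    """Wrap a chain-of-thought in <think> tags and repeat the sample
    `upsample` times to rebalance the small reasoning dataset."""
    text = "<think>" + reasoning + "</think>\n" + answer
    return [text] * upsample
```

The 10× repetition is what lifts the 2.3 K reasoning samples to the 10 % sampling weight shown in the table.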
---

## Evaluation

Results were obtained with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):

### Comparison with Peer Models

Scores below 10 % are reported as `< 10%`.

| Benchmark | Arcade-3B | Gemma-2-2B | Llama-2-7B | Qwen1.5-1.8B | OpenLLaMA-v2-3B |
|-----------|-----------|------------|------------|--------------|-----------------|
| MMLU | **52.9%** | 52.4% | 45.3% | 46.8% | 41.0% |
| GSM8K | **62.9%** | 50.9% | 14.6% | 37.8% | < 10% |
| HumanEval | **41.5%** | 32.3% | 12.8% | 27.4% | < 10% |
| ARC-Challenge | 52.6% | **53.1%** | 46.2% | 41.2% | 34.2% |
| ARC-Easy | 74.4% | **75.9%** | 75.3% | 66.8% | 68.1% |

### Arcade-3B Detailed Scores

| Benchmark | Few-shot | Metric | Score | ± |
|-----------|----------|--------|-------|---|
| GSM8K | 5 | flexible-extract / exact_match | **0.6293** | 0.0133 |
| HumanEval | 0 | pass@1 | **0.4146** | 0.0386 |
| ARC-Challenge | 25 | acc_norm | **0.5256** | 0.0146 |
| ARC-Easy | 0 | acc | **0.7437** | 0.0090 |
| MMLU | 0 | acc | **0.5293** | 0.0040 |

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?"}
]
input_ids = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.

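If you only want the final answer, the `<think>` block can be dropped in post-processing. The helper below is a minimal sketch (an assumption, not part of the model's API):

```python
import re

def strip_think(text: str) -> str:
    """Remove an optional leading <think>...</think> block and return the answer."""
    return re.sub(r"<think>.*?</think>\s*", "", text, flags=re.DOTALL).strip()
```

This works whether or not the model chose to emit a reasoning block.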
---

## Citation

```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal Training},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```

---

## License

Apache 2.0 — inherited from SmolLM3-3B.