OzTianlu committed 33536d9 (verified, parent df63edd): Upload README.md
---
language:
- en
license: apache-2.0
base_model: HuggingFaceTB/SmolLM3-3B
tags:
- smollm
- smolreasoner
- lora
- reasoning
- instruction-tuned
- arcade
- sc-orthogonal
pipeline_tag: text-generation
---

# Arcade-3B — SmolReasoner

**Arcade-3B** is a 3B-parameter instruction-following and reasoning model built on [SmolLM3-3B](https://huggingface.co/HuggingFaceTB/SmolLM3-3B). It is the first public release from the **ARCADE** project at [NoesisLab](https://huggingface.co/NoesisLab), which investigates zero-extra-parameter fine-tuning via the *State–Constraint Orthogonality Hypothesis*.

---

## Method: SC-Orthogonal LoRA

Standard Transformer hidden states conflate two distinct functions:

| Half | Symbol | Role |
|------|--------|------|
| `H[..., :D/2]` | **S** (State) | *What* the model knows — factual content |
| `H[..., D/2:]` | **C** (Constraint) | *How* to retrieve it — reasoning structure |

ARCADE's **SCOrthoTrainer** injects an orthogonality penalty on the final hidden layer during LoRA fine-tuning, encouraging S and C to decouple in representation space without modifying any attention operators:

$$\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{CE}} + \frac{\lambda}{B \cdot L} \sum_{b,l} \left( \mathbf{S}_{b,l} \cdot \mathbf{C}_{b,l} \right)^2$$

with **λ = 0.1**. This "soft logic gate" reduces divergence errors at inference time at zero architectural cost.
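
As a rough illustration, the penalty term can be sketched in a few lines of PyTorch. `SCOrthoTrainer` itself is not released, so the function name and the exact reduction are assumptions read off the formula above, not the project's code:

```python
import torch


def sc_orthogonality_penalty(hidden: torch.Tensor, lam: float = 0.1) -> torch.Tensor:
    """Orthogonality penalty on final-layer hidden states of shape (B, L, D).

    The first D/2 dimensions are treated as State (S), the last D/2 as
    Constraint (C). Returns lam / (B * L) * sum_{b,l} (S_{b,l} . C_{b,l})^2.
    """
    B, L, D = hidden.shape
    S, C = hidden[..., : D // 2], hidden[..., D // 2:]
    dots = (S * C).sum(dim=-1)  # (B, L) per-token S·C dot products
    return lam * dots.pow(2).sum() / (B * L)
```

During training this scalar would simply be added to the cross-entropy loss, matching the total-loss equation above.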

---

## Training Details

| Setting | Value |
|---------|-------|
| Base model | `HuggingFaceTB/SmolLM3-3B` |
| LoRA rank / alpha | 64 / 128 |
| LoRA target | all-linear |
| Dropout | 0.05 |
| λ (orthogonality penalty) | 0.1 |
| Max sequence length | 2048 |
| Learning rate | 2e-4 (cosine schedule) |
| Steps | 10 000 |
| Effective batch size | 16 sequences/step |
| Hardware | 1 × A100-80 GB |
| Precision | bfloat16 |
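
Under these settings, the adapter configuration would look roughly like the following `peft` snippet. This is a sketch inferred from the table, not the released training script; unspecified arguments are left at `peft` defaults:

```python
from peft import LoraConfig

# Values taken from the table above; "all-linear" wraps every linear layer.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```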

### Training Data

| Dataset | Split | Sampling weight |
|---------|-------|-----------------|
| [nohurry/Opus-4.6-Reasoning-3000x-filtered](https://huggingface.co/datasets/nohurry/Opus-4.6-Reasoning-3000x-filtered) | train (2.3 K) | 10 % |
| [HuggingFaceTB/smol-smoltalk](https://huggingface.co/datasets/HuggingFaceTB/smol-smoltalk) | train (460 K) | 45 % |
| [OpenDataArena/ODA-Mixture-500k](https://huggingface.co/datasets/OpenDataArena/ODA-Mixture-500k) | train (500 K) | 45 % |

Reasoning samples are wrapped with `<think>…</think>` tags and upsampled 10× to compensate for the small dataset size.

---

## Evaluation

Results from [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness):

| Benchmark | Few-shot | Metric | Score | ± |
|-----------|----------|--------|-------|---|
| GSM8K | 5 | exact_match (flexible-extract) | **0.6293** | 0.0133 |
| HumanEval | 0 | pass@1 | **0.4146** | 0.0386 |
| ARC-Challenge | 25 | acc_norm | **0.5256** | 0.0146 |
| ARC-Easy | 0 | acc | **0.7437** | 0.0090 |
| MMLU | 0 | acc | **0.5293** | 0.0040 |

---

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "NoesisLab/Arcade-3B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Solve step by step: If a train travels 120 km in 1.5 hours, what is its average speed?"}]
input_ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=512, temperature=0.7, do_sample=True)
print(tok.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For step-by-step reasoning, the model may emit a `<think>…</think>` block before the final answer.
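
If only the final answer is needed downstream, the optional `<think>` block can be split off after decoding. A minimal helper (an illustration, not part of the model's API):

```python
import re


def split_think(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is "" when no <think> block is present."""
    m = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if m is None:
        return "", text.strip()
    return m.group(1).strip(), text[m.end():].strip()
```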

---

## Citation

```bibtex
@misc{noesislab2025arcade,
  title        = {ARCADE: State-Constraint Orthogonal LoRA Fine-Tuning},
  author       = {NoesisLab},
  year         = {2025},
  howpublished = {\url{https://huggingface.co/NoesisLab/Arcade-3B}},
}
```

---

## License

Apache 2.0 — inherited from SmolLM3-3B.