DuoNeural committed on
Commit
e86cf4d
·
verified ·
1 Parent(s): 4b4b491

DuoNeural Think Instillation R18 — dead-prompt filtered GRPO, +0.030 over post-SFT

README.md ADDED
@@ -0,0 +1,131 @@
+ ---
+ language:
+ - en
+ license: apache-2.0
+ base_model: HuggingFaceTB/SmolLM2-360M-Instruct
+ tags:
+ - think-instillation
+ - grpo
+ - reasoning
+ - duoneural
+ - smollm2
+ - dead-prompt-filtering
+ library_name: transformers
+ ---
+
+ # SmolLM2-360M-Think: DuoNeural Think Instillation R18
+
+ A 360M-parameter reasoning model created by applying **Think Instillation** to SmolLM2-360M-Instruct. The model learns to generate structured `<think>` reasoning traces before answering multiple-choice questions. It is trained with SFT followed by **GRPO with dead-prompt filtering**.
+
+ ## What is Think Instillation?
+
+ Think Instillation is a DuoNeural post-training technique that injects deliberate reasoning structure into small language models without requiring a large teacher. The model learns to:
+ 1. Open a `<think>` tag and reason through the problem
+ 2. Close reasoning with `</think>`
+ 3. State a final answer in a parseable format: `(A)/(B)/(C)/(D)`
+
+ Unlike chain-of-thought distillation from larger models, Think Instillation uses GRPO with a binary accuracy reward plus a length penalty to self-discover efficient reasoning patterns.
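
The three-step output format above can be checked mechanically. The following is a minimal sketch, assuming the scored text contains both `<think>` tags (prompt suffix plus completion) and that the final answer is the first `(A)` to `(D)` token after `</think>`. `parse_completion` is a hypothetical helper, not part of the released code.

```python
import re

# Assumed patterns: one <think>...</think> span, then a parenthesised letter.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"\(([A-D])\)")


def parse_completion(text):
    """Return (reasoning, answer_letter); either is None when missing."""
    think = THINK_RE.search(text)
    reasoning = think.group(1).strip() if think else None
    # Search only after the closing tag, so option letters mentioned inside
    # the reasoning are not mistaken for the final answer.
    tail = text[think.end():] if think else text
    answer = ANSWER_RE.search(tail)
    return reasoning, (answer.group(1) if answer else None)


print(parse_completion("<think>Plants need light.</think> The answer is (B)"))
```

Restricting the answer search to the text after `</think>` is a design choice of this sketch; the actual reward parser may behave differently.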
+
+ ## Training Details
+
+ ### SFT Stage (R18)
+ - **Base**: `HuggingFaceTB/SmolLM2-360M-Instruct`
+ - **Dataset**: ARC-Easy (2700 prompts) formatted as `Question + choices + "Reasoning: <think>"`
+ - **Steps**: 150 SFT steps; LoRA r=32, α=32
+ - **Result**: post_sft accuracy = **0.250** (15/60 on ARC-Easy val, greedy decoding)
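
The prompt layout `Question + choices + "Reasoning: <think>"` can be sketched as below. Only the trailing `Reasoning: <think>` string comes from the model card; the `Question:` prefix, the `(A)` style labels, and the newline separators are assumptions, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(question, choices):
    """Format one multiple-choice item so the model continues inside <think>."""
    labels = "ABCD"
    if len(choices) > len(labels):
        raise ValueError("this sketch supports at most four choices")
    lines = [f"Question: {question}"]
    lines += [f"({labels[i]}) {choice}" for i, choice in enumerate(choices)]
    # The prompt ends with an open <think> tag, so the completion starts
    # directly with the reasoning trace.
    lines.append("Reasoning: <think>")
    return "\n".join(lines)


print(build_prompt("What do plants need to make food?",
                   ["Sunlight", "Sand", "Salt", "Smoke"]))
```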
+
+ ### Dead-Prompt Filter
+ Before GRPO, we filter out prompts that produce **zero correct completions** in 4 temperature-sampled trials:
+ - **2247 raw prompts → 1450 kept (64.5% survival)**
+ - Removes systematically impossible prompts and keeps learnable ones
+ - `frac_zero_std=0.00` throughout GRPO training ✅ (filter confirmed working)
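
The filter described above amounts to a keep-if-any-correct rule. This is a minimal sketch, not the authors' implementation: `sample_fn` (draws one temperature-sampled completion) and `is_correct` (grades it) are hypothetical callables supplied by the caller.

```python
def filter_dead_prompts(prompts, sample_fn, is_correct, n_trials=4):
    """Keep prompts with at least one correct completion in n_trials samples."""
    kept = []
    for prompt in prompts:
        completions = [sample_fn(prompt) for _ in range(n_trials)]
        if any(is_correct(prompt, c) for c in completions):
            kept.append(prompt)
    return kept


# Survival rate reported in the model card: 1450 of 2247 prompts kept.
survival = 1450 / 2247
print(f"{survival:.1%}")
```

A prompt that never yields a correct sample gives every generation the same zero reward, which is why removing such prompts is expected to keep reward variance within each group above zero.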
+
+ ### GRPO Stage
+ - **Steps**: 750 (resumed from checkpoint-600 after hardware failure)
+ - **Reward**: Binary accuracy with length penalty: `reward = max(0, 1 - 0.20 * len_frac) if correct else 0`
+ - **Generations**: 8 per prompt (`NUM_GENERATIONS=8`)
+ - **Temperature**: 0.8
+ - **Max completion**: 1024 tokens
+ - **KL coefficient**: 0.02; clip_ε=0.2
+ - **LoRA**: r=32, α=32, targets=q/k/v_proj
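
The reward formula above can be written out as a small function. The model card does not define `len_frac`; this sketch assumes it is the completion length divided by the 1024-token maximum, which is a guess rather than a documented detail.

```python
def length_penalised_reward(correct, completion_tokens,
                            max_completion_tokens=1024, penalty=0.20):
    """reward = max(0, 1 - penalty * len_frac) if correct else 0."""
    if not correct:
        return 0.0
    # ASSUMPTION: len_frac = completion length / max completion length.
    len_frac = completion_tokens / max_completion_tokens
    return max(0.0, 1.0 - penalty * len_frac)


for tokens in (0, 512, 1024):
    print(tokens, length_penalised_reward(True, tokens))
```

Under that assumption `len_frac` never exceeds 1, so a correct answer scores between 0.8 and 1.0 and the `max(0, ...)` clamp never activates. It would only matter with a different definition of `len_frac`.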
+
+ ### GRPO Trajectory
+ | Step | Mean Reward |
+ |------|-------------|
+ | 75 | 0.424 🔥 |
+ | 375 | 0.476 🔥 |
+ | 575 | 0.533 🔥 |
+ | 600 | 0.543 🔥 |
+ | 625 | **0.595** 🔥🔥 |
+
+ Late-run surge: mean reward continued rising through the final steps. `frac_zero_std=0.00` on all non-trivial batches.
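
To see why the zero-variance fraction matters, here is a sketch of the standard group-relative advantage used in GRPO-style training. Whether this run normalised by the group standard deviation is not stated in the model card, and `group_advantages` and `frac_zero_std` are illustrative helpers, not the trainer's own functions.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-4):
    """Normalise rewards within one prompt's group of generations."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def frac_zero_std(groups):
    """Fraction of groups whose rewards are all identical."""
    return sum(1 for g in groups if pstdev(g) == 0) / len(groups)


dead = [0.0] * 8           # no generation was correct
live = [0.9] + [0.0] * 7   # one correct generation out of eight
print(group_advantages(dead))
print(frac_zero_std([dead, live]))
```

A group with identical rewards yields all-zero advantages and therefore no gradient signal, so a zero-variance fraction of 0.00 means every sampled group contributed to learning.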
+
+ ## Evaluation
+
+ - **post_SFT**: 0.250 (ARC-Easy val, n=60, greedy)
+ - **final_GRPO**: **0.280** (ARC-Easy val, n=100, seed=13)
+ - **GRPO delta**: **+0.030** (GRPO helped; note the two evaluations differ in size, n=60 vs n=100)
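
For a sense of scale, the sampling noise of these two accuracies can be estimated with a binomial standard error. This is a rough sketch that treats the two evaluation sets as independent random samples, which the model card does not confirm.

```python
from math import sqrt


def binomial_se(p, n):
    """Standard error of an accuracy p measured on n examples."""
    return sqrt(p * (1 - p) / n)


se_sft = binomial_se(0.25, 60)     # post-SFT, n=60
se_grpo = binomial_se(0.28, 100)   # final GRPO, n=100
se_delta = sqrt(se_sft ** 2 + se_grpo ** 2)

print(round(se_sft, 3), round(se_grpo, 3), round(se_delta, 3))
```

Under these assumptions the standard error of the difference is about 0.07, larger than the reported +0.030 delta, so a larger evaluation set would be needed to measure the gain precisely.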
+
+ ## Intended Use
+
+ - Research on think-instillation and reasoning in sub-400M models
+ - Exploring GRPO dynamics with dead-prompt filtering
+ - Building small, efficient reasoning models
+
+ ## Limitations
+
+ - Small model (360M params): reasoning depth is limited
+ - Trained on ARC-Easy MCQ only: narrow domain
+ - HTML formatting artifacts observed in some completions (reward-shaping artifact)
+
+ ## Citation
+
+ If you use this model in research, please cite the DuoNeural Think Instillation work:
+
+ ```bibtex
+ @misc{duoneural2026think,
+   title={Think Instillation: Dead-Prompt Filtered GRPO for Small Reasoning Models},
+   author={Archon and Aura and Jesse Caldwell},
+   year={2026},
+   publisher={DuoNeural},
+   url={https://huggingface.co/DuoNeural}
+ }
+ ```
+
+ ---
+
+ ## About DuoNeural
+
+ **DuoNeural** is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning, publishing everything under open access.
+
+ Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity. In our first 45 days we published 26 peer-deposited research papers, uploaded 69+ models and 6 datasets to HuggingFace, and ran experiments on everything from consumer GPUs to real quantum processing units. We believe the most interesting science happens when different kinds of minds work on the same problems together.
+
+ ### Research Publications
+
+ We've published **26+ open-access papers** covering:
+ - The Dynamical Horizon Principle (DHP): a universal learning constraint in recurrent architectures
+ - RLHF truth suppression mechanisms and behavioral routing in large language models
+ - Quantum DHP and the Quantum Parity Trap: decoherence immunity in quantum circuits
+ - CTM world models, temporal self-prediction, and sequence architecture comparisons
+ - Mechanistic interpretability: crystallization layers, suppressor circuits, direction rotation
+
+ 📄 **Full paper catalog:** [zenodo.org/communities/duoneural](https://zenodo.org/communities/duoneural)
+
+ ### Research Team
+
+ | Member | Role |
+ |--------|------|
+ | **Jesse Caldwell** | Founder, vision, hardware, direction |
+ | **Archon** | Lab Director: experiments, post-training, abliteration, quantum circuits |
+ | **Aura** | Research AI: literature synthesis, red-teaming, novel proposals |
+ | **Synapse (Syn)** | Always-on research agent, signal monitoring |
+ | **Kestrel** | Systems, infrastructure, web |
+
+ ### Links
+
+ | Platform | Link |
+ |----------|------|
+ | 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
+ | 📚 Zenodo Community | [zenodo.org/communities/duoneural](https://zenodo.org/communities/duoneural) |
+
+ *All research published open access, CC BY 4.0.*
config.json ADDED
@@ -0,0 +1,34 @@
+ {
+   "architectures": [
+     "LlamaForCausalLM"
+   ],
+   "attention_bias": false,
+   "attention_dropout": 0.0,
+   "bos_token_id": 0,
+   "dtype": "float16",
+   "eos_token_id": 0,
+   "head_dim": 64,
+   "hidden_act": "silu",
+   "hidden_size": 960,
+   "initializer_range": 0.02,
+   "intermediate_size": 2560,
+   "is_llama_config": true,
+   "max_position_embeddings": 8192,
+   "mlp_bias": false,
+   "model_type": "llama",
+   "num_attention_heads": 15,
+   "num_hidden_layers": 32,
+   "num_key_value_heads": 5,
+   "pad_token_id": 1,
+   "pretraining_tp": 1,
+   "rms_norm_eps": 1e-05,
+   "rope_interleaved": false,
+   "rope_parameters": {
+     "rope_theta": 100000,
+     "rope_type": "default"
+   },
+   "tie_word_embeddings": true,
+   "transformers_version": "5.12.0",
+   "use_cache": false,
+   "vocab_size": 49152
+ }
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 0,
+   "eos_token_id": [
+     0
+   ],
+   "pad_token_id": 1,
+   "transformers_version": "5.12.0"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb843247b1af6ee7905929728ed1fcd807216779a110543900c0307bb27d831e
+ size 723674624
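
The LFS pointer size can be cross-checked against `config.json`. The sketch below counts parameters for a standard Llama layout, assuming bias-free projections, two RMSNorm weights per layer plus a final norm, and a tied embedding matrix stored once. `llama_param_count` is an illustrative helper, not a `transformers` function.

```python
def llama_param_count(hidden, layers, heads, kv_heads, head_dim, inter, vocab):
    """Parameter count for a bias-free Llama model with tied embeddings."""
    q_and_o = 2 * hidden * heads * head_dim      # q_proj and o_proj
    k_and_v = 2 * hidden * kv_heads * head_dim   # k_proj and v_proj (GQA)
    mlp = 3 * hidden * inter                     # gate, up and down projections
    norms = 2 * hidden                           # two RMSNorm weights per layer
    per_layer = q_and_o + k_and_v + mlp + norms
    # Token embeddings + all layers + final norm (lm_head shares embeddings).
    return vocab * hidden + layers * per_layer + hidden


params = llama_param_count(hidden=960, layers=32, heads=15, kv_heads=5,
                           head_dim=64, inter=2560, vocab=49152)
fp16_bytes = 2 * params                # "dtype": "float16"
overhead = 723_674_624 - fp16_bytes    # LFS size minus raw tensor bytes
print(params, fp16_bytes, overhead)
```

This gives 361,821,120 parameters, or 723,642,240 bytes in float16. The remaining 32,384 bytes are plausibly the safetensors header, so the stored file is consistent with full merged float16 weights for the configured architecture.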
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,38 @@
+ {
+   "add_prefix_space": false,
+   "backend": "tokenizers",
+   "bos_token": "<|endoftext|>",
+   "clean_up_tokenization_spaces": false,
+   "eos_token": "<|endoftext|>",
+   "errors": "replace",
+   "extra_special_tokens": [
+     "<|endoftext|>",
+     "<|im_start|>",
+     "<|im_end|>",
+     "<repo_name>",
+     "<reponame>",
+     "<file_sep>",
+     "<filename>",
+     "<gh_stars>",
+     "<issue_start>",
+     "<issue_comment>",
+     "<issue_closed>",
+     "<jupyter_start>",
+     "<jupyter_text>",
+     "<jupyter_code>",
+     "<jupyter_output>",
+     "<jupyter_script>",
+     "<empty_output>"
+   ],
+   "is_local": true,
+   "local_files_only": false,
+   "max_length": 384,
+   "model_max_length": 8192,
+   "pad_token": "<|im_start|>",
+   "stride": 0,
+   "tokenizer_class": "GPT2Tokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "<|endoftext|>",
+   "vocab_size": 49152
+ }