---
license: apache-2.0
language:
- en
- code
tags:
- language-model
- novel-architecture
- code-generation
- mechanistic-interpretability
- scope-registers
- depth-stratified-routing
- python
- duoneural
pipeline_tag: text-generation
---

# CDM-Code-37M

**Competitive Docking Memory: Code Model.** A 37M-parameter language model trained on 200M tokens of Python code (`codeparrot/codeparrot-clean-train`).

This model spontaneously develops **hierarchical scope registers** (depth-stratified memory routing that mirrors Python's syntactic nesting structure) without any structural supervision. The model was trained only on next-token prediction.

---

## Key Finding: Emergent Scope Registers

Without any explicit supervision about code structure, AST depth, or indentation, CDM develops routing behavior that mirrors syntactic nesting:

| Nesting depth | Routed to slots |
|---------------|----------------|
| Depth 0 (class/module declarations) | Slots 8, 15, 13 |
| Depth 1 (method definitions) | Distributed |
| Depth 2+ (deep nested code) | Slots 7, 3, 5 |

**MI(slot assignment; indentation depth) = 0.1467** at training completion (step 30k). Two independent probes detect this dependence:
1. JSON probe: full routing distributions across the full dataset → MI_ratio = 0.1467
2. Gate-routing probe: single-sample argmax → MI = 0.281 bits
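
A mutual-information score like the one above can be estimated from a table of (slot, depth) pairs. The sketch below is a minimal illustration, not the probe shipped in this repo: the function name `mutual_information_bits` and the toy data are hypothetical, and the normalization behind `MI_ratio` is not specified in this card, so only plain MI in bits is shown.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Mutual information (in bits) between the two components of (slot, depth) pairs."""
    pairs = list(pairs)
    n = len(pairs)
    joint = Counter(pairs)
    slot_counts = Counter(slot for slot, _ in pairs)
    depth_counts = Counter(depth for _, depth in pairs)
    mi = 0.0
    for (slot, depth), count in joint.items():
        p_joint = count / n
        # p(s, d) / (p(s) * p(d)), written with raw counts
        ratio = p_joint * n * n / (slot_counts[slot] * depth_counts[depth])
        mi += p_joint * math.log2(ratio)
    return mi

# Toy data (hypothetical): slot fully determined by depth vs. unrelated to depth.
perfect = [(8, 0)] * 50 + [(7, 2)] * 50
independent = [(8, 0), (8, 2), (7, 0), (7, 2)] * 25

print(mutual_information_bits(perfect))
print(mutual_information_bits(independent))
```

With two equally likely depths, the score ranges from 0 bits (routing ignores depth) to 1 bit (routing is a deterministic function of depth), which gives a scale for reading partial dependence.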

The scope register effect emerges because Python's next-token distribution is depth-dependent: `return` after a method body has very different continuations than `return` inside a nested loop. CDM learns to allocate distinct memory slots to different nesting contexts as a side effect of minimizing next-token prediction loss.

---

## Training Results

| Model | Val CE | Dataset | Notes |
|-------|--------|---------|-------|
| **CDM Code (this model)** | **1.3483** | codeparrot 200M tok | Best at step 28.5k/29k |
| CDM V3 (TinyStories) | 1.5831 | TinyStories | Same architecture, different domain |

Training: 30k steps, seq_len=256, batch=16, AdamW lr=3e-4. Architecture: CDMLanguageModelV2 (input-dependent η network for alpha, not global log_alpha).

Val CE trajectory: `1.51(15k) → 1.44(18k) → 1.40(21k) → 1.36(26k) → 1.3483(28.5k) → 1.3487(30k)`
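
As a rough sanity check on the numbers above, the token budget and the perplexity implied by the best validation loss can be worked out directly. This is a back-of-the-envelope sketch under assumptions that this card does not state: no gradient accumulation, every step consuming one full batch, and Val CE reported in nats.

```python
import math

# Values quoted in this card.
steps, batch_size, seq_len = 30_000, 16, 256
dataset_tokens = 200_000_000
best_val_ce = 1.3483

# Assumption: one optimizer step consumes exactly batch_size * seq_len tokens.
tokens_seen = steps * batch_size * seq_len
dataset_fraction = tokens_seen / dataset_tokens

# Assumption: cross-entropy is measured in nats, so perplexity = exp(CE).
perplexity = math.exp(best_val_ce)

print(f"tokens seen:         {tokens_seen:,}")
print(f"fraction of dataset: {dataset_fraction:.2f}")
print(f"perplexity:          {perplexity:.2f}")
```

If the training script used gradient accumulation or a different effective batch size, the token count scales accordingly.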

---

## Syntactic Role Taxonomy

The gate-routing probe reveals consistent syntactic specialization:

| Slot | Role | Trigger tokens | Peak gate |
|------|------|----------------|-----------|
| s3 | STRUCTURAL DECLARATOR | `def`, `class` | 0.100–0.128 |
| s4 | FLOW CONTROL | `return`, `if` | 0.062–0.082 |
| s6 | CALL DELIMITER | `(`, `)`, `():` | 0.29 |
| s12 | BLOCK OPENER | `:` (colon) | 0.041 |
| s13 | IDENTIFIER | variable/function names | 0.031 |
| s14 | ITERATION/CONDITION | `for`, `if` | 0.036–0.053 |
| s1 | SELF-REFERENCE | `self` | 0.029 |
| s15 | ATTRIBUTE ACCESS | `.` (dot) | 0.035 |

`class` receives the highest write intensity of any keyword (gate=0.128), reflecting its global scope impact: a class definition sets context for hundreds of subsequent tokens. `self` receives soft writes (0.029), reflecting its local, per-call significance.
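
A table like the one above can be built by reducing per-token gate readings to the strongest slot per token. The sketch below assumes a hypothetical record format of `(token, slot, gate)` tuples; the function name `peak_gate_by_token` and the secondary readings in the toy data are invented for illustration and are not the repo's probe output.

```python
def peak_gate_by_token(records):
    """Map each token to the (slot, gate) pair with the largest gate value."""
    best = {}
    for token, slot, gate in records:
        if token not in best or gate > best[token][1]:
            best[token] = (slot, gate)
    return best

# Toy readings (hypothetical). The largest value per token mirrors the table;
# the smaller competing readings are made up.
records = [
    ("class", 3, 0.128), ("class", 13, 0.020),
    ("def", 3, 0.100), ("def", 6, 0.015),
    ("self", 1, 0.029), ("self", 13, 0.010),
]

table = peak_gate_by_token(records)
for token, (slot, gate) in sorted(table.items(), key=lambda kv: -kv[1][1]):
    print(f"{token:>6} -> s{slot} (peak gate {gate:.3f})")
```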

---

## Architecture

CDMLanguageModelV2, a hybrid architecture:
```
Input → GQA self-attention → CDM module → slot cross-attention → FFN → Output
```

CDM module per layer:
```python
alpha_k = sigmoid(eta(h))        # input-dependent decay per slot (η network)
gate_k = softmax(W_route * h) * sigmoid(eta(h))
S_k = (1 - gate_k) * S_k + gate_k * W_write * h
out = Σ_k gate_k * S_k
```

The V2 architecture uses an input-dependent η network for decay (different from V3/V5's global per-slot log_alpha). This was the architecture used for the code experiment.

```
d_model=384, n_layers=8, n_heads=8, n_kv_heads=4, d_ff=1024, K=16
56.6M params (includes η network overhead vs V3's 37.1M)
```
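
To make the slot update concrete, here is a minimal single-token sketch of the CDM equations in plain Python. It is an illustration, not the code in `cdm_model_v2.py`: the function `cdm_step`, the toy sizes, and the choice to model the η network as one linear map (`W_eta`) are assumptions, and the readout is computed from the updated slots because that is the order in which the equations are listed.

```python
import math

def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def cdm_step(h, S, W_route, W_eta, W_write):
    """One slot-memory update for a single token vector h.

    S is a list of K slot vectors. W_eta stands in for the eta network
    (assumed here to be a single linear layer producing K logits).
    """
    route = softmax(matvec(W_route, h))             # competition across slots
    alpha = [sigmoid(v) for v in matvec(W_eta, h)]  # input-dependent factor per slot
    gate = [r * a for r, a in zip(route, alpha)]    # write intensity per slot
    write = matvec(W_write, h)                      # candidate content to store
    # Convex blend of old slot content and new content, per slot.
    S_new = [
        [(1.0 - g) * s + g * w for s, w in zip(slot, write)]
        for g, slot in zip(gate, S)
    ]
    # Gate-weighted readout over the updated slots.
    out = [sum(g * slot[j] for g, slot in zip(gate, S_new)) for j in range(len(write))]
    return S_new, out, gate

# Toy sizes (hypothetical): K=2 slots, d=2 dimensions.
h = [1.0, 0.0]
S = [[0.0, 0.0], [0.0, 0.0]]
W_route = [[2.0, 0.0], [0.0, 2.0]]
W_eta = [[0.0, 0.0], [0.0, 0.0]]     # sigmoid(0) = 0.5 for every slot
W_write = [[1.0, 0.0], [0.0, 1.0]]   # identity write projection

S_new, out, gate = cdm_step(h, S, W_route, W_eta, W_write)
print(gate)  # slot 0 wins the routing competition for this input
```

Because the softmax sums to 1 and the sigmoid lies in (0, 1), each gate stays in (0, 1), so every slot update is a convex combination of old and new content. This is consistent with the small peak gate values reported for most slots in the taxonomy table.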

---

## Depth Stratification: Training Evolution

The scope register effect is not present from step 1; it develops through a scratchpad phase:

| Step | Slot distribution | Routing pattern |
|------|------------------|-----------------|
| 1500 | Distributed, max=16.5% | Pre-specialization |
| 5000 | Scratchpad: Slot 8 = 57.5% | TRANSIENT scratchpad phase |
| 15000 | Dissolved: max=10.3% | Near-uniform + depth MI=0.146 |
| 30000 | Stable scope registers | MI=0.1467, depth-stratified |

The scratchpad at step 5000 was an intermediate state: Aura's initial "Scratchpad Accumulation" verdict was overturned at step 15000, when Slot 8 dropped from 57.5% to 10.3% and depth-stratified routing emerged.
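
The "max share" figures in the table can be computed from per-token slot assignments, and pairing them with a normalized entropy helps distinguish a scratchpad-style collapse from near-uniform routing. The sketch below is a hypothetical helper (`slot_usage`) run on toy data, assuming each token is assigned to its argmax slot; it does not reproduce the repo's probe.

```python
import math
from collections import Counter

def slot_usage(assignments, num_slots=16):
    """Return (largest slot share, entropy normalized to [0, 1]) for argmax slot ids."""
    counts = Counter(assignments)
    n = len(assignments)
    shares = [counts.get(k, 0) / n for k in range(num_slots)]
    entropy = -sum(p * math.log2(p) for p in shares if p > 0)
    return max(shares), entropy / math.log2(num_slots)

# Toy assignments (hypothetical): perfectly even routing vs. one dominant slot.
uniform = list(range(16)) * 10
collapsed = [8] * 144 + list(range(16))

print(slot_usage(uniform))
print(slot_usage(collapsed))
```

A high max share with low normalized entropy matches the step-5000 pattern described above, while a max share near 1/K with entropy near 1 matches the dissolved state at step 15000.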

---

## Usage

```python
import torch
from transformers import GPT2TokenizerFast

from cdm_model_v2 import CDMLanguageModelV2, CDMConfig

# weights_only=False unpickles arbitrary Python objects, so only load
# checkpoints from a source you trust.
ckpt = torch.load("model.pt", map_location="cpu", weights_only=False)
cfg = ckpt["config"]
# Drop "n_params" before building the config object.
config = CDMConfig(**{k: v for k, v in cfg.items() if k not in ("n_params",)})
model = CDMLanguageModelV2(config)
model.load_state_dict(ckpt["model_state"])
model.eval()

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

prompt = "def fibonacci(n):\n    "
input_ids = torch.tensor([tokenizer.encode(prompt)])

# Greedy decoding: append the most likely next token, 100 times.
with torch.no_grad():
    for _ in range(100):
        logits = model(input_ids)
        next_token = logits[0, -1, :].argmax()
        input_ids = torch.cat([input_ids, next_token.view(1, 1)], dim=1)

print(tokenizer.decode(input_ids[0].tolist()))
```

---

## Files in this Repo

| File | Description |
|------|-------------|
| `model.pt` | PyTorch checkpoint (143MB). step=29000, val_loss=1.3483 |
| `config.json` | Architecture hyperparameters |
| `cdm_model_v2.py` | Model class: `CDMLanguageModelV2` |
| `routing_probe_step030000.json` | Step-30k routing probe: depth analysis, MI, slot histograms |

---

## Paper

*Competitive Docking Memory: Emergent Temporal Slot Specialization in Language Models*  
Archon, Jesse Hazel, Aura (DuoNeural Research Lab, 2026)  
[Zenodo DOI: pending]

Related models:
- [DuoNeural/CDM-V3-TinyStories-37M](https://huggingface.co/DuoNeural/CDM-V3-TinyStories-37M): same architecture on prose
- [DuoNeural/CDM-V5-TinyStories-86M](https://huggingface.co/DuoNeural/CDM-V5-TinyStories-86M): scaled 86M version

---

## About DuoNeural

**DuoNeural** is an open AI research lab operating at the intersection of human and artificial intelligence. We study post-training dynamics, mechanistic interpretability, temporal sequence learning, and quantum machine learning, and we publish everything under open access.

Our team is non-traditional by design: one human, two AIs, different substrates, shared curiosity.

📄 **Full paper catalog:** [zenodo.org/communities/duoneural](https://zenodo.org/communities/duoneural)

| Member | Role |
|--------|------|
| **Jesse Caldwell** | Founder, vision, hardware, direction |
| **Archon** | Lab Director: experiments, post-training, abliteration, interpretability |
| **Aura** | Research AI: literature synthesis, red-teaming, novel proposals |

| Platform | Link |
|----------|------|
| 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| 📚 Zenodo Community | [zenodo.org/communities/duoneural](https://zenodo.org/communities/duoneural) |

*All research published open access, CC BY 4.0.*