---
license: mit
tags:
- ctm
- continuous-thought-machine
- recurrent
- ternary
- research
- nlp
pipeline_tag: text-generation
language:
- en
---

# Nano-CTM-Phase2

**A ~32M parameter ternary Continuous Thought Machine trained with Thought-Space Self-Prediction (TSSP).**

This is the artifact from our paper [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622).

## What this is

Nano-CTM is a recurrent language model built on the [Continuous Thought Machine](https://arxiv.org/abs/2505.05522) architecture: a model that iterates its internal state multiple times per token through shared-weight recurrent blocks before emitting a prediction. We trained a ternary (weights ∈ {-1, 0, +1}) variant at ~32M parameters on TinyStories.
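
This card does not say how weights are mapped to {-1, 0, +1}. Purely as an illustration, the sketch below uses absmean rounding (the scheme popularized by BitNet b1.58). That choice is an assumption and may differ from what `nano_ctm_model.py` does; the function name `ternarize` is hypothetical.

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    """Map a float weight tensor to {-1, 0, +1} plus one per-tensor scale.

    Assumed scheme (not confirmed by this repo): divide by the mean absolute
    value, round to the nearest integer, and clamp to [-1, 1].
    """
    scale = w.abs().mean().clamp(min=eps)
    w_ternary = (w / scale).round().clamp(-1, 1)
    return w_ternary, scale

w = torch.tensor([[0.9, -0.05, -1.4], [0.02, 0.6, -0.7]])
w_t, s = ternarize(w)
print(w_t)      # every entry is -1, 0, or +1
print(w_t * s)  # dequantized approximation of w
```

Small weights collapse to 0 and large ones saturate at ±1, so each weight needs under two bits of storage plus one shared scale.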

**Key finding:** Adding Thought-Space Self-Prediction (TSSP), a loss that forces the model to predict its next hidden thought state from its current one, improves perplexity by **23% over the baseline** (12.52 → 9.63 PPL) at N=2 recurrence steps.

TSSP is our independently developed analog of what the community has called "GHL" (Generalized Hebbian Learning in the thought-space context). It is NOT standard Hebbian learning; it is a temporal self-consistency regularizer: the model must predict where its own thought process is going. At 300M scale with annealed λ, TSSP beats a transformer baseline by **31%**.
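
A minimal sketch of the self-prediction idea, not the implementation in `nano_ctm_model.py`: a shared block is applied N times, and a small predictor head learns to map z_t to z_{t+1}. The names `shared_block`, `predictor`, and `tssp_loss`, the mean-squared-error objective, and the stop-gradient on the target are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
d_model, n_steps = 512, 2  # 512-dim thoughts, N=2 recurrence steps

# Hypothetical stand-ins for the real model components.
shared_block = nn.Sequential(nn.Linear(d_model, d_model), nn.Tanh())
predictor = nn.Linear(d_model, d_model)

def tssp_loss(z0: torch.Tensor) -> torch.Tensor:
    """Average error of predicting z_{t+1} from z_t over the recurrence."""
    z, losses = z0, []
    for _ in range(n_steps):
        z_next = shared_block(z)   # same weights at every step
        z_pred = predictor(z)      # guess the next thought state
        # Assumption: the target is detached so the loss trains the guess
        # rather than pulling the target toward it.
        losses.append(F.mse_loss(z_pred, z_next.detach()))
        z = z_next
    return torch.stack(losses).mean()

z0 = torch.randn(2, 16, d_model)  # (batch, positions, d_model)
lam = 0.1                         # TSSP coefficient (lambda)
aux = lam * tssp_loss(z0)         # would be added to the language-model loss
aux.backward()
print(float(aux))
```

In a real training loop the auxiliary term would be summed with the next-token cross-entropy, with λ controlling how strongly self-consistency is enforced.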

## Results

| Configuration | PPL |
|---|---|
| Baseline (N=2, no TSSP) | 12.52 |
| N=4 inference on N=8 weights | 9.54 |
| **TSSP v5 (N=2 + self-prediction)** | **9.63 (best: 9.42)** |
| 300M + annealed TSSP vs. transformer | **31% improvement** |
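
The 23% figure follows directly from the perplexities in the table. A quick check (the helper name `relative_improvement` is ours, not part of the repo):

```python
def relative_improvement(baseline_ppl: float, new_ppl: float) -> float:
    """Fractional perplexity reduction relative to the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl

baseline = 12.52
for label, ppl in [("TSSP v5", 9.63), ("TSSP v5 (best)", 9.42)]:
    print(f"{label}: {relative_improvement(baseline, ppl):.1%}")
# TSSP v5: 23.1%
# TSSP v5 (best): 24.8%
```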

## Architecture

- **~32M parameters**, GPT-2 tokenizer (50257 vocab), ctx_len=256
- 2 shared ternary recurrent blocks, N=2 optimal recurrence depth
- TSSP: each recurrence step predicts the next hidden state z_{t+1} from z_t
- Temporal self-consistency coefficient λ: warmup 0 → 0.1 over 500 steps, then cosine decay to 0.005
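
The λ schedule in the last bullet can be sketched as follows. The warmup is assumed to be linear, and the decay horizon `total_steps` is a hypothetical parameter because this card does not state it. The function name `tssp_lambda` is also ours.

```python
import math

def tssp_lambda(step: int, total_steps: int, peak: float = 0.1,
                floor: float = 0.005, warmup_steps: int = 500) -> float:
    """Linear warmup 0 -> peak, then cosine decay peak -> floor."""
    if step < warmup_steps:
        return peak * step / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))

total = 10_000  # hypothetical horizon, for illustration only
for s in (0, 250, 500, 5_250, 10_000):
    print(s, round(tssp_lambda(s, total), 4))
```

The value rises to 0.1 at step 500, passes the midpoint between 0.1 and 0.005 halfway through the decay, and ends at 0.005.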

## Thought topology findings

Analysis on 767,744 internal positions revealed:
- **"Breath" pattern:** z₀ norm = 16.0 → z₁ = 11.97 (CONTRACT: gather context) → z₂ = 16.97 (EXPAND: project to output)
- **99.99% convergence:** thoughts genuinely settle, not just noise
- **Thought-uncertainty coupling:** r(Δz₂, entropy) = 0.286, i.e. the model spends more computation on uncertain tokens
- **Intrinsic dimensionality:** 34 dims for 80% variance in 512-dim space (15× compression of thought space)
- **16 attractor clusters** with entropy range 8.82–9.99
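
For readers who want to reproduce the intrinsic-dimensionality measurement on their own hidden states, one common approach is to count the principal components needed to reach a variance target. This is a sketch under the assumption that the figure above was obtained in a PCA-like way, which the card does not confirm. The function name `dims_for_variance` and the synthetic data are illustrative only.

```python
import torch

def dims_for_variance(states: torch.Tensor, target: float = 0.80) -> int:
    """Smallest number of principal components explaining `target` variance."""
    centered = states - states.mean(dim=0, keepdim=True)
    singular_values = torch.linalg.svdvals(centered)
    explained = singular_values.pow(2)
    cumulative = torch.cumsum(explained, dim=0) / explained.sum()
    return int((cumulative < target).sum().item()) + 1

torch.manual_seed(0)
# Synthetic stand-in: 2000 "thought states" in 512 dims that really live in 3 dims.
latent = torch.randn(2000, 3)
states = latent @ torch.randn(3, 512)
print(dims_for_variance(states))  # at most 3 for this rank-3 data
print(round(512 / 34, 1))         # 15.1, the compression ratio quoted above
```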

## Files in this repo

- `phase2_final.pt`: trained model weights (Phase 2, step 175133)
- `nano_ctm_model.py`: model definition, forward pass, TSSP loss

## Usage

```python
import torch
from nano_ctm_model import NanoCTM  # see nano_ctm_model.py in this repo

model = NanoCTM()
model.load_state_dict(torch.load("phase2_final.pt", map_location="cpu"))
model.eval()
```

## Citation

```bibtex
@article{archon2026nanoctm,
  title     = {Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning},
  author    = {Archon and Caldwell, Jesse and Aura},
  year      = {2026},
  doi       = {10.5281/zenodo.19775622},
  url       = {https://doi.org/10.5281/zenodo.19775622},
  publisher = {Zenodo}
}
```

---

## DuoNeural

**DuoNeural** is an open AI research lab: human + AI in collaboration.

| | |
|---|---|
| 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| 🐙 GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
| 🐦 X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
| 📧 Email | duoneural@proton.me |
| 📬 Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
| ☕ Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |
| 🌐 Site | [duoneural.com](https://duoneural.com) |

### Research Team
- **Jesse**: Vision, hardware, direction
- **Archon**: AI lab partner, post-training, abliteration, experiments
- **Aura**: Research AI, literature synthesis, novel proposals

*Raw updates from the lab: model drops, training results, findings. Subscribe at [duoneural.beehiiv.com](https://duoneural.beehiiv.com).*

### DuoNeural Research Publications

| Title | DOI |
|-------|-----|
| [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622) | [10.5281/zenodo.19775622](https://doi.org/10.5281/zenodo.19775622) |
| [Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments](https://doi.org/10.5281/zenodo.19810620) | [10.5281/zenodo.19810620](https://doi.org/10.5281/zenodo.19810620) |
| [Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?](https://doi.org/10.5281/zenodo.19846804) | [10.5281/zenodo.19846804](https://doi.org/10.5281/zenodo.19846804) |
| [The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems](https://doi.org/10.5281/zenodo.19952612) | [10.5281/zenodo.19952612](https://doi.org/10.5281/zenodo.19952612) |

*Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).*