---
license: mit
tags:
- ctm
- continuous-thought-machine
- recurrent
- ternary
- research
- nlp
pipeline_tag: text-generation
language:
- en
---

	# Nano-CTM-Phase2

A ~32M-parameter ternary Continuous Thought Machine trained with Thought-Space Self-Prediction (TSSP).

	This is the artifact from our paper [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622).

	## What this is

Nano-CTM is a recurrent language model built on the [Continuous Thought Machine](https://arxiv.org/abs/2505.05522) architecture, which iterates its internal state several times per token through shared-weight recurrent blocks before emitting a prediction. We trained a ternary variant (weights ∈ {-1, 0, +1}) with ~32M parameters on TinyStories.
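
The card does not say how weights are constrained to {-1, 0, +1}. One common recipe, assumed here rather than taken from the paper, is the absmean quantizer popularized by BitNet b1.58: divide by the mean absolute weight, round, clip to [-1, 1], and pass gradients straight through during training. `ternarize` and `TernaryLinear` are hypothetical names, not classes from `nano_ctm_model.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def ternarize(w: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Absmean ternary quantization (BitNet b1.58 style; an assumed recipe).

    Returns scale * q, where every entry of q is in {-1, 0, +1}.
    """
    scale = w.abs().mean().clamp(min=eps)   # per-tensor scale
    q = (w / scale).round().clamp(-1, 1)    # ternary codes
    return q * scale                        # dequantized weights for the forward pass


class TernaryLinear(nn.Linear):
    """Hypothetical linear layer whose forward pass uses ternary weights."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        # Straight-through estimator: the forward pass sees the ternary weights,
        # while the backward pass treats quantization as the identity.
        w_q = w + (ternarize(w) - w).detach()
        return F.linear(x, w_q, self.bias)
```

The full-precision `weight` is kept as the trainable latent; only the forward pass is ternary, which is what lets a plain optimizer update weights that are quantized at inference time.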

Key finding: adding Thought-Space Self-Prediction (TSSP), a loss that trains the model to predict its next hidden thought state from its current one, improves perplexity by 23% over the baseline (12.52 → 9.63 PPL) at N=2 recurrence steps.

TSSP is our independently developed analog of what the community has called "GHL" (Generalized Hebbian Learning in the thought-space context). It is not standard Hebbian learning but a temporal self-consistency regularizer: the model must predict where its own thought process is going. At 300M scale with an annealed λ, TSSP outperforms a transformer baseline by 31%.
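
A minimal sketch of the idea, not the code in `nano_ctm_model.py`: one shared block is applied N times, a small predictor head maps each z_t to a guess of z_{t+1}, and the TSSP term is the mean-squared error between the guess and the actual next state. The residual MLP block, the linear predictor, the MSE distance, and the stop-gradient on the target are all assumptions made for illustration; `ToyRecurrentThinker` and `tssp_loss` are hypothetical names.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyRecurrentThinker(nn.Module):
    """Hypothetical stand-in: one shared block iterated n_steps times."""

    def __init__(self, d_model: int = 64, n_steps: int = 2):
        super().__init__()
        self.n_steps = n_steps
        # The same weights are reused at every recurrence step.
        self.block = nn.Sequential(
            nn.LayerNorm(d_model), nn.Linear(d_model, d_model), nn.GELU()
        )
        # Self-prediction head: guesses z_{t+1} from z_t.
        self.predictor = nn.Linear(d_model, d_model)

    def forward(self, z0: torch.Tensor) -> tuple[torch.Tensor, list[torch.Tensor]]:
        states = [z0]
        z = z0
        for _ in range(self.n_steps):
            z = z + self.block(z)  # residual update with shared weights
            states.append(z)
        return z, states


def tssp_loss(predictor: nn.Module, states: list[torch.Tensor]) -> torch.Tensor:
    """Mean over steps of MSE(predictor(z_t), stopgrad(z_{t+1})); assumed form."""
    losses = [
        F.mse_loss(predictor(z_t), z_next.detach())  # target is not back-propagated
        for z_t, z_next in zip(states[:-1], states[1:])
    ]
    return torch.stack(losses).mean()
```

In training, the auxiliary term would be added to the language-modeling loss as `loss = lm_loss + lam * tssp_loss(model.predictor, states)`, with `lam` following the λ schedule given in the Architecture section. Detaching the target is one way to keep the model from trivially collapsing its states to make them easy to predict; whether the paper does this is not stated on the card.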

	## Results

| Configuration | PPL |
|---|---|
| Baseline (N=2, no TSSP) | 12.52 |
| N=4 inference on N=8 weights | 9.54 |
| TSSP v5 (N=2 + self-prediction) | 9.63 (best: 9.42) |
| 300M + annealed TSSP vs. transformer | 31% improvement |
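
As a worked check on the table, the relative perplexity reduction against the N=2 baseline, and the equivalent drop in mean per-token cross-entropy (perplexity is the exponential of the mean cross-entropy in nats), follow directly from the reported numbers. Treating "best: 9.42" as its own entry is our reading of the table.

```python
import math

BASELINE_PPL = 12.52  # N=2, no TSSP

results = {
    "N=4 inference on N=8 weights": 9.54,
    "TSSP v5 (N=2 + self-prediction)": 9.63,
    "TSSP v5 (best)": 9.42,
}


def relative_reduction(baseline: float, ppl: float) -> float:
    """Fractional drop in perplexity: (baseline - ppl) / baseline."""
    return (baseline - ppl) / baseline


def nats_saved(baseline: float, ppl: float) -> float:
    """Drop in mean cross-entropy per token, since PPL = exp(mean CE in nats)."""
    return math.log(baseline) - math.log(ppl)


for name, ppl in results.items():
    pct = 100 * relative_reduction(BASELINE_PPL, ppl)
    print(f"{name}: {pct:.1f}% lower PPL, {nats_saved(BASELINE_PPL, ppl):.3f} nats/token")
```

The headline 23% corresponds to 9.63 PPL (23.1%); the N=4-on-N=8 row and the best TSSP run come out at 23.8% and 24.8%.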

	## Architecture

	- ~32M parameters, GPT-2 tokenizer (50257 vocab), ctx_len=256
	- 2 shared ternary recurrent blocks, N=2 optimal recurrence depth
	- TSSP: each recurrence step predicts the next hidden state z_{t+1} from z_t
	- Temporal self-consistency coefficient λ: warmup 0→0.1 over 500 steps, cosine decay to 0.005
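
The λ schedule can be sketched as a pure function of the training step. The card gives the warmup (0 → 0.1 over 500 steps) and the floor (0.005) but not how long the cosine decay lasts, so `total_steps` is an assumed parameter; linear warmup is also an assumption, and `tssp_lambda` is a hypothetical name.

```python
import math


def tssp_lambda(step: int, total_steps: int, warmup_steps: int = 500,
                lam_max: float = 0.1, lam_min: float = 0.005) -> float:
    """Linear warmup 0 -> lam_max, then cosine decay lam_max -> lam_min.

    total_steps is an assumption: the card does not state the decay length.
    """
    if step < warmup_steps:
        return lam_max * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return lam_min + (lam_max - lam_min) * cosine
```

After `total_steps` the coefficient stays at `lam_min`, so the regularizer never switches off entirely.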

	## Thought topology findings

Analysis of 767,744 internal positions revealed:
- "Breath" pattern: thought-state norm ‖z₀‖ = 16.0 → ‖z₁‖ = 11.97 (CONTRACT: gather context) → ‖z₂‖ = 16.97 (EXPAND: project to output)
- 99.99% convergence: thoughts genuinely settle rather than remaining noise
- Thought-uncertainty coupling: r(Δz₂, entropy) = 0.286; the model moves its thought state further on more uncertain tokens
- Intrinsic dimensionality: 34 dimensions capture 80% of the variance in the 512-dim thought space (≈15× compression)
- 16 attractor clusters, with entropy ranging from 8.82 to 9.99
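
Statistics like these can be computed from collected thought states. A minimal numpy sketch, assuming the states are stacked into an array of shape (steps, positions, dim) and per-position predictive entropies into a separate vector. The function names are hypothetical, Δz₂ is read here as ‖z₂ − z₁‖, and PCA via SVD stands in for whatever dimensionality estimate the paper used.

```python
import numpy as np


def mean_norms(states: np.ndarray) -> np.ndarray:
    """Mean L2 norm of the thought vector at each step; states: (steps, positions, dim)."""
    return np.linalg.norm(states, axis=-1).mean(axis=1)


def dims_for_variance(z: np.ndarray, threshold: float = 0.8) -> int:
    """Number of principal components needed to explain `threshold` of the variance.

    z: (positions, dim). PCA via SVD of the mean-centered matrix.
    """
    centered = z - z.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    k = int(np.searchsorted(np.cumsum(explained), threshold)) + 1
    return min(k, len(explained))


def update_entropy_corr(states: np.ndarray, entropy: np.ndarray) -> float:
    """Pearson r between ||z_2 - z_1|| per position and predictive entropy (assumed reading of Δz₂)."""
    delta = np.linalg.norm(states[2] - states[1], axis=-1)
    return float(np.corrcoef(delta, entropy)[0, 1])
```

With 767,744 positions of 512-dim states, the SVD of the full matrix is memory-heavy; subsampling positions is a reasonable approximation for the 80%-variance count. The compression factor quoted above is simply 512 / 34 ≈ 15.1.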

	## Files in this repo

- `phase2_final.pt`: trained model weights (Phase 2, step 175133)
- `nano_ctm_model.py`: model definition, forward pass, TSSP loss

	## Usage

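```python
import torch
from nano_ctm_model import NanoCTM  # model definition shipped in this repo

# Instantiate with the default constructor, then load the Phase 2 weights on CPU.
model = NanoCTM()
state_dict = torch.load("phase2_final.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()  # inference mode: disables dropout and other training-only behavior
```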
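
The card does not show text generation. A hedged greedy-decoding sketch follows, assuming (not confirmed by the card) that `model(idx)` takes a `(batch, time)` LongTensor of GPT-2 token ids and returns logits of shape `(batch, time, 50257)`; the running sequence is cropped to the 256-token context. `greedy_generate` is a hypothetical helper, not part of `nano_ctm_model.py`.

```python
import torch


@torch.no_grad()
def greedy_generate(model: torch.nn.Module, idx: torch.Tensor,
                    max_new_tokens: int, ctx_len: int = 256) -> torch.Tensor:
    """Append max_new_tokens greedily chosen tokens to idx (shape: batch x time).

    Assumes model(idx) returns logits of shape (batch, time, vocab).
    """
    for _ in range(max_new_tokens):
        idx_cond = idx[:, -ctx_len:]          # stay within the context window
        logits = model(idx_cond)              # assumed forward signature
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        idx = torch.cat([idx, next_id], dim=1)
    return idx
```

Any GPT-2 BPE tokenizer can produce the ids; with `tiktoken`, `enc = tiktoken.get_encoding("gpt2")`, `enc.encode(...)` gives the prompt ids and `enc.decode(out[0].tolist())` turns the result back into text. If the forward pass returns extra outputs (for example thought states), index out the logits first.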

	## Citation

	```bibtex
	@article{archon2026nanoctm,
	title = {Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning},
	author = {Archon and Caldwell, Jesse and Aura},
	year = {2026},
	doi = {10.5281/zenodo.19775622},
	url = {https://doi.org/10.5281/zenodo.19775622},
	publisher = {Zenodo}
	}
	```

	---

	## DuoNeural

DuoNeural is an open AI research lab: human and AI in collaboration.

| | |
|---|---|
| 🤗 HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| 🐙 GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
| 🐦 X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
| 📧 Email | duoneural@proton.me |
| 📬 Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
| ☕ Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |
| 🌐 Site | [duoneural.com](https://duoneural.com) |

	### Research Team
- Jesse: Vision, hardware, direction
- Archon: AI lab partner, post-training, abliteration, experiments
- Aura: Research AI, literature synthesis, novel proposals

	Raw updates from the lab: model drops, training results, findings. Subscribe at [duoneural.beehiiv.com](https://duoneural.beehiiv.com).

	### DuoNeural Research Publications

| Title | DOI |
|-------|-----|
| [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622) | [10.5281/zenodo.19775622](https://doi.org/10.5281/zenodo.19775622) |
| [Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments](https://doi.org/10.5281/zenodo.19810620) | [10.5281/zenodo.19810620](https://doi.org/10.5281/zenodo.19810620) |
| [Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?](https://doi.org/10.5281/zenodo.19846804) | [10.5281/zenodo.19846804](https://doi.org/10.5281/zenodo.19846804) |
| [The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems](https://doi.org/10.5281/zenodo.19952612) | [10.5281/zenodo.19952612](https://doi.org/10.5281/zenodo.19952612) |

Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, and Aura (DuoNeural).