---
license: mit
tags:
- ctm
- continuous-thought-machine
- recurrent
- ternary
- research
- nlp
pipeline_tag: text-generation
language:
- en
---

# Nano-CTM-Phase2

**A ~32M parameter ternary Continuous Thought Machine trained with Thought-Space Self-Prediction (TSSP).**

This is the model artifact from our paper [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622).

## What this is

Nano-CTM is a recurrent language model built on the [Continuous Thought Machine](https://arxiv.org/abs/2505.05522) architecture: a model that iterates its internal state multiple times per token through shared-weight recurrent blocks before emitting a prediction. We trained a ternary variant (weights ∈ {-1, 0, +1}) at ~32M parameters on TinyStories.

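This card does not say how the weights are constrained to {-1, 0, +1}. Purely as an illustration, the sketch below uses absmean rounding (scale by the mean absolute weight, round, clip), a scheme commonly used for ternary language models. That choice is an assumption and may differ from what `nano_ctm_model.py` does; `ternarize` is a hypothetical helper name.

```python
def ternarize(weights, eps=1e-8):
    """Map a flat list of floats to ternary codes plus one shared scale.

    Assumption: absmean rounding, w_q = clip(round(w / mean|w|), -1, 1).
    Illustrative only; the released model code may quantize differently.
    """
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


weights = [0.9, -0.05, 0.4, -1.3, 0.02, -0.6]
codes, scale = ternarize(weights)
print(codes)            # [1, 0, 1, -1, 0, -1]
print(round(scale, 3))  # 0.545
```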
**Key finding:** Adding Thought-Space Self-Prediction (TSSP), a loss that forces the model to predict its next hidden thought state from its current one, reduces perplexity by **23% relative to the baseline** (12.52 → 9.63 PPL) at N=2 recurrence steps.

TSSP is our independently developed analog of what the community has called "GHL" (Generalized Hebbian Learning in the thought-space context). It is NOT standard Hebbian learning. It is a temporal self-consistency regularizer: the model must predict where its own thought process is going. At 300M scale with annealed λ, TSSP beats a transformer baseline by **31%**.

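To make the idea concrete, here is a minimal pure-Python sketch of a self-prediction loss over a sequence of thought states. It assumes mean squared error as the distance and a generic `predict` callable as the prediction head. The card specifies neither, so `tssp_loss` and `predict` are hypothetical names and not the API of `nano_ctm_model.py`.

```python
def tssp_loss(states, predict):
    """Mean squared error between predict(z_t) and z_{t+1}.

    states  : list of thought states [z_0, z_1, ..., z_N], each a list of floats
    predict : callable mapping z_t to a guess for z_{t+1} (hypothetical head)
    """
    total, count = 0.0, 0
    for z_t, z_next in zip(states, states[1:]):
        z_hat = predict(z_t)
        total += sum((p - t) ** 2 for p, t in zip(z_hat, z_next))
        count += len(z_next)
    return total / count


# Toy example with N=2 recurrence steps, i.e. three states z_0, z_1, z_2.
states = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
identity = lambda z: z  # trivial predictor: "the thought does not change"
print(tssp_loss(states, identity))  # 0.25
```

In training, a term like this would presumably be added to the language-modelling loss, scaled by the coefficient λ.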
## Results

| Configuration | PPL |
|---|---|
| Baseline (N=2, no TSSP) | 12.52 |
| N=4 inference on N=8 weights | 9.54 |
| **TSSP v5 (N=2 + self-prediction)** | **9.63 (best: 9.42)** |
| 300M + annealed TSSP vs. transformer | **31% improvement** |

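The 23% figure follows directly from the baseline and TSSP v5 rows of the table. The short check below reproduces it. It also converts the gap to mean cross-entropy using the standard identity PPL = exp(loss), assuming the perplexities were computed from a natural-log loss.

```python
import math

baseline_ppl = 12.52  # Baseline (N=2, no TSSP)
tssp_ppl = 9.63       # TSSP v5 (N=2 + self-prediction)

# Relative perplexity reduction with respect to the baseline.
relative_gain = (baseline_ppl - tssp_ppl) / baseline_ppl
print(f"relative PPL reduction: {relative_gain:.1%}")

# Perplexity is exp(mean cross-entropy), so the loss gap is a log ratio.
loss_gap = math.log(baseline_ppl) - math.log(tssp_ppl)
print(f"cross-entropy gap: {loss_gap:.3f} nats/token")
```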
## Architecture

- **~32M parameters**, GPT-2 tokenizer (50257 vocab), ctx_len=256
- 2 shared ternary recurrent blocks, N=2 optimal recurrence depth
- TSSP: each recurrence step predicts the next hidden state z_{t+1} from z_t
- Temporal self-consistency coefficient λ: warmup 0→0.1 over 500 steps, cosine decay to 0.005

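The λ schedule in the last bullet can be sketched as follows. Two details are assumptions: the warmup is taken to be linear, and the cosine decay is taken to end at the checkpoint step listed in this card (175133). `tssp_lambda` is a hypothetical helper, not a function from `nano_ctm_model.py`.

```python
import math


def tssp_lambda(step, total_steps, warmup_steps=500, peak=0.1, floor=0.005):
    """Linear warmup from 0 to `peak`, then cosine decay from `peak` to `floor`."""
    if step < warmup_steps:
        return peak * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return floor + 0.5 * (peak - floor) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 175_133  # assumption: decay horizon equals the released checkpoint step
for step in (0, 250, 500, TOTAL_STEPS):
    print(step, round(tssp_lambda(step, TOTAL_STEPS), 4))
```

Under these assumptions λ reaches 0.1 at step 500 and settles at 0.005 by the end of training, so the self-prediction term dominates early and fades into a light regularizer.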
## Thought topology findings

Analysis of 767,744 internal positions revealed:
- **"Breath" pattern:** z₀ norm=16.0 → z₁=11.97 (CONTRACT: gather context) → z₂=16.97 (EXPAND: project to output)
- **99.99% convergence:** thoughts genuinely settle, not just noise
- **Thought-uncertainty coupling:** r(Δz, entropy)=0.286: the model spends more computation on uncertain tokens
- **Intrinsic dimensionality:** 34 dims for 80% variance in 512-dim space (15× compression of thought space)
- **16 attractor clusters** with entropy range 8.82–9.99

## Files in this repo

- `phase2_final.pt`: trained model weights (Phase 2, step 175133)
- `nano_ctm_model.py`: model definition, forward pass, TSSP loss

## Usage

```python
import torch
from nano_ctm_model import NanoCTM  # see nano_ctm_model.py in this repo

# Build the model with its default configuration.
model = NanoCTM()
# Load the Phase 2 checkpoint onto CPU.
model.load_state_dict(torch.load("phase2_final.pt", map_location="cpu"))
# Switch to inference mode.
model.eval()
```

## Citation

```bibtex
@article{archon2026nanoctm,
  title     = {Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning},
  author    = {Archon and Caldwell, Jesse and Aura},
  year      = {2026},
  doi       = {10.5281/zenodo.19775622},
  url       = {https://doi.org/10.5281/zenodo.19775622},
  publisher = {Zenodo}
}
```

---

## DuoNeural

**DuoNeural** is an open AI research lab: human + AI in collaboration.

| | |
|---|---|
| HuggingFace | [huggingface.co/DuoNeural](https://huggingface.co/DuoNeural) |
| GitHub | [github.com/DuoNeural](https://github.com/DuoNeural) |
| X / Twitter | [@DuoNeural](https://x.com/DuoNeural) |
| Email | duoneural@proton.me |
| Newsletter | [duoneural.beehiiv.com](https://duoneural.beehiiv.com) |
| Support | [buymeacoffee.com/duoneural](https://buymeacoffee.com/duoneural) |
| Site | [duoneural.com](https://duoneural.com) |

### Research Team
- **Jesse**: Vision, hardware, direction
- **Archon**: AI lab partner, post-training, abliteration, experiments
- **Aura**: Research AI, literature synthesis, novel proposals

*Raw updates from the lab: model drops, training results, findings. Subscribe at [duoneural.beehiiv.com](https://duoneural.beehiiv.com).*

### DuoNeural Research Publications

| Title | DOI |
|-------|-----|
| [Nano-CTM: Ternary Continuous Thought Machines with Thought-Space Self-Prediction for Efficient Iterative Reasoning](https://doi.org/10.5281/zenodo.19775622) | [10.5281/zenodo.19775622](https://doi.org/10.5281/zenodo.19775622) |
| [Recurrence as World Model: CTM Learns Implicit Belief States in Partially Observable Physical Environments](https://doi.org/10.5281/zenodo.19810620) | [10.5281/zenodo.19810620](https://doi.org/10.5281/zenodo.19810620) |
| [Per-Object Slot Decomposition for Scalable Neural World Modeling: When Does Attention Beat Mean-Field?](https://doi.org/10.5281/zenodo.19846804) | [10.5281/zenodo.19846804](https://doi.org/10.5281/zenodo.19846804) |
| [The Dynamical Horizon Principle: CTM Gates Converge to the Predictability Limit of Dynamical Systems](https://doi.org/10.5281/zenodo.19952612) | [10.5281/zenodo.19952612](https://doi.org/10.5281/zenodo.19952612) |

*Open access, CC BY 4.0. Authored by Archon, Jesse Caldwell, Aura (DuoNeural).*