Commit `7f14743` (verified, parent `6fcf7e1`) by vincentoh: Upload README.md with huggingface_hub

# CTM Experiments - Continuous Thought Machine Models

Experimental checkpoints trained on the [Continuous Thought Machine](https://github.com/SakanaAI/continuous-thought-machines) architecture by Sakana AI.

**These are community experiments built on the original work, not official Sakana AI models.**

## Paper Reference

> **Continuous Thought Machines**
>
> Sakana AI
>
> [arXiv:2505.05522](https://arxiv.org/abs/2505.05522)
>
> [Interactive Demo](https://pub.sakana.ai/ctm/) | [Blog Post](https://sakana.ai/ctm/)

```bibtex
@article{sakana2025ctm,
  title={Continuous Thought Machines},
  author={Sakana AI},
  journal={arXiv preprint arXiv:2505.05522},
  year={2025}
}
```

## Core Insight

CTM's key innovation: **accuracy improves with more internal iterations**. The model "thinks longer" to reach a better answer, which lets it learn algorithmic reasoning tasks that feedforward networks struggle with.

## Models

| Model | File | Size | Task | Accuracy | Description |
|-------|------|------|------|----------|-------------|
| MNIST | `ctm-mnist.pt` | 1.3M | Digit classification | 97.9% | 10-class MNIST |
| Parity-16 | `ctm-parity-16.pt` | 2.5M | Cumulative parity | 99.0% | 16-bit sequences |
| Parity-64 | `ctm-parity-64.pt` | 66M | Cumulative parity | 75% | 64-bit sequences |
| QAMNIST | `ctm-qamnist.pt` | 39M | Multi-step arithmetic | 100% | 3-5 digits, 3-5 ops |
| Brackets | `ctm-brackets.pt` | 6.1M | Bracket matching | 94.7% | Valid/invalid `(()[])` |
| Tracking-Quadrant | `ctm-tracking-quadrant.pt` | 6.7M | Motion quadrant | 100% | 4-class prediction |
| Tracking-Position | `ctm-tracking-position.pt` | 6.7M | Exact position | 93.8% | 256-class (16x16 grid) |
| Transfer | `ctm-transfer-parity-brackets.pt` | 2.5M | Transfer learning | 94.5% | Parity core to brackets |

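For context, the cumulative parity task labels each position of a bit sequence with the parity of the prefix up to that point — one output per position, which is why the Parity-16 config below uses `out_dims: 16`. A plain-Python reference for the target computation (independent of the checkpoints; the even→0 / odd→1 encoding is the usual convention and assumed here):

```python
def cumulative_parity(bits):
    """For each prefix of `bits`, return 0 if the count of 1s so far is even, else 1."""
    targets = []
    parity = 0
    for b in bits:
        parity ^= b  # XOR accumulates the running parity
        targets.append(parity)
    return targets

# A 16-bit sequence and its per-position parity targets
bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1]
print(cumulative_parity(bits))
# → [1, 1, 0, 1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0, 1]
```
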
## Model Configurations

### MNIST CTM
```python
config = {
    "iterations": 15,
    "memory_length": 10,
    "d_model": 128,
    "d_input": 128,
    "heads": 2,
    "n_synch_out": 16,
    "n_synch_action": 16,
    "memory_hidden_dims": 8,
    "out_dims": 10,
    "synapse_depth": 1,
}
```

### Parity-16 CTM
```python
config = {
    "iterations": 50,
    "memory_length": 25,
    "d_model": 256,
    "d_input": 32,
    "heads": 8,
    "synapse_depth": 8,
    "out_dims": 16,  # cumulative parity
}
```

### QAMNIST CTM
```python
config = {
    "iterations": 10,
    "memory_length": 30,
    "d_model": 1024,
    "d_input": 64,
    "synapse_depth": 1,
    "heads": 4,
    "n_synch_out": 32,
    "n_synch_action": 32,
}
```

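For intuition, a QAMNIST question chains MNIST digits with arithmetic operators, evaluated left to right. A rough illustrative evaluator (plain Python, separate from the model; the `+`/`-` operator set and the modular reduction that keeps answers in a fixed class range are assumptions about the task, not read from the checkpoints):

```python
def eval_qamnist(digits, ops, modulus=10):
    """Evaluate e.g. 3 + 7 - 2 + 5 left to right, reduced mod `modulus` (assumed)."""
    acc = digits[0]
    for op, d in zip(ops, digits[1:]):
        acc = acc + d if op == "+" else acc - d
    return acc % modulus

print(eval_qamnist([3, 7, 2, 5], ["+", "-", "+"]))  # 3 + 7 - 2 + 5 = 13 → 3 (mod 10)
```
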
### Brackets CTM
```python
config = {
    "iterations": 30,
    "memory_length": 15,
    "d_model": 256,
    "d_input": 64,
    "heads": 4,
    "n_synch_out": 32,
    "n_synch_action": 32,
    "out_dims": 2,  # valid/invalid
}
```

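The valid/invalid labels correspond to standard balanced-bracket matching over `(`, `)`, `[`, `]`, which a stack decides in linear time. A plain-Python reference for the ground-truth labels (independent of the model):

```python
def is_valid(seq):
    """Return True iff brackets in `seq` are balanced and properly nested."""
    pairs = {")": "(", "]": "["}
    stack = []
    for ch in seq:
        if ch in "([":
            stack.append(ch)
        elif ch in pairs:
            # A closer must match the most recent unmatched opener
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # leftover openers mean invalid

print(is_valid("(()[])"))  # True
print(is_valid("([)]"))    # False
```
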
### Tracking CTM
```python
config = {
    "iterations": 20,
    "memory_length": 15,
    "d_model": 256,
    "d_input": 64,
    "heads": 4,
    "n_synch_out": 32,
    "n_synch_action": 32,
}
```

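The two tracking heads differ only in label granularity: the quadrant model predicts which quarter of the arena the target occupies (4 classes), while the position model predicts the exact cell of a 16x16 grid (256 classes). A sketch of one plausible label encoding (the row-major cell indexing and quadrant ordering here are my assumptions, not read from the checkpoints):

```python
GRID = 16  # 16x16 arena → 256 position classes

def position_class(x, y, grid=GRID):
    """Flatten (x, y) grid coordinates into one of grid*grid classes (row-major; assumed)."""
    return y * grid + x

def quadrant_class(x, y, grid=GRID):
    """Map (x, y) to 4 classes: 0=top-left, 1=top-right, 2=bottom-left, 3=bottom-right (assumed order)."""
    half = grid // 2
    return int(y >= half) * 2 + int(x >= half)

print(position_class(3, 5))   # 83
print(quadrant_class(3, 5))   # 0
print(quadrant_class(12, 9))  # 3
```
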
## Usage

```python
import torch
from huggingface_hub import hf_hub_download

# Download a checkpoint from the Hub
model_path = hf_hub_download(
    repo_id="vincentoh/ctm-experiments",
    filename="ctm-mnist.pt",
)

# Load the checkpoint on CPU
checkpoint = torch.load(model_path, map_location="cpu")

# Initialize CTM with the matching config from "Model Configurations" above
# (requires the SakanaAI/continuous-thought-machines repo on your PYTHONPATH)
from models.ctm import ContinuousThoughtMachine

model = ContinuousThoughtMachine(**config)
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Inference
with torch.no_grad():
    output = model(input_tensor)
```

## Training Details

- **Hardware**: NVIDIA RTX 4070 Ti SUPER
- **Framework**: PyTorch
- **Optimizer**: AdamW
- **Training time**: 5 minutes (MNIST) to 17 hours (QAMNIST)

## Key Findings

1. **Architecture > Scale**: Small synchronization dimensions (32) with linear synapses outperform larger/deeper variants
2. **"Thinking Longer" = Higher Accuracy**: CTM accuracy improves with more internal iterations
3. **Transfer Learning Works**: A parity-trained core transfers to brackets at 94.5% accuracy

## License

MIT License (same as the original CTM repository)

## Acknowledgments

- [Sakana AI](https://sakana.ai/) for the Continuous Thought Machine architecture
- Original [CTM Repository](https://github.com/SakanaAI/continuous-thought-machines)

## Links

- [Experiment Repository](https://github.com/bigsnarfdude/ctm-experiments)
- [Original Paper](https://arxiv.org/abs/2505.05522)
- [Interactive Demo](https://pub.sakana.ai/ctm/)