vincentoh/ctm-experiments


vincentoh committed on Dec 1, 2025

Commit

8244b6b

verified ·

1 Parent(s): 7ada1c6

Upload README.md with huggingface_hub

Files changed (1)

README.md +167 -172

README.md CHANGED

@@ -1,202 +1,197 @@
-# CTM Experiments - Continuous Thought Machine Models
-Experimental checkpoints trained on the [Continuous Thought Machine](https://github.com/SakanaAI/continuous-thought-machines) architecture by Sakana AI.
-**These are community experiments on the original work - not official SakanaAI models.**
-## Paper Reference
-> **Continuous Thought Machines**
->
-> Sakana AI
->
-> [arXiv:2505.05522](https://arxiv.org/abs/2505.05522)
->
-> [Interactive Demo](https://pub.sakana.ai/ctm/) | [Blog Post](https://sakana.ai/ctm/)
-```bibtex
-@article{sakana2025ctm,
-  title={Continuous Thought Machines},
-  author={Sakana AI},
-  journal={arXiv preprint arXiv:2505.05522},
-  year={2025}
-}
-```
-## Core Insight
-CTM's key innovation: **accuracy improves with more internal iterations**. The model "thinks longer" to reach better answers. This enables CTM to learn algorithmic reasoning that feedforward networks struggle with.
-## Models
-| Model | File | Size | Task | Accuracy | Description |
-|-------|------|------|------|----------|-------------|
-| MNIST | `ctm-mnist.pt` | 1.3M | Digit classification | 97.9% | 10-class MNIST |
-| Parity-16 | `ctm-parity-16.pt` | 2.5M | Cumulative parity | 99.0% | 16-bit sequences |
-| Parity-64 | `ctm-parity-64.pt` | 66M | Cumulative parity | 75% | 64-bit sequences |
-| QAMNIST | `ctm-qamnist.pt` | 39M | Multi-step arithmetic | 100% | 3-5 digits, 3-5 ops |
-| Brackets | `ctm-brackets.pt` | 6.1M | Bracket matching | 94.7% | Valid/invalid `(()[])` |
-| Tracking-Quadrant | `ctm-tracking-quadrant.pt` | 6.7M | Motion quadrant | 100% | 4-class prediction |
-| Tracking-Position | `ctm-tracking-position.pt` | 6.7M | Exact position | 93.8% | 256-class (16x16 grid) |
-| Transfer | `ctm-transfer-parity-brackets.pt` | 2.5M | Transfer learning | 94.5% | Parity core to brackets |
-| Jigsaw MNIST | `ctm-jigsaw-mnist.pt` | 19M | Jigsaw puzzle solving | 92.3% | Reassemble 2x2 shuffled MNIST |
-| Rotation MNIST | `ctm-rotation-mnist.pt` | 4.2M | Rotation prediction | 89.1% | Predict rotation angle (4 classes) |
-## Model Configurations
-### MNIST CTM
-```python
-config = {
-    "iterations": 15,
-    "memory_length": 10,
-    "d_model": 128,
-    "d_input": 128,
-    "heads": 2,
-    "n_synch_out": 16,
-    "n_synch_action": 16,
-    "memory_hidden_dims": 8,
-    "out_dims": 10,
-    "synapse_depth": 1,
-}
-```
-### Parity-16 CTM
-```python
-config = {
-    "iterations": 50,
-    "memory_length": 25,
-    "d_model": 256,
-    "d_input": 32,
-    "heads": 8,
-    "synapse_depth": 8,
-    "out_dims": 16,  # cumulative parity
-}
-```
-### QAMNIST CTM
-```python
-config = {
-    "iterations": 10,
-    "memory_length": 30,
-    "d_model": 1024,
-    "d_input": 64,
-    "synapse_depth": 1,
-    "heads": 4,
-    "n_synch_out": 32,
-    "n_synch_action": 32,
-}
-```
-### Brackets CTM
-```python
-config = {
-    "iterations": 30,
-    "memory_length": 15,
-    "d_model": 256,
-    "d_input": 64,
-    "heads": 4,
-    "n_synch_out": 32,
-    "n_synch_action": 32,
-    "out_dims": 2,  # valid/invalid
-}
-```
-### Tracking CTM
-```python
-config = {
-    "iterations": 20,
-    "memory_length": 15,
-    "d_model": 256,
-    "d_input": 64,
-    "heads": 4,
-    "n_synch_out": 32,
-    "n_synch_action": 32,
-}
-```
-### Jigsaw MNIST CTM
-```python
-config = {
-    "iterations": 30,
-    "memory_length": 20,
-    "d_model": 512,
-    "d_input": 128,
-    "heads": 8,
-    "n_synch_out": 32,
-    "n_synch_action": 32,
-    "synapse_depth": 1,
-    "out_dims": 24,  # 4 tiles x 6 permutation options
-    "backbone_type": "jigsaw",
-}
-```
-### Rotation MNIST CTM
-```python
-config = {
-    "iterations": 20,
-    "memory_length": 15,
-    "d_model": 256,
-    "d_input": 64,
-    "heads": 4,
-    "n_synch_out": 32,
-    "n_synch_action": 32,
-    "synapse_depth": 1,
-    "out_dims": 4,  # 0°, 90°, 180°, 270°
-    "backbone_type": "rotation",
-}
-```
-## Usage
-```python
-import torch
-from huggingface_hub import hf_hub_download
-# Download model
-model_path = hf_hub_download(
-    repo_id="vincentoh/ctm-experiments",
-    filename="ctm-mnist.pt"
-)
-# Load checkpoint
-checkpoint = torch.load(model_path, map_location="cpu")
-# Initialize CTM with matching config
-from models.ctm import ContinuousThoughtMachine
-model = ContinuousThoughtMachine(**config)
-model.load_state_dict(checkpoint['model_state_dict'])
-model.eval()
-# Inference
-with torch.no_grad():
-    output = model(input_tensor)
 ```
-## Training Details
-- **Hardware**: NVIDIA RTX 4070 Ti SUPER
-- **Framework**: PyTorch
-- **Optimizer**: AdamW
-- **Training time**: 5 minutes (MNIST) to 17 hours (QAMNIST)
-## Key Findings
-1. **Architecture > Scale**: Small sync dimensions (32) with linear synapses work better than large/deep variants
-2. **"Thinking Longer" = Higher Accuracy**: CTM accuracy improves with more internal iterations
-3. **Transfer Learning Works**: Parity-trained core transfers to brackets with 94.5% accuracy
-## License
-MIT License (same as original CTM repository)
-## Acknowledgments
-- [Sakana AI](https://sakana.ai/) for the Continuous Thought Machine architecture
-- Original [CTM Repository](https://github.com/SakanaAI/continuous-thought-machines)
-## Links
-- [Experiment Repository](https://github.com/bigsnarfdude/ctm-experiments)
-- [Original Paper](https://arxiv.org/abs/2505.05522)
-- [Interactive Demo](https://pub.sakana.ai/ctm/)

+# CTM Experiments
+Personal experiments with [Continuous Thought Machines](https://github.com/SakanaAI/continuous-thought-machines) (SakanaAI).
+**Interactive Demo**: https://pub.sakana.ai/ctm/
+## Core Insight: Thinking Takes Time
+CTM's key innovation: **accuracy improves with more internal iterations**. The model "thinks longer" to reach better answers.
+This enables CTM to learn algorithmic reasoning that feedforward networks struggle with:
+| Task | Challenge | What CTM Learns |
+|------|-----------|-----------------|
+| **Parity** | Count bits across sequence | Iterative accumulation |
+| **Brackets** | Track nested structure | Stack-like memory (LIFO) |
+| **Object Tracking** | Extrapolate motion | Physics simulation |
+| **Mazes** | Navigate 2D paths | Sequential decision making |
+| **Jigsaw** | Classify shuffled patches | Part-whole integration |
+## Results Summary
+| Experiment | Accuracy | Notes |
+|------------|----------|-------|
+| **MNIST** | **97.9%** | Digit classification, 5 min training |
+| **Parity-16** | **99.0%** | 16-bit cumulative parity |
+| **QAMNIST** | **100%** | Multi-step arithmetic (3-5 digits, 3-5 ops) |
+| **Brackets** | **94.7%** | Stack-like reasoning for `(()[])` vs `([)]` |
+| **Object Tracking** | **100%** | Quadrant prediction from motion (4 classes) |
+| **Velocity Prediction** | **100%** | Direction prediction (9 classes) |
+| **Position Prediction** | **93.8%** | Exact position (256 classes, 16x16 grid) |
+| **Transfer Learning** | **94.5%** | Parity→Brackets (core frozen) |
+| **Maze Solving** | **Visualized** | Pretrained model inference on 15x15 mazes |
+| **Jigsaw MNIST** | **92%** | Classify digits from shuffled patches (no positional encoding) |
+## Key Findings
+### 1. Architecture Matters More Than Scale
+Early experiments showed 50% accuracy on parity (random guessing). The fix wasn't more parameters - it was using the **correct architecture**:
+| Parameter | Wrong | Correct (Official) |
+|-----------|-------|-------------------|
+| `n_synch_out` | 512 | **32** |
+| `n_synch_action` | 512 | **32** |
+| `synapse_depth` | 4 (U-Net) | **1** (linear) |
+The official parity implementation uses surprisingly small synchronization dimensions with a linear synapse - this is critical for learning.
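As a minimal sketch, the change in the table can be written as a config fragment. Only the three parameter names and their values come from the table; the dictionary names `WRONG` and `CORRECT` are illustrative and do not appear in the repository.

```python
# Synchronization and synapse settings taken from the table above.
# The dict names are illustrative; the keys mirror the table's parameter names.
WRONG = {"n_synch_out": 512, "n_synch_action": 512, "synapse_depth": 4}
CORRECT = {"n_synch_out": 32, "n_synch_action": 32, "synapse_depth": 1}

# Collect every setting whose value differs between the two configurations.
changed = {k: (WRONG[k], CORRECT[k]) for k in CORRECT if WRONG[k] != CORRECT[k]}

for name, (before, after) in changed.items():
    print(f"{name}: {before} -> {after}")
```

All three settings differ, so a full training config would need each of them overridden, with the remaining hyperparameters left as they were.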
+### 2. "Thinking Longer" = Higher Accuracy
+![MNIST Inference per Tick](continuous-thought-machines/experiments/results/mnist_inference.png)
+CTM accuracy improves with more internal iterations:
+- **Tick 0**: 7% (roughly chance level for 10 classes)
+- **Tick 10-11**: 100% (peak)
+- **Final tick**: 98%
+Harder tasks need more "thinking time" - parity peaks at tick 35.
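One way to produce a per-tick curve like this is to record the predicted class at every internal tick and score each tick separately. The sketch below assumes predictions are already available as nested lists; `accuracy_per_tick` is a hypothetical helper, and the toy numbers are made up for illustration and are not results from these experiments.

```python
from typing import List, Sequence


def accuracy_per_tick(preds: Sequence[Sequence[int]], labels: Sequence[int]) -> List[float]:
    """preds[i][t] is the predicted class for sample i at internal tick t."""
    n_ticks = len(preds[0])
    accuracies = []
    for t in range(n_ticks):
        correct = sum(1 for p, y in zip(preds, labels) if p[t] == y)
        accuracies.append(correct / len(labels))
    return accuracies


# Toy data: 4 samples, 3 ticks each (illustrative only).
toy_preds = [[0, 3, 3], [1, 1, 1], [5, 5, 2], [7, 4, 4]]
toy_labels = [3, 1, 2, 4]

per_tick = accuracy_per_tick(toy_preds, toy_labels)
# Index of the tick with the highest accuracy (first one on ties).
best_tick = max(range(len(per_tick)), key=per_tick.__getitem__)
print(per_tick, best_tick)
```

Reporting the peak tick alongside the final tick, as the list above does, makes it visible when accuracy dips slightly after its maximum.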
+### 3. Transfer Learning Works
+Pretrained parity model transfers to brackets:
+- **Baseline**: 52.5% (random)
+- **After transfer**: 94.5% (core frozen, only backbone/output trained)
+The iterative counting learned for parity transfers to stack tracking for brackets, nearly matching from-scratch performance (94.5% vs. 94.7%) with only 37.7% of parameters trainable.
+### 4. Maze Solving "The Hard Way"
+CTM solves mazes by outputting action trajectories (Up/Down/Left/Right/Wait), not pixel masks:
+- **Step accuracy**: 60%+ after 2000 iterations
+- Uses auto-extending curriculum (loss only on trajectory up to first error)
+- Demonstrates sequential reasoning capability
+![Maze Attention Overlay](continuous-thought-machines/experiments/results/maze_attention.gif)
+*CTM "thinking" through a 15x15 maze: blue = predicted path, red = attention focus, green = start position. The attention heatmap shows where CTM looks at each internal tick (T=75 iterations).*
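The curriculum rule in the list above ("loss only on trajectory up to first error") can be sketched as a boolean mask over trajectory steps. This is an illustration under assumptions, not the repository's implementation: actions are plain integers, and the first wrong step is itself included in the loss, a detail the README does not state.

```python
from typing import List, Sequence


def curriculum_mask(pred: Sequence[int], target: Sequence[int]) -> List[bool]:
    """Mark the steps that contribute to the loss.

    Steps up to and including the first wrong prediction are kept
    (assumption); if every step is correct, the whole trajectory is kept.
    """
    first_error = next(
        (i for i, (p, t) in enumerate(zip(pred, target)) if p != t),
        len(target) - 1,
    )
    return [i <= first_error for i in range(len(target))]


def step_accuracy(pred: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of trajectory steps predicted correctly."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


mask = curriculum_mask([0, 1, 2, 3], [0, 1, 3, 3])
print(mask, step_accuracy([0, 1, 2, 3], [0, 1, 3, 3]))
```

As the model gets more early steps right, the first error moves later and the mask grows, which is what makes the curriculum extend on its own.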
+## Detailed Results
+### MNIST Digit Classification (97.9%)
+![MNIST Training Accuracy](continuous-thought-machines/experiments/results/mnist-ctm_smoothed.png)
+CTM learns digit classification in ~5 minutes on RTX 4070 Ti.
+### Parity-16 Cumulative Parity (99.0%)
+![Parity Inference per Tick](continuous-thought-machines/experiments/results/parity_inference.png)
+16-bit parity with cumulative outputs - harder task shows clearer "thinking" benefit.
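For reference, cumulative parity targets can be computed with a running XOR. The sketch assumes bits are encoded as 0/1 integers; the encoding used in the actual training data may differ, and `cumulative_parity` is a hypothetical helper name.

```python
from typing import List, Sequence


def cumulative_parity(bits: Sequence[int]) -> List[int]:
    """Target k is the parity (0 = even, 1 = odd) of the number of 1s in bits[:k + 1]."""
    targets = []
    running = 0
    for b in bits:
        running ^= b  # flip the running parity whenever a 1 is seen
        targets.append(running)
    return targets


example = cumulative_parity([1, 0, 1, 1])
print(example)
```

A 16-bit input therefore yields 16 targets, one per prefix, which is why the task has a 16-dimensional output rather than a single label.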
+### QAMNIST Multi-Step Arithmetic (100%)
+![QAMNIST Training Accuracy](continuous-thought-machines/experiments/results/qamnist-ctm-10_smoothed.png)
+100% accuracy on multi-step arithmetic (3-5 MNIST digits, 3-5 operations) after 300k iterations.
+### Maze Navigation (Pretrained Model)
+Using the authors' pretrained checkpoint (`ctm_mazeslarge_D=2048_T=75_M=25.pt`), we ran inference on the small-mazes dataset:
+- **Model**: D=2048 neurons, T=75 thinking steps, M=25 memory length (pre-activation history per neuron)
+- **Dataset**: 1000 test mazes (15x15 grid)
+- **Output**: Action trajectories (Up/Down/Left/Right/Wait)
+The visualization shows CTM's attention patterns as it navigates:
+1. **Red heatmap**: Where CTM "looks" at each thinking step
+2. **Blue path**: Predicted solution trajectory
+3. **Green marker**: Start position
+Key insight: CTM learns sequential decision-making through iterative internal computation, not memorization.
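To make the output format concrete, the sketch below turns an action trajectory into a path of grid cells. The `MOVES` table, the (row, column) convention, and the wall handling are assumptions for illustration; the model's actual action indexing is not given here.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Cell = Tuple[int, int]

# Hypothetical action encoding as (row delta, column delta).
MOVES: Dict[str, Cell] = {
    "Up": (-1, 0),
    "Down": (1, 0),
    "Left": (0, -1),
    "Right": (0, 1),
    "Wait": (0, 0),
}


def trace_path(start: Cell, actions: Sequence[str],
               walls: FrozenSet[Cell] = frozenset()) -> List[Cell]:
    """Follow actions from start, stopping before the first move into a wall."""
    path = [start]
    row, col = start
    for action in actions:
        d_row, d_col = MOVES[action]
        nxt = (row + d_row, col + d_col)
        if nxt in walls:
            break
        row, col = nxt
        path.append(nxt)
    return path


path = trace_path((0, 0), ["Right", "Right", "Down", "Wait"])
print(path)
```

A path like this is what an overlay would draw on top of the maze, with "Wait" repeating the current cell once the route is finished.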
+### Object Tracking - Position Prediction (93.8%)
+![Position Tracking Training](continuous-thought-machines/experiments/results/tracking_position.png)
+The hardest tracking task: predict exact cell (256 classes) from 5 frames of motion. CTM reaches 93.8% test accuracy, demonstrating temporal reasoning across video frames.
+## Experiment Tracking
+- **Configs**: [`experiments/experiments.json`](continuous-thought-machines/experiments/experiments.json)
+- **Training Scripts**: [`experiments/training/`](continuous-thought-machines/experiments/training/)
+- **Inference Scripts**: [`experiments/inference/`](continuous-thought-machines/experiments/inference/)
+- **Results**: [`experiments/results/`](continuous-thought-machines/experiments/results/)
+## Custom Experiments
+### Bracket Matching
+Classify bracket strings as valid or invalid: `(()[])` vs `([)]`
+Requires tracking nested depth and bracket types - implementing a stack through iterative thinking.
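The ground-truth labels for this task can be produced by a conventional stack-based checker, which is the behavior CTM has to approximate. The sketch assumes only the two bracket types shown in the examples, `()` and `[]`; `is_valid` is a hypothetical helper.

```python
# Maps each closing bracket to the opening bracket it must match.
PAIRS = {")": "(", "]": "["}


def is_valid(s: str) -> bool:
    """Return True if every bracket is closed by the right type in LIFO order."""
    stack = []
    for ch in s:
        if ch in "([":
            stack.append(ch)
        elif ch in PAIRS:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != PAIRS[ch]:
                return False
        else:
            raise ValueError(f"unexpected character: {ch!r}")
    return not stack  # leftover openers mean the string is unbalanced


print(is_valid("(()[])"), is_valid("([)]"))
```

The second example fails on type rather than depth: the counts balance, but `)` arrives while `[` is on top of the stack.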
+### Object Tracking
+Predict properties of a moving dot from 5 video frames (16x16 grid).
+```
+Frame 0    Frame 1    Frame 2    Frame 3    Frame 4
+. . . .    . . . .    . . . .    . . . .    . . . .
+. * . .    . . * .    . . . *    . . . .    . . . .
+. . . .    . . . .    . . . .    . . . *    . . . .
+. . . .    . . . .    . . . .    . . . .    . . . *
 ```
+Three prediction tasks tested:
+| Task | Classes | Accuracy | Notes |
+|------|---------|----------|-------|
+| **Quadrant** | 4 | 100% | TL/TR/BL/BR - easiest |
+| **Velocity** | 9 | 100% | 8 directions + stationary |
+| **Position** | 256 | 93.8% | Exact cell (16x16) - hardest |
+All tasks converged, demonstrating CTM's ability to learn temporal/spatial reasoning.
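The three label spaces can be illustrated with small helper functions. Everything below is an assumption made for the sketch: row-major cell numbering, the TL/TR/BL/BR class order, and a velocity class built from the sign of the per-frame displacement. The README does not specify these encodings or which frame the label refers to.

```python
from typing import Tuple

GRID = 16  # 16x16 grid, as stated above


def position_class(row: int, col: int) -> int:
    """Exact cell as one of 256 classes (row-major order, an assumption)."""
    return row * GRID + col


def quadrant_class(row: int, col: int) -> int:
    """0 = TL, 1 = TR, 2 = BL, 3 = BR (class order is an assumption)."""
    half = GRID // 2
    return (2 if row >= half else 0) + (1 if col >= half else 0)


def velocity_class(prev: Tuple[int, int], curr: Tuple[int, int]) -> int:
    """One of 9 classes: 8 directions plus stationary (class 4)."""
    def sign(x: int) -> int:
        return (x > 0) - (x < 0)

    d_row = sign(curr[0] - prev[0])
    d_col = sign(curr[1] - prev[1])
    return (d_row + 1) * 3 + (d_col + 1)


print(position_class(15, 15), quadrant_class(12, 3), velocity_class((1, 1), (1, 2)))
```

The class counts match the table: 16 x 16 = 256 positions, 4 quadrants, and 3 x 3 = 9 displacement signs.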
+### Transfer Learning
+Freeze core CTM dynamics from parity-16, train only backbone/output for brackets.
+### Maze Inference
+Run pretrained maze model on small-mazes dataset to visualize CTM's "thinking" process:
+```bash
+python -m tasks.mazes.analysis.run \
+  --actions viz \
+  --checkpoint checkpoints/mazes/ctm_mazeslarge_D=2048_T=75_M=25.pt \
+  --dataset_for_viz small-mazes
+```
+Outputs attention overlay GIFs to `tasks/mazes/analysis/outputs/viz/`.
+### Jigsaw MNIST
+Classify MNIST digits from **randomly shuffled patches** without positional encoding.
+```
+Original:            Shuffled (input):
+┌───┬───┬───┬───┐    ┌───┬───┬───┬───┐
+│ 1 │ 2 │ 3 │ 4 │    │12 │ 7 │ 2 │15 │
+├───┼───┼───┼───┤    ├───┼───┼───┼───┤
+│ 5 │ 6 │ 7 │ 8 │ => │ 4 │11 │ 9 │ 1 │
+├───┼───┼───┼───┤    ├───┼───┼───┼───┤
+│ 9 │10 │11 │12 │    │ 6 │ 3 │14 │ 5 │
+├───┼───┼───┼───┤    ├───┼───┼───┼───┤
+│13 │14 │15 │16 │    │16 │ 8 │10 │13 │
+└───┴───┴───┴───┘    └───┴───┴───┴───┘
+```
+**Task**: Given 16 shuffled 7x7 patches, predict the digit class (0-9).
+**Challenge**: No positional encoding - CTM must learn to recognize digit parts and integrate them correctly through its internal synchronization dynamics.
+**Result**: **92% test accuracy** - CTM successfully learns part-whole relationships without explicit position information.
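The input construction can be sketched as follows: split a 28x28 image into sixteen 7x7 patches, then permute them. This is an illustrative reconstruction using plain lists rather than the training code; `split_patches` and `shuffle_patches` are hypothetical helpers, and the stand-in image is synthetic.

```python
import random
from typing import List, Optional, Tuple

Patch = List[List[int]]


def split_patches(image: List[List[int]], patch: int = 7) -> List[Patch]:
    """Split a square image into patch x patch tiles in row-major order."""
    n = len(image)
    tiles = []
    for r0 in range(0, n, patch):
        for c0 in range(0, n, patch):
            tiles.append([row[c0:c0 + patch] for row in image[r0:r0 + patch]])
    return tiles


def shuffle_patches(tiles: List[Patch], seed: Optional[int] = None) -> Tuple[List[Patch], List[int]]:
    """Return the tiles in random order plus the permutation that was applied."""
    rng = random.Random(seed)
    order = list(range(len(tiles)))
    rng.shuffle(order)
    return [tiles[i] for i in order], order


# Synthetic 28x28 stand-in for an MNIST digit: pixel value = flat index.
image = [[r * 28 + c for c in range(28)] for r in range(28)]
tiles = split_patches(image)
shuffled, order = shuffle_patches(tiles, seed=0)
print(len(tiles), len(tiles[0]), len(tiles[0][0]))
```

The permutation is returned only for inspection. In the task as described, the model receives the shuffled patches without any position information and must still predict the digit class.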
+![Jigsaw Training](continuous-thought-machines/experiments/results/jigsaw_training.png)
+## Resources
+- [CTM Paper](2505.05522v4.pdf)
+- [Original SakanaAI Repo](https://github.com/SakanaAI/continuous-thought-machines)