---
license: apache-2.0
tags:
- diffusion
- maze-solving
- world-model
- interpretability
- vision-transformer
- pytorch
language:
- en
pipeline_tag: image-to-image
---
# TACIT - Transformation-Aware Capturing of Implicit Thought
TACIT is a diffusion-based transformer model that learns to solve mazes. The model demonstrates emergent reasoning capabilities without explicit language supervision, making it an ideal subject for interpretability research on world models.

## Model Description
TACIT learns to transform images of unsolved mazes into solved mazes using a flow-matching diffusion process. The model develops internal representations of maze structure and pathfinding without being explicitly programmed to do so.
### Architecture
| Component | Specification |
|-----------|---------------|
| Type | Diffusion Transformer (DiT) |
| Hidden Dimension | 384 |
| Transformer Blocks | 8 |
| Attention Heads | 6 |
| Patch Size | 8×8 |
| Input/Output | 64×64 RGB images |
| Parameters | ~20M |
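The table above implies the token geometry of the DiT: a 64×64 image cut into 8×8 patches yields an 8×8 grid of tokens, each projected into the 384-dimensional hidden space. A quick sanity check of that arithmetic (illustrative only; variable names are not from the repo):

```python
# Token geometry implied by the architecture table (illustrative sketch).
image_size = 64   # 64x64 RGB input/output
patch_size = 8    # 8x8 patches
hidden_dim = 384  # transformer hidden dimension

patches_per_side = image_size // patch_size  # 8
num_tokens = patches_per_side ** 2           # 64 tokens per image
patch_dim = 3 * patch_size * patch_size      # 192 raw values per RGB patch

# Each 192-dim flattened patch is linearly projected to hidden_dim (384)
print(num_tokens, patch_dim, hidden_dim)  # -> 64 192 384
```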
## Training
- **Dataset**: 1,000,000 maze problem-solution pairs
- **Epochs**: 100
- **Final Loss**: 6.25e-06 (MSE)
- **Final L2 Distance**: 0.0014

The model learns to predict the velocity field in a flow-matching formulation, transforming unsolved mazes into their solutions through iterative refinement.
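The iterative refinement above amounts to integrating the predicted velocity field from t=0 (unsolved maze) to t=1 (solved maze). A minimal NumPy sketch of Euler integration, with a stand-in for the trained model (`dummy_velocity` is the analytic straight-line field, used here only to demonstrate the loop):

```python
import numpy as np

def dummy_velocity(x, t, target):
    # A trained TACIT model would predict this field; for illustration we use
    # the analytic straight-line velocity (target - x) / (1 - t).
    return (target - x) / (1.0 - t)

def sample_euler(velocity_fn, x0, target, num_steps=10):
    # Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler updates.
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i / num_steps
        x = x + dt * velocity_fn(x, t, target)
    return x

x0 = np.zeros((3, 64, 64))      # "unsolved" state
x1 = np.ones((3, 64, 64))       # "solved" state
solved = sample_euler(dummy_velocity, x0, x1, num_steps=10)
```

With the straight-line field, ten Euler steps carry `x0` exactly onto `x1`; the real model replaces `dummy_velocity` with its learned prediction.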
## Usage
```python
import torch
from safetensors.torch import load_file

# Model architecture (copy from tacit/models/dit.py or install the tacit package)
from tacit import TACITModel, sample_euler_method

# Load model weights
model = TACITModel()
state_dict = load_file('tacit_epoch_100.safetensors')

# Strip the prefix torch.compile adds, if the checkpoint was saved compiled
if next(iter(state_dict)).startswith('_orig_mod.'):
    state_dict = {k.replace('_orig_mod.', '', 1): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()

# Inference
# x0: input maze tensor of shape (batch, 3, 64, 64), values in [0, 1]
with torch.no_grad():
    solution = sample_euler_method(model, x0, num_steps=10)
```
## Inference Configuration
| Parameter | Value |
|-----------|-------|
| Sampling Method | Euler |
| Recommended Steps | 10 |
| Output Range | [0, 1] |
## Maze Format
- **Resolution**: 64×64 pixels, RGB
- **Color Scheme**:
  - White (255, 255, 255): paths
  - Black (0, 0, 0): walls
  - Green (0, 255, 0): entry/exit points
  - Red (255, 0, 0): solution path (in solved mazes)
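Given this color scheme, the solution path can be recovered from a model output by thresholding for near-pure-red pixels. A small illustrative helper (not part of the tacit package), assuming the `(3, 64, 64)` float layout in `[0, 1]` used in the usage example:

```python
import numpy as np

def solution_mask(img, tol=0.5):
    # img: (3, 64, 64) float array in [0, 1], channels ordered (R, G, B).
    # Solution-path pixels are near pure red: high R, low G, low B.
    r, g, b = img[0], img[1], img[2]
    return (r > tol) & (g < tol) & (b < tol)

# Toy example: paint a 10-pixel red segment on a black image
solved = np.zeros((3, 64, 64))
solved[0, 10:20, 32] = 1.0
mask = solution_mask(solved)
print(mask.sum())  # -> 10
```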
## Research Applications
This model is designed for interpretability research, particularly:
1. **Mechanistic Interpretability**: Understanding how the model represents maze structure internally
2. **World Models**: Studying emergent spatial reasoning without language
3. **Diffusion Interpretability**: Analyzing intermediate denoising steps
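For the third direction, analyzing intermediate denoising steps requires keeping the full sampling trajectory rather than only the final image. A hypothetical wrapper sketch (the tacit package's `sample_euler_method` may or may not expose this; `velocity_fn` stands in for the model):

```python
import numpy as np

def sample_with_trajectory(velocity_fn, x0, num_steps=10):
    # Euler sampling that records every intermediate state for inspection.
    xs = [x0.copy()]
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i / num_steps
        x = x + dt * velocity_fn(x, t)
        xs.append(x.copy())
    return x, xs

# Toy run with a constant unit velocity field in place of the model
x_final, traj = sample_with_trajectory(lambda x, t: np.ones_like(x),
                                       np.zeros((3, 64, 64)))
print(len(traj))  # -> 11 states: x0 plus one per step
```

Each entry in `traj` can then be visualized or probed, e.g. to see at which step the red solution path first emerges.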
## Repository
Full source code, training scripts, and additional checkpoints available at:
[GitHub Repository](https://github.com/danielxmed/tacit)
## Citation
```bibtex
@software{tacit2024,
  title={TACIT: Transformation-Aware Capturing of Implicit Thought},
  author={Daniel},
  year={2024},
  url={https://huggingface.co/tylerxdurden/tacit}
}
```
## License
Apache License 2.0