---
license: apache-2.0
tags:
- diffusion
- maze-solving
- world-model
- interpretability
- vision-transformer
- pytorch
language:
- en
pipeline_tag: image-to-image
---
# TACIT - Transformation-Aware Capturing of Implicit Thought
TACIT is a diffusion-based transformer model that learns to solve mazes. The model demonstrates emergent reasoning capabilities without explicit language supervision, making it an ideal subject for interpretability research on world models.
![Model Evolution](images/evolution_grid.png)
## Model Description
TACIT learns to transform images of unsolved mazes into solved mazes using a flow-matching diffusion process. The model develops internal representations of maze structure and pathfinding without being explicitly programmed to do so.
### Architecture
| Component | Specification |
|-----------|---------------|
| Type | Diffusion Transformer (DiT) |
| Hidden Dimension | 384 |
| Transformer Blocks | 8 |
| Attention Heads | 6 |
| Patch Size | 8×8 |
| Input/Output | 64×64 RGB images |
| Parameters | ~20M |
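The table above implies a simple token geometry, which is worth spelling out for interpretability work. A minimal sketch, assuming the stated 64×64 input, 8×8 patches, and 384-dim hidden size:

```python
# Token geometry implied by the architecture table (values assumed from the table).
image_size = 64
patch_size = 8
hidden_dim = 384

patches_per_side = image_size // patch_size   # 8 patches along each axis
num_tokens = patches_per_side ** 2            # 64 tokens per image
patch_dim = 3 * patch_size * patch_size       # 192 raw values per RGB patch

print(num_tokens, patch_dim, hidden_dim)      # 64 192 384
```

So each maze becomes a sequence of 64 tokens, each a linear projection of a 192-value RGB patch into the 384-dim residual stream.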
## Training
- **Dataset**: 1,000,000 maze problem-solution pairs
- **Epochs**: 100
- **Final Loss**: 6.25e-06 (MSE)
- **Final L2 Distance**: 0.0014
The model learns to predict the velocity field in a flow-matching formulation, transforming unsolved mazes into their solutions through iterative refinement.
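The training objective can be sketched concretely. This is a generic linear-interpolation flow-matching setup, not the repository's exact code: the unsolved maze `x0` and solved maze `x1` are interpolated, and the model regresses the constant velocity between them (all names here are illustrative):

```python
import numpy as np

# Hedged sketch of a flow-matching objective: the model is trained to predict
# the velocity v = x1 - x0 along the straight-line path
#   x_t = (1 - t) * x0 + t * x1.
rng = np.random.default_rng(0)
x0 = rng.random((3, 64, 64))     # unsolved maze (toy stand-in)
x1 = rng.random((3, 64, 64))     # solved maze (toy stand-in)

t = 0.3
x_t = (1 - t) * x0 + t * x1      # interpolant fed to the model at time t
v_target = x1 - x0               # regression target (velocity field)

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

# A perfect model predicts v_target exactly, driving the MSE loss to zero.
print(mse(v_target, v_target))   # 0.0
```

At inference time, integrating the predicted velocity from `t=0` to `t=1` carries the unsolved maze to its solution.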
## Usage
```python
import torch
from safetensors.torch import load_file

# Model architecture (copy from tacit/models/dit.py or install the tacit package)
from tacit import TACITModel, sample_euler_method

# Load model weights
model = TACITModel()
state_dict = load_file('tacit_epoch_100.safetensors')

# Checkpoints saved from a torch.compile()-d model prefix keys with '_orig_mod.'
if next(iter(state_dict)).startswith('_orig_mod.'):
    state_dict = {k.replace('_orig_mod.', ''): v for k, v in state_dict.items()}

model.load_state_dict(state_dict)
model.eval()

# Inference
# x0: input maze tensor (batch, 3, 64, 64), values in [0, 1]
with torch.no_grad():
    solution = sample_euler_method(model, x0, num_steps=10)
```
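For readers without the `tacit` package, the assumed semantics of `sample_euler_method` are a plain Euler integration of the predicted velocity field. A minimal, framework-agnostic sketch (function names and signature are illustrative, not the package's API):

```python
import numpy as np

def euler_sample(velocity_fn, x0, num_steps=10):
    """Minimal Euler integrator: step x along the predicted velocity
    field from t=0 to t=1 in num_steps equal increments."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x

# With the true constant velocity v = x1 - x0, Euler integration is exact:
x0 = np.zeros((3, 4, 4))
x1 = np.ones((3, 4, 4))
out = euler_sample(lambda x, t: x1 - x0, x0, num_steps=10)
print(np.allclose(out, x1))  # True
```

Because the learned field is approximately constant along the interpolation path, a small step count (10, per the table below) already gives clean samples.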
## Inference Configuration
| Parameter | Value |
|-----------|-------|
| Sampling Method | Euler |
| Recommended Steps | 10 |
| Output Range | [0, 1] |
## Maze Format
- **Resolution**: 64×64 pixels, RGB
- **Color Scheme**:
- White (255, 255, 255): Paths
- Black (0, 0, 0): Walls
- Green (0, 255, 0): Entry/Exit points
- Red (255, 0, 0): Solution path (in solved mazes)
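Model outputs are continuous, so pixels rarely land exactly on these palette values. One way to recover a discrete solution mask is nearest-color classification against the palette above (a sketch; the label names and threshold-free approach are assumptions, not part of the release):

```python
import numpy as np

# Palette from the color scheme above; order fixes the label indices.
PALETTE = {
    "path":     (255, 255, 255),
    "wall":     (0, 0, 0),
    "endpoint": (0, 255, 0),
    "solution": (255, 0, 0),
}

def classify_pixels(img):
    """img: (H, W, 3) uint8 array -> (H, W) array of palette indices."""
    colors = np.array(list(PALETTE.values()), dtype=np.float32)  # (4, 3)
    dists = np.linalg.norm(img[..., None, :].astype(np.float32) - colors, axis=-1)
    return dists.argmin(axis=-1)

img = np.zeros((2, 2, 3), dtype=np.uint8)   # all-black (wall) background
img[0, 0] = (250, 10, 5)                    # near-red pixel -> solution path
labels = classify_pixels(img)
solution_mask = labels == 3                 # index 3 == "solution" in PALETTE
print(solution_mask[0, 0], solution_mask[1, 1])  # True False
```

The resulting boolean mask can then be compared against a ground-truth path for exact-match or per-pixel accuracy metrics.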
## Research Applications
This model is designed for interpretability research, particularly:
1. **Mechanistic Interpretability**: Understanding how the model represents maze structure internally
2. **World Models**: Studying emergent spatial reasoning without language
3. **Diffusion Interpretability**: Analyzing intermediate denoising steps
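For the third direction, a natural entry point is to record every intermediate state along the sampling trajectory and inspect how the solution emerges. A toy sketch (the trajectory-recording loop mirrors the assumed Euler sampler; it is not the repository's API):

```python
import numpy as np

def euler_trajectory(velocity_fn, x0, num_steps=10):
    """Euler sampling that keeps every intermediate state for inspection."""
    xs = [x0.copy()]
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        x = x + dt * velocity_fn(x, i * dt)
        xs.append(x.copy())
    return xs

x0 = np.zeros((3, 4, 4))
x1 = np.ones((3, 4, 4))
traj = euler_trajectory(lambda x, t: x1 - x0, x0)

# Under the true field, distance to the final state shrinks step by step;
# deviations from this curve in the real model flag where "decisions" happen.
dists = [float(np.linalg.norm(x - x1)) for x in traj]
print(len(traj), dists[0] > dists[-1])  # 11 True
```

Plotting such per-step distances (or decoding each intermediate image) shows at which denoising steps the solution path first appears.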
## Repository
Full source code, training scripts, and additional checkpoints are available at:
[GitHub Repository](https://github.com/danielxmed/tacit)
## Citation
```bibtex
@software{tacit2024,
  title={TACIT: Transformation-Aware Capturing of Implicit Thought},
  author={Daniel},
  year={2024},
  url={https://huggingface.co/tylerxdurden/tacit}
}
```
## License
Apache License 2.0