---
license: apache-2.0
tags:
- diffusion
- maze-solving
- world-model
- interpretability
- vision-transformer
- pytorch
language:
- en
pipeline_tag: image-to-image
---
# TACIT: Transformation-Aware Capturing of Implicit Thought

TACIT is a diffusion-based transformer model that learns to solve mazes. The model demonstrates emergent reasoning capabilities without explicit language supervision, making it an ideal subject for interpretability research on world models.
## Model Description

TACIT learns to transform images of unsolved mazes into solved mazes using a flow-matching diffusion process. The model develops internal representations of maze structure and pathfinding without being explicitly programmed to do so.

### Architecture

| Component | Specification |
|-----------|---------------|
| Type | Diffusion Transformer (DiT) |
| Hidden Dimension | 384 |
| Transformer Blocks | 8 |
| Attention Heads | 6 |
| Patch Size | 8×8 |
| Input/Output | 64×64 RGB images |
| Parameters | ~20M |
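
The patch settings in the table mean a 64×64 input is split into an 8×8 grid of 64 patch tokens of dimension 384. A minimal patch-embedding sketch of that step (illustrative only, not the repository's actual code):

```python
import torch
import torch.nn as nn

# Illustrative patch embedding matching the table above: 64x64 RGB input,
# 8x8 patches, hidden dimension 384 -> a sequence of 64 tokens.
patch_embed = nn.Conv2d(in_channels=3, out_channels=384, kernel_size=8, stride=8)

x = torch.randn(1, 3, 64, 64)               # one RGB maze image
tokens = patch_embed(x)                     # (1, 384, 8, 8)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 64, 384): 64 patch tokens
```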
## Training

- **Dataset**: 1,000,000 maze problem-solution pairs
- **Epochs**: 100
- **Final Loss**: 6.25e-06 (MSE)
- **Final L2 Distance**: 0.0014

The model learns to predict the velocity field in a flow-matching formulation, transforming unsolved mazes into their solutions through iterative refinement.
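
The velocity-prediction objective can be sketched as follows, assuming a linear interpolation path between unsolved and solved mazes and a model that takes the interpolated image plus a timestep (the function and call signature here are illustrative, not the repository's actual API):

```python
import torch

def flow_matching_loss(model, x0, x1):
    """One training step of linear flow matching (illustrative sketch).

    x0: unsolved maze batch (B, 3, 64, 64); x1: solved maze batch.
    The linear path x_t = (1 - t) * x0 + t * x1 has constant
    velocity x1 - x0, which the model learns to predict.
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device).view(b, 1, 1, 1)  # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1     # point on the straight-line path
    v_target = x1 - x0              # ground-truth velocity field
    v_pred = model(x_t, t.view(b))  # model predicts velocity at (x_t, t)
    return torch.nn.functional.mse_loss(v_pred, v_target)
```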
## Usage

```python
import torch
from safetensors.torch import load_file

# Model architecture (copy from tacit/models/dit.py or install the tacit package)
from tacit import TACITModel, sample_euler_method

# Load model
model = TACITModel()
state_dict = load_file('tacit_epoch_100.safetensors')

# Handle checkpoints saved from a torch.compile()-wrapped model
if next(iter(state_dict)).startswith('_orig_mod.'):
    state_dict = {k.replace('_orig_mod.', '', 1): v for k, v in state_dict.items()}

model.load_state_dict(state_dict)
model.eval()

# Inference
# x0: input maze tensor of shape (batch, 3, 64, 64), values in [0, 1]
with torch.no_grad():
    solution = sample_euler_method(model, x0, num_steps=10)
```
## Inference Configuration

| Parameter | Value |
|-----------|-------|
| Sampling Method | Euler |
| Recommended Steps | 10 |
| Output Range | [0, 1] |
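
An Euler sampler for this setup integrates the predicted velocity from the unsolved maze at t=0 to the solution at t=1 in uniform steps. A sketch of what such a sampler looks like, assuming the model maps an image batch and timestep to a velocity (an illustrative reimplementation, not the repository's `sample_euler_method`):

```python
import torch

@torch.no_grad()
def euler_sample(model, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (unsolved) to t=1 (solved)."""
    x = x0.clone()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v = model(x, t)           # predicted velocity at the current state/time
        x = x + dt * v            # one Euler step along the flow
    return x.clamp(0.0, 1.0)      # final outputs lie in [0, 1]
```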
## Maze Format

- **Resolution**: 64×64 pixels, RGB
- **Color Scheme**:
  - White (255, 255, 255): Paths
  - Black (0, 0, 0): Walls
  - Green (0, 255, 0): Entry/Exit points
  - Red (255, 0, 0): Solution path (in solved mazes)
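
Given this color scheme, the solution path can be read back out of a model output by thresholding the channels: a solution pixel has a high red channel and low green and blue. A small helper sketch (the function name and 0.5 threshold are illustrative choices, not part of the repository):

```python
import torch

def extract_solution_mask(solved, thresh=0.5):
    """Boolean mask of solution-path pixels in a solved maze tensor.

    solved: (3, 64, 64) tensor with values in [0, 1]. A pixel counts as
    part of the red solution path when its red channel is above `thresh`
    and its green and blue channels are below it, which excludes the
    white paths, black walls, and green entry/exit markers.
    """
    r, g, b = solved[0], solved[1], solved[2]
    return (r > thresh) & (g < thresh) & (b < thresh)
```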
## Research Applications

This model is designed for interpretability research, particularly:

1. **Mechanistic Interpretability**: Understanding how the model represents maze structure internally
2. **World Models**: Studying emergent spatial reasoning without language
3. **Diffusion Interpretability**: Analyzing intermediate denoising steps
## Repository

Full source code, training scripts, and additional checkpoints are available at the [GitHub repository](https://github.com/danielxmed/tacit).
## Citation

```bibtex
@software{tacit2024,
  title={TACIT: Transformation-Aware Capturing of Implicit Thought},
  author={Daniel},
  year={2024},
  url={https://huggingface.co/tylerxdurden/tacit}
}
```
## License

Apache License 2.0