---
license: apache-2.0
tags:
- diffusion
- maze-solving
- world-model
- interpretability
- vision-transformer
- pytorch
language:
- en
pipeline_tag: image-to-image
---
# TACIT - Transformation-Aware Capturing of Implicit Thought
TACIT is a diffusion-based transformer model that learns to solve mazes. The model demonstrates emergent reasoning capabilities without explicit language supervision, making it an ideal subject for interpretability research on world models.

## Model Description
TACIT learns to transform images of unsolved mazes into solved mazes using a flow-matching diffusion process. The model develops internal representations of maze structure and pathfinding without being explicitly programmed to do so.
### Architecture
| Component | Specification |
|-----------|---------------|
| Type | Diffusion Transformer (DiT) |
| Hidden Dimension | 384 |
| Transformer Blocks | 8 |
| Attention Heads | 6 |
| Patch Size | 8×8 |
| Input/Output | 64×64 RGB images |
| Parameters | ~20M |
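The table above implies the token geometry of the DiT: a 64×64 image cut into 8×8 patches yields an 8×8 grid of tokens, each projected into the 384-dimensional hidden space. A quick sanity check of that arithmetic (illustrative only; variable names are not from the repo):

```python
# Token geometry implied by the architecture table (illustrative sketch).
image_size = 64   # 64x64 RGB input/output
patch_size = 8    # 8x8 patches
hidden_dim = 384  # transformer hidden dimension

patches_per_side = image_size // patch_size  # 8
num_tokens = patches_per_side ** 2           # 64 tokens per image
patch_dim = 3 * patch_size * patch_size      # 192 raw values per RGB patch

# Each 192-dim flattened patch is linearly projected to hidden_dim (384)
print(num_tokens, patch_dim, hidden_dim)  # -> 64 192 384
```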
## Training
- **Dataset**: 1,000,000 maze problem-solution pairs
- **Epochs**: 100
- **Final Loss**: 6.25e-06 (MSE)
- **Final L2 Distance**: 0.0014

The model learns to predict the velocity field in a flow-matching formulation, transforming unsolved mazes into their solutions through iterative refinement.
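The iterative refinement above amounts to integrating the predicted velocity field from t=0 (unsolved maze) to t=1 (solved maze). A minimal NumPy sketch of Euler integration, with a stand-in for the trained model (`dummy_velocity` is the analytic straight-line field, used here only to demonstrate the loop):

```python
import numpy as np

def dummy_velocity(x, t, target):
    # A trained TACIT model would predict this field; for illustration we use
    # the analytic straight-line velocity (target - x) / (1 - t).
    return (target - x) / (1.0 - t)

def sample_euler(velocity_fn, x0, target, num_steps=10):
    # Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed-step Euler updates.
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i / num_steps
        x = x + dt * velocity_fn(x, t, target)
    return x

x0 = np.zeros((3, 64, 64))      # "unsolved" state
x1 = np.ones((3, 64, 64))       # "solved" state
solved = sample_euler(dummy_velocity, x0, x1, num_steps=10)
```

With the straight-line field, ten Euler steps carry `x0` exactly onto `x1`; the real model replaces `dummy_velocity` with its learned prediction.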
## Usage
```python
import torch
from safetensors.torch import load_file

# Model architecture (copy from tacit/models/dit.py or install the tacit package)
from tacit import TACITModel, sample_euler_method

# Load model weights
model = TACITModel()
state_dict = load_file('tacit_epoch_100.safetensors')

# Strip the prefix torch.compile adds, if the checkpoint was saved compiled
if next(iter(state_dict)).startswith('_orig_mod.'):
    state_dict = {k.replace('_orig_mod.', '', 1): v for k, v in state_dict.items()}
model.load_state_dict(state_dict)
model.eval()

# Inference
# x0: input maze tensor of shape (batch, 3, 64, 64), values in [0, 1]
with torch.no_grad():
    solution = sample_euler_method(model, x0, num_steps=10)
```
## Inference Configuration
| Parameter | Value |
|-----------|-------|
| Sampling Method | Euler |
| Recommended Steps | 10 |
| Output Range | [0, 1] |
## Maze Format
- **Resolution**: 64×64 pixels, RGB
- **Color Scheme**:
  - White (255, 255, 255): paths
  - Black (0, 0, 0): walls
  - Green (0, 255, 0): entry/exit points
  - Red (255, 0, 0): solution path (in solved mazes)
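Given this color scheme, the solution path can be recovered from a model output by thresholding for near-pure-red pixels. A small illustrative helper (not part of the tacit package), assuming the `(3, 64, 64)` float layout in `[0, 1]` used in the usage example:

```python
import numpy as np

def solution_mask(img, tol=0.5):
    # img: (3, 64, 64) float array in [0, 1], channels ordered (R, G, B).
    # Solution-path pixels are near pure red: high R, low G, low B.
    r, g, b = img[0], img[1], img[2]
    return (r > tol) & (g < tol) & (b < tol)

# Toy example: paint a 10-pixel red segment on a black image
solved = np.zeros((3, 64, 64))
solved[0, 10:20, 32] = 1.0
mask = solution_mask(solved)
print(mask.sum())  # -> 10
```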
## Research Applications
This model is designed for interpretability research, particularly:
1. **Mechanistic Interpretability**: Understanding how the model represents maze structure internally
2. **World Models**: Studying emergent spatial reasoning without language
3. **Diffusion Interpretability**: Analyzing intermediate denoising steps
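For the third direction, analyzing intermediate denoising steps requires keeping the full sampling trajectory rather than only the final image. A hypothetical wrapper sketch (the tacit package's `sample_euler_method` may or may not expose this; `velocity_fn` stands in for the model):

```python
import numpy as np

def sample_with_trajectory(velocity_fn, x0, num_steps=10):
    # Euler sampling that records every intermediate state for inspection.
    xs = [x0.copy()]
    x = x0.copy()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i / num_steps
        x = x + dt * velocity_fn(x, t)
        xs.append(x.copy())
    return x, xs

# Toy run with a constant unit velocity field in place of the model
x_final, traj = sample_with_trajectory(lambda x, t: np.ones_like(x),
                                       np.zeros((3, 64, 64)))
print(len(traj))  # -> 11 states: x0 plus one per step
```

Each entry in `traj` can then be visualized or probed, e.g. to see at which step the red solution path first emerges.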
## Repository
Full source code, training scripts, and additional checkpoints available at:
[GitHub Repository](https://github.com/danielxmed/tacit)
## Citation
```bibtex
@software{tacit2024,
  title={TACIT: Transformation-Aware Capturing of Implicit Thought},
  author={Daniel},
  year={2024},
  url={https://huggingface.co/tylerxdurden/tacit}
}
```
## License
Apache License 2.0