---
license: apache-2.0
tags:
- diffusion
- maze-solving
- world-model
- interpretability
- vision-transformer
- pytorch
language:
- en
pipeline_tag: image-to-image
---

# TACIT - Transformation-Aware Capturing of Implicit Thought

TACIT is a diffusion-based transformer model that learns to solve mazes. The model demonstrates emergent reasoning capabilities without explicit language supervision, making it an ideal subject for interpretability research on world models.

![Model Evolution](images/evolution_grid.png)

## Model Description

TACIT learns to transform images of unsolved mazes into solved mazes using a flow-matching diffusion process. The model develops internal representations of maze structure and pathfinding without being explicitly programmed to do so.

### Architecture

| Component | Specification |
|-----------|---------------|
| Type | Diffusion Transformer (DiT) |
| Hidden Dimension | 384 |
| Transformer Blocks | 8 |
| Attention Heads | 6 |
| Patch Size | 8×8 |
| Input/Output | 64×64 RGB images |
| Parameters | ~20M |

## Training

- **Dataset**: 1,000,000 maze problem-solution pairs
- **Epochs**: 100
- **Final Loss**: 6.25e-06 (MSE)
- **Final L2 Distance**: 0.0014

The model learns to predict the velocity field in a flow-matching formulation, transforming unsolved mazes into their solutions through iterative refinement.

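
The velocity-field idea described in the Training section can be illustrated with a small, model-free sketch. It assumes the common straight-line flow-matching path, where the intermediate state is `(1 - t) * x0 + t * x1` and the regression target is `x1 - x0`; this card does not state which path TACIT actually uses. The helpers `interpolate`, `target_velocity`, and `euler_sample` are hypothetical names written for this example and are not part of the `tacit` package. NumPy is used so the snippet runs without the checkpoint.

```python
import numpy as np

def interpolate(x0, x1, t):
    """Point on the straight-line path from x0 (t=0) to x1 (t=1)."""
    return (1.0 - t) * x0 + t * x1

def target_velocity(x0, x1):
    """Velocity of the straight-line path; it is constant in t."""
    return x1 - x0

def euler_sample(velocity_fn, x0, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = x0.copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

rng = np.random.default_rng(0)
x0 = rng.random((1, 3, 64, 64))  # random stand-in for an unsolved maze image
x1 = rng.random((1, 3, 64, 64))  # random stand-in for its solved counterpart

# Oracle velocity field standing in for the trained network (assumption:
# a perfectly trained model would return x1 - x0 along this path).
def oracle(x, t):
    return target_velocity(x0, x1)

result = euler_sample(oracle, x0, num_steps=10)
max_error = float(np.abs(result - x1).max())
print(f"max abs error after 10 Euler steps: {max_error:.2e}")
```

With the oracle field, Euler integration reaches `x1` up to floating-point error for any step count, because the velocity is constant along a straight path. A trained network only approximates this field, so the number of steps affects the quality of the result.
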
## Usage

```python
import torch
from safetensors.torch import load_file

# Model architecture (copy from tacit/models/dit.py or install the tacit package)
from tacit import TACITModel, sample_euler_method

# Load model
model = TACITModel()
state_dict = load_file('tacit_epoch_100.safetensors')

# Handle checkpoints saved from a torch.compile-wrapped model,
# whose keys carry an '_orig_mod.' prefix (requires Python 3.9+ for removeprefix)
prefix = '_orig_mod.'
if next(iter(state_dict)).startswith(prefix):
    state_dict = {k.removeprefix(prefix): v for k, v in state_dict.items()}

model.load_state_dict(state_dict)
model.eval()

# Inference
# x0: input maze tensor (batch, 3, 64, 64), values in [0, 1]
# Placeholder input so the snippet runs; replace with a real maze image tensor.
x0 = torch.rand(1, 3, 64, 64)
with torch.no_grad():
    solution = sample_euler_method(model, x0, num_steps=10)
```

## Inference Configuration

| Parameter | Value |
|-----------|-------|
| Sampling Method | Euler |
| Recommended Steps | 10 |
| Output Range | [0, 1] |

## Maze Format

- **Resolution**: 64×64 pixels, RGB
- **Color Scheme**:
  - White (255, 255, 255): Paths
  - Black (0, 0, 0): Walls
  - Green (0, 255, 0): Entry/Exit points
  - Red (255, 0, 0): Solution path (in solved mazes)

## Research Applications

This model is designed for interpretability research, particularly:

1. **Mechanistic Interpretability**: Understanding how the model represents maze structure internally
2. **World Models**: Studying emergent spatial reasoning without language
3. **Diffusion Interpretability**: Analyzing intermediate denoising steps

## Repository

Full source code, training scripts, and additional checkpoints are available at: [GitHub Repository](https://github.com/danielxmed/tacit)

## Citation

```bibtex
@software{tacit2024,
  title={TACIT: Transformation-Aware Capturing of Implicit Thought},
  author={Daniel},
  year={2024},
  url={https://huggingface.co/tylerxdurden/tacit}
}
```

## License

Apache License 2.0