---
license: apache-2.0
tags:
- diffusion
- maze-solving
- world-model
- interpretability
- vision-transformer
- pytorch
language:
- en
pipeline_tag: image-to-image
---
# TACIT: Transformation-Aware Capturing of Implicit Thought

TACIT is a diffusion-based transformer model that learns to solve mazes. The model demonstrates emergent reasoning capabilities without explicit language supervision, making it an ideal subject for interpretability research on world models.
## Model Description

TACIT learns to transform images of unsolved mazes into solved mazes using a flow-matching diffusion process. The model develops internal representations of maze structure and pathfinding without being explicitly programmed to do so.

### Architecture

| Component | Specification |
|-----------|---------------|
| Type | Diffusion Transformer (DiT) |
| Hidden Dimension | 384 |
| Transformer Blocks | 8 |
| Attention Heads | 6 |
| Patch Size | 8×8 |
| Input/Output | 64×64 RGB images |
| Parameters | ~20M |
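
The patch settings in the table mean a 64×64 input is split into an 8×8 grid of 64 patch tokens of dimension 384. A minimal patch-embedding sketch of that step (illustrative only, not the repository's actual code):

```python
import torch
import torch.nn as nn

# Illustrative patch embedding matching the table above: 64x64 RGB input,
# 8x8 patches, hidden dimension 384 -> a sequence of 64 tokens.
patch_embed = nn.Conv2d(in_channels=3, out_channels=384, kernel_size=8, stride=8)

x = torch.randn(1, 3, 64, 64)               # one RGB maze image
tokens = patch_embed(x)                     # (1, 384, 8, 8)
tokens = tokens.flatten(2).transpose(1, 2)  # (1, 64, 384): 64 patch tokens
```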
## Training

- **Dataset**: 1,000,000 maze problem-solution pairs
- **Epochs**: 100
- **Final Loss**: 6.25e-06 (MSE)
- **Final L2 Distance**: 0.0014

The model learns to predict the velocity field in a flow-matching formulation, transforming unsolved mazes into their solutions through iterative refinement.
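
The velocity-prediction objective can be sketched as follows, assuming a linear interpolation path between unsolved and solved mazes and a model that takes the interpolated image plus a timestep (the function and call signature here are illustrative, not the repository's actual API):

```python
import torch

def flow_matching_loss(model, x0, x1):
    """One training step of linear flow matching (illustrative sketch).

    x0: unsolved maze batch (B, 3, 64, 64); x1: solved maze batch.
    The linear path x_t = (1 - t) * x0 + t * x1 has constant
    velocity x1 - x0, which the model learns to predict.
    """
    b = x0.shape[0]
    t = torch.rand(b, device=x0.device).view(b, 1, 1, 1)  # random time in [0, 1]
    x_t = (1 - t) * x0 + t * x1     # point on the straight-line path
    v_target = x1 - x0              # ground-truth velocity field
    v_pred = model(x_t, t.view(b))  # model predicts velocity at (x_t, t)
    return torch.nn.functional.mse_loss(v_pred, v_target)
```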
## Usage

```python
import torch
from safetensors.torch import load_file

# Model architecture (copy from tacit/models/dit.py or install the tacit package)
from tacit import TACITModel, sample_euler_method

# Load model
model = TACITModel()
state_dict = load_file('tacit_epoch_100.safetensors')

# Handle checkpoints saved from a torch.compile()-wrapped model
if next(iter(state_dict)).startswith('_orig_mod.'):
    state_dict = {k.replace('_orig_mod.', '', 1): v for k, v in state_dict.items()}

model.load_state_dict(state_dict)
model.eval()

# Inference
# x0: input maze tensor of shape (batch, 3, 64, 64), values in [0, 1]
with torch.no_grad():
    solution = sample_euler_method(model, x0, num_steps=10)
```
## Inference Configuration

| Parameter | Value |
|-----------|-------|
| Sampling Method | Euler |
| Recommended Steps | 10 |
| Output Range | [0, 1] |
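
An Euler sampler for this setup integrates the predicted velocity from the unsolved maze at t=0 to the solution at t=1 in uniform steps. A sketch of what such a sampler looks like, assuming the model maps an image batch and timestep to a velocity (an illustrative reimplementation, not the repository's `sample_euler_method`):

```python
import torch

@torch.no_grad()
def euler_sample(model, x0, num_steps=10):
    """Integrate dx/dt = v(x, t) from t=0 (unsolved) to t=1 (solved)."""
    x = x0.clone()
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        v = model(x, t)           # predicted velocity at the current state/time
        x = x + dt * v            # one Euler step along the flow
    return x.clamp(0.0, 1.0)      # final outputs lie in [0, 1]
```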
## Maze Format

- **Resolution**: 64×64 pixels, RGB
- **Color Scheme**:
  - White (255, 255, 255): Paths
  - Black (0, 0, 0): Walls
  - Green (0, 255, 0): Entry/Exit points
  - Red (255, 0, 0): Solution path (in solved mazes)
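
Given this color scheme, the solution path can be read back out of a model output by thresholding the channels: a solution pixel has a high red channel and low green and blue. A small helper sketch (the function name and 0.5 threshold are illustrative choices, not part of the repository):

```python
import torch

def extract_solution_mask(solved, thresh=0.5):
    """Boolean mask of solution-path pixels in a solved maze tensor.

    solved: (3, 64, 64) tensor with values in [0, 1]. A pixel counts as
    part of the red solution path when its red channel is above `thresh`
    and its green and blue channels are below it, which excludes the
    white paths, black walls, and green entry/exit markers.
    """
    r, g, b = solved[0], solved[1], solved[2]
    return (r > thresh) & (g < thresh) & (b < thresh)
```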
## Research Applications

This model is designed for interpretability research, particularly:

1. **Mechanistic Interpretability**: Understanding how the model represents maze structure internally
2. **World Models**: Studying emergent spatial reasoning without language
3. **Diffusion Interpretability**: Analyzing intermediate denoising steps
## Repository

Full source code, training scripts, and additional checkpoints are available at the [GitHub repository](https://github.com/danielxmed/tacit).
## Citation

```bibtex
@software{tacit2024,
  title={TACIT: Transformation-Aware Capturing of Implicit Thought},
  author={Daniel},
  year={2024},
  url={https://huggingface.co/tylerxdurden/tacit}
}
```
## License

Apache License 2.0