---
license: mit
library_name: diffusers
pipeline_tag: image-to-image
---
[Discord](https://discord.gg/2JhHVh7CGu)
A semi-custom network trained from scratch for 799 epochs with tensor product attention. This repository contains the model
described in [Tensor Product Attention Is All You Need](https://huggingface.co/papers/2501.06425).
GitHub repository: https://github.com/tensorgi/T6
[Modeling](https://huggingface.co/Blackroot/SimpleDiffusion-TensorProductAttentionRope/blob/main/models/uvit.py) || [Training](https://huggingface.co/Blackroot/SimpleDiffusion-TensorProductAttentionRope/blob/main/train.py)
This network uses the optimal transport flow matching objective outlined in [Flow Matching for Generative Modeling](https://arxiv.org/abs/2210.02747).
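For reference, the conditional optimal-transport flow matching loss can be sketched as below. This is a minimal illustration of the objective from the linked paper (taking the noise scale `sigma_min = 0`, which gives a straight-line interpolant), not the exact code in `train.py`:

```python
import torch

def ot_flow_matching_loss(model, x1):
    """Conditional OT flow matching loss (sketch, sigma_min = 0).

    x1: a batch of data samples; model(x_t, t) predicts the velocity field.
    """
    x0 = torch.randn_like(x1)                           # noise endpoint
    t = torch.rand(x1.shape[0], *[1] * (x1.dim() - 1))  # per-sample time in [0, 1)
    xt = (1 - t) * x0 + t * x1                          # straight-line interpolant
    target = x1 - x0                                    # constant OT velocity
    return torch.mean((model(xt, t) - target) ** 2)
```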
A modified tensor product attention with RoPE is used instead of regular MHA: [Tensor Product Attention Is All You Need](https://arxiv.org/abs/2501.06425).
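The core idea of tensor product attention is to build each per-head query/key/value as a sum of rank-1 outer products of a contextual head factor and a contextual dimension factor. A minimal sketch of such a rank-R projection (omitting RoPE and the full attention wiring; see `models/uvit.py` for the actual implementation):

```python
import torch
import torch.nn as nn

class TPAProjection(nn.Module):
    """Rank-R tensor product projection (simplified sketch).

    Each token's (heads x head_dim) activation is the average of R
    outer products between a head factor and a dimension factor.
    """
    def __init__(self, d_model, n_heads, head_dim, rank):
        super().__init__()
        self.n_heads, self.head_dim, self.rank = n_heads, head_dim, rank
        self.a = nn.Linear(d_model, rank * n_heads)   # head factors
        self.b = nn.Linear(d_model, rank * head_dim)  # dim factors

    def forward(self, x):
        B, T, _ = x.shape
        a = self.a(x).view(B, T, self.rank, self.n_heads)
        b = self.b(x).view(B, T, self.rank, self.head_dim)
        # sum of rank-1 outer products over r -> (B, T, heads, head_dim)
        return torch.einsum('btrh,btrd->bthd', a, b) / self.rank
```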
xATGLU layers are used in some places: [Expanded Gating Ranges Improve Activation Functions](https://arxiv.org/pdf/2405.20768).
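The expanded-gate idea is to let a GLU-style gate exceed the usual (0, 1) range through a learnable parameter. A minimal sketch of an arctan-gated linear unit with an expanded range (my reading of the linked paper; the exact parameterization here may differ from this repo's layer):

```python
import math
import torch
import torch.nn as nn

class xATGLU(nn.Module):
    """Expanded arctan gated linear unit (sketch).

    The base gate arctan(x)/pi + 0.5 lies in (0, 1); a learnable alpha
    stretches it so the gate can dip below 0 and exceed 1.
    """
    def __init__(self, d_in, d_out):
        super().__init__()
        self.proj = nn.Linear(d_in, 2 * d_out)      # value and gate halves
        self.alpha = nn.Parameter(torch.zeros(1))   # expansion parameter

    def forward(self, x):
        value, gate = self.proj(x).chunk(2, dim=-1)
        g = torch.arctan(gate) / math.pi + 0.5      # base gate in (0, 1)
        g = g * (1 + 2 * self.alpha) - self.alpha   # expanded gating range
        return value * g
```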
This network was optimized via Distributed Shampoo: [GitHub](https://github.com/facebookresearch/optimizers/blob/main/distributed_shampoo/README.md) || [Paper](https://arxiv.org/abs/2309.06497)
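At its core, Shampoo preconditions a 2-D gradient with inverse fourth roots of accumulated left and right Gram matrices. A toy single-step sketch of that preconditioning (for intuition only; the actual training uses the Distributed Shampoo implementation linked above, which adds blocking, grafting, and distributed state):

```python
import torch

def shampoo_precondition(G, L, R, eps=1e-6):
    """One full-matrix Shampoo preconditioning step for a 2-D gradient G.

    L and R accumulate G @ G.T and G.T @ G; the preconditioned
    gradient is L^{-1/4} @ G @ R^{-1/4}.
    """
    L += G @ G.T
    R += G.T @ G

    def inv_quarter(M):
        # Symmetric eigendecomposition -> inverse fourth root
        vals, vecs = torch.linalg.eigh(M + eps * torch.eye(M.shape[0]))
        return vecs @ torch.diag(vals.clamp_min(eps) ** -0.25) @ vecs.T

    return inv_quarter(L) @ G @ inv_quarter(R)
```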
`python train.py` trains a new image network on the provided dataset. Currently the dataset is loaded entirely into GPU memory; see the `preload_dataset` function.
`python test_sample.py step_799.safetensors`, where `step_799.safetensors` is the checkpoint to run inference on. This always generates a 16x16 grid of sample images.
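Sampling from a flow matching model amounts to integrating the learned velocity field from noise at t=0 to data at t=1. A minimal Euler-integration sketch of that process (the repo's `test_sample.py` may use a different solver or step count):

```python
import torch

@torch.no_grad()
def sample_flow(model, shape, steps=50):
    """Euler integration of the learned velocity field from noise to data."""
    x = torch.randn(shape)            # start from Gaussian noise at t = 0
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],) + (1,) * (len(shape) - 1), i * dt)
        x = x + dt * model(x, t)      # follow the predicted velocity
    return x
```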
|  |