--- license: cc-by-nc-4.0 language: - en tags: - diffusion - anime - image-generation - dit - flow-matching pipeline_tag: text-to-image --- # Diffusion Transformer A flow matching-based diffusion transformer for anime image generation. This project is for **research purposes only**. ## Links - GitHub: https://github.com/FREEANIMA/diffusion_model_sampling - Hugging Face: https://huggingface.co/honghong3/diffusion-transformer ## License This project is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). For research and non-commercial use only. ## Training Environment - **GPU**: NVIDIA A100 40GB (Google Colab) - **Dataset**: ~4.8M anime images - **Processed**: ~1.8M images (epoch 0, ongoing) - **Throughput**: ~1.3 it/s - Samples below are intermediate checkpoints — quality will improve as training continues. ## Training & Samples | 12k images | 600k images | 1.2M images | 1.8M images | |---|---|---|---| | ![1k](assets/1k.png) | ![50k](assets/50k.png) | ![100k](assets/100k.png) | ![150k](assets/150k.png) | ``` # sampler conditional prompt = "1girl, red hair, school uniform, happy, red eyes, open mouth, detailed face" steps = 100 cfg_scale = 2.0 seed = 1234 ``` ## Model Architecture - **Backbone**: Diffusion Transformer (DiT) with adaLN modulation - **Parameters**: ~550M - **Framework**: Flow Matching (velocity prediction) ![architecture](assets/model.png) ## Components | Component | Model | |---|---| | VAE | stabilityai/sd-vae-ft-mse | | Text Encoder | openai/clip-vit-large-patch14 | | Tokenizer | openai/clip-vit-large-patch14 | ## Sampler Details - **Resolution**: 512 × 512 (single bucket) - **Noise Schedule**: Log-SNR uniform sampling with resolution-dependent shift - **CFG**: Classifier-free guidance - Prompts are **tag-based** (comma-separated danbooru-style tags) ## Requirements ```bash pip install torch transformers diffusers accelerate torchvision tqdm ``` ## Usage ```bash python main.py ``` ``` C:. │ main.py │ output.png │ README.md │ requirements.txt │ ├─app │ │ clip.py │ │ config.json │ │ config.py │ │ model.py │ │ sampling.py │ │ sd_vae.py │ └─ __init__.py │ ├─assets │ 100k.png │ 150k.png │ 1k.png │ 50k.png │ └─weights image.pth ```