---
license: cc-by-nc-4.0
language:
- en
tags:
- diffusion
- anime
- image-generation
- dit
- flow-matching
pipeline_tag: text-to-image
---

# Diffusion Transformer

A diffusion transformer (DiT) trained with flow matching for anime image generation.
This project is for **research purposes only**.

## Links

- GitHub: https://github.com/FREEANIMA/diffusion_model_sampling
- Hugging Face: https://huggingface.co/honghong3/diffusion-transformer

## License

This project is licensed under [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/).
For research and non-commercial use only.

## Training Environment

- **GPU**: NVIDIA A100 40GB (Google Colab)
- **Dataset**: ~4.8M anime images
- **Processed**: ~1.8M images so far (still in epoch 0; training is ongoing)
- **Throughput**: ~1.3 it/s
- The samples below come from intermediate checkpoints; quality should improve as training continues.

## Training & Samples

Samples from intermediate checkpoints, labeled by the number of training images seen:

| 12k images | 600k images | 1.2M images | 1.8M images |
|---|---|---|---|
|  |  |  |  |

```python
# Conditional sampling settings
prompt = "1girl, red hair, school uniform, happy, red eyes, open mouth, detailed face"
steps = 100      # number of sampling steps
cfg_scale = 2.0  # classifier-free guidance scale
seed = 1234      # random seed
```
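
The settings above correspond to a flow-matching sampler with classifier-free guidance. The sketch below shows how they fit together. It is not the code in `app/sampling.py`, and it rests on several assumptions:

- It uses a plain Euler integrator running from t = 1 (noise) to t = 0 (data).
- It assumes the linear path x_t = (1 - t) * x0 + t * noise.
- It applies standard CFG extrapolation.
- `euler_cfg_sample` and `make_toy_velocity` are hypothetical names. The toy velocity function stands in for the trained DiT.

```python
import random

def make_toy_velocity(cond_target, uncond_target):
    """Hypothetical stand-in for the trained DiT.

    For the linear path x_t = (1 - t) * x0 + t * noise, the exact velocity
    toward a single known point x0 is (x_t - x0) / t. Conditional and
    unconditional calls pull toward different targets.
    """
    def velocity(x, t, conditional):
        target = cond_target if conditional else uncond_target
        return [(xi - mi) / t for xi, mi in zip(x, target)]
    return velocity

def euler_cfg_sample(velocity, noise, steps=100, cfg_scale=2.0):
    """Euler-integrate dx/dt = v from t = 1 (noise) down to t = 0 (data)."""
    x = list(noise)
    ts = [1.0 - i / steps for i in range(steps + 1)]  # 1.0, ..., 0.0
    for t, t_next in zip(ts[:-1], ts[1:]):
        v_uncond = velocity(x, t, conditional=False)
        v_cond = velocity(x, t, conditional=True)
        # Classifier-free guidance: extrapolate from the unconditional
        # prediction toward (and past, when cfg_scale > 1) the conditional one.
        v = [vu + cfg_scale * (vc - vu) for vu, vc in zip(v_uncond, v_cond)]
        # Euler step; dt = t_next - t is negative because time runs 1 -> 0.
        x = [xi + (t_next - t) * vi for xi, vi in zip(x, v)]
    return x

# Settings mirroring the block above
steps, cfg_scale, seed = 100, 2.0, 1234
rng = random.Random(seed)
noise = [rng.gauss(0.0, 1.0) for _ in range(4)]  # a tiny stand-in "latent"
velocity = make_toy_velocity(cond_target=[1.0, -1.0, 0.5, 0.0],
                             uncond_target=[0.0, 0.0, 0.0, 0.0])
result = euler_cfg_sample(velocity, noise, steps=steps, cfg_scale=cfg_scale)
```

With this toy velocity, guidance at scale w lands on `uncond + w * (cond - uncond)`. Here that gives `result` close to `[2.0, -2.0, 1.0, 0.0]`. This makes the effect of `cfg_scale > 1` visible: the sample is pushed past the purely conditional target.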

## Model Architecture

- **Backbone**: Diffusion Transformer (DiT) with adaLN modulation
- **Parameters**: ~550M
- **Framework**: Flow Matching (velocity prediction)
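
With velocity prediction, the network learns the time derivative of a path from data to noise. The sketch below shows one training example under an assumed rectified-flow convention: x_t = (1 - t) * x0 + t * noise, with t = 0 clean. The card does not state which direction or parameterization the model uses. The names `flow_matching_pair` and `velocity_mse` are illustrative only.

```python
import random

def flow_matching_pair(x0, noise, t):
    """Return (x_t, v_target) for one training example.

    Assumed convention (rectified-flow style): x_t = (1 - t) * x0 + t * noise,
    with t = 0 clean and t = 1 pure noise. The velocity target is the time
    derivative of that path, d x_t / dt = noise - x0.
    """
    x_t = [(1.0 - t) * a + t * e for a, e in zip(x0, noise)]
    v_target = [e - a for a, e in zip(x0, noise)]
    return x_t, v_target

def velocity_mse(v_pred, v_target):
    """Mean squared error between the predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_target)) / len(v_target)

rng = random.Random(0)
x0 = [0.5, -0.2, 1.0]                      # stand-in for a VAE latent
noise = [rng.gauss(0.0, 1.0) for _ in x0]
t = rng.random()                           # uniform t here, for simplicity
x_t, v_target = flow_matching_pair(x0, noise, t)
# The model would see (x_t, t, text embedding) and be trained to output v_target.
```

Here t is drawn uniformly only to keep the sketch short. A useful property of this parameterization is that the clean sample is recoverable as `x0 = x_t - t * v`.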
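
In DiT-style blocks, adaLN replaces LayerNorm's learned affine parameters with a per-sample shift and scale, both regressed from the conditioning (timestep plus text). The sketch below shows the idea on a plain feature vector. It assumes DiT's adaLN-Zero variant, in which a gate initialized to zero makes each block start as the identity; the card does not say whether this model uses the zero-initialized gate. All names are hypothetical.

```python
import math

def layer_norm(x, eps=1e-6):
    """LayerNorm over one feature vector, without learned affine parameters."""
    mean = sum(x) / len(x)
    var = sum((xi - mean) ** 2 for xi in x) / len(x)
    return [(xi - mean) / math.sqrt(var + eps) for xi in x]

def modulate(x, shift, scale):
    """adaLN: normalize, then apply a conditioning-dependent scale and shift."""
    return [h * (1.0 + sc) + sh for h, sc, sh in zip(layer_norm(x), scale, shift)]

def adaln_block(x, shift, scale, gate, sublayer):
    """Gated residual update in the style of adaLN-Zero.

    `sublayer` stands in for attention or the MLP. In a real block,
    shift/scale/gate would come from a linear layer applied to the
    timestep + text embedding.
    """
    h = sublayer(modulate(x, shift, scale))
    return [xi + g * hi for xi, g, hi in zip(x, gate, h)]

x = [1.0, 2.0, 3.0, 4.0]
zeros = [0.0] * len(x)
double = lambda h: [2.0 * v for v in h]   # toy sublayer
y = adaln_block(x, zeros, zeros, zeros, double)  # zero gate: block is identity
```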

## Components

| Component | Model |
|---|---|
| VAE | `stabilityai/sd-vae-ft-mse` |
| Text Encoder | `openai/clip-vit-large-patch14` |
| Tokenizer | `openai/clip-vit-large-patch14` |
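
The components above fix most of the tensor shapes the DiT works with:

- The VAE downsamples by 8x into 4-channel latents.
- The CLIP ViT-L/14 text encoder produces 768-dimensional token embeddings over a 77-token context.

The sketch below computes the resulting sizes at 512 × 512. The DiT patch size of 2 is an assumption (the card does not state it), and the helper names are hypothetical.

```python
# Properties of the listed components (from their public configs).
VAE_DOWNSAMPLE = 8            # sd-vae-ft-mse: 8x spatial downsampling
VAE_LATENT_CHANNELS = 4       # 4-channel latents
VAE_SCALING_FACTOR = 0.18215  # latent scaling factor used with SD 1.x VAEs
CLIP_MAX_TOKENS = 77          # CLIP tokenizer context length
CLIP_HIDDEN_SIZE = 768        # clip-vit-large-patch14 text hidden size

def latent_shape(height, width):
    """(channels, h, w) of the VAE latent for an RGB image of the given size."""
    if height % VAE_DOWNSAMPLE or width % VAE_DOWNSAMPLE:
        raise ValueError("image size must be divisible by the VAE downsampling factor")
    return (VAE_LATENT_CHANNELS, height // VAE_DOWNSAMPLE, width // VAE_DOWNSAMPLE)

def dit_sequence_length(height, width, patch_size=2):
    """Image tokens seen by the DiT. patch_size=2 is an ASSUMPTION, not from this card."""
    _, h, w = latent_shape(height, width)
    return (h // patch_size) * (w // patch_size)

print(latent_shape(512, 512))               # (4, 64, 64)
print(dit_sequence_length(512, 512))        # 1024 (with the assumed patch size 2)
print((CLIP_MAX_TOKENS, CLIP_HIDDEN_SIZE))  # (77, 768): per-prompt token embeddings
```

With `diffusers` and `transformers` installed, these components are typically loaded as follows:

- `AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")`
- `CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")`
- `CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")`

The project's own wrappers in `app/sd_vae.py` and `app/clip.py` may differ.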

## Sampler Details

- **Resolution**: 512 × 512 (single bucket)
- **Noise Schedule**: Log-SNR uniform sampling with a resolution-dependent shift
- **Guidance**: Classifier-free guidance (CFG)
- **Prompts**: tag-based (comma-separated Danbooru-style tags)
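
For the linear path x_t = (1 - t) * x0 + t * noise, the signal-to-noise ratio gives log-SNR(t) = 2 * log((1 - t) / t). "Log-SNR uniform sampling" then means drawing log-SNR uniformly and mapping it back to t. A shift s moves log-SNR by -2 * log(s), which is the same as the time warp t' = s * t / (1 + (s - 1) * t). The sketch below assumes three things the card does not give: the log-SNR range [-10, 10], the rule s = sqrt(pixels / 256²), and the helper names.

```python
import math
import random

def t_from_logsnr(logsnr):
    """Invert logSNR(t) = 2 * log((1 - t) / t) for the linear flow path."""
    return 1.0 / (1.0 + math.exp(logsnr / 2.0))

def resolution_shift(height, width, base=256):
    """HYPOTHETICAL shift rule: s = sqrt(pixels / base_pixels), so s = 2 at 512x512."""
    return math.sqrt((height * width) / (base * base))

def shift_t(t, shift):
    """Same shift applied directly in t-space: t' = s t / (1 + (s - 1) t)."""
    return shift * t / (1.0 + (shift - 1.0) * t)

def sample_t(rng, shift, logsnr_min=-10.0, logsnr_max=10.0):
    """Training-time draw: uniform logSNR, shifted toward noise, mapped to t."""
    logsnr = rng.uniform(logsnr_min, logsnr_max) - 2.0 * math.log(shift)
    return t_from_logsnr(logsnr)

def logsnr_schedule(steps, shift, logsnr_min=-10.0, logsnr_max=10.0):
    """Sampling-time grid: evenly spaced logSNR (noisy -> clean), shifted, as t."""
    lams = [logsnr_min + (logsnr_max - logsnr_min) * i / steps for i in range(steps + 1)]
    return [t_from_logsnr(lam - 2.0 * math.log(shift)) for lam in lams]

s = resolution_shift(512, 512)
schedule = logsnr_schedule(100, s)  # decreasing t values, from near 1 to near 0
```

A shift s > 1 moves probability mass toward noisier timesteps. A common motivation is that higher-resolution images keep more signal at a given noise level.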

## Requirements

```bash
pip install torch transformers diffusers accelerate torchvision tqdm
```

## Usage

```bash
python main.py
```

Project structure:

```
C:.
│  main.py
│  output.png
│  README.md
│  requirements.txt
│
├─app
│  │  clip.py
│  │  config.json
│  │  config.py
│  │  model.py
│  │  sampling.py
│  │  sd_vae.py
│  └─ __init__.py
│
├─assets
│      100k.png
│      150k.png
│      1k.png
│      50k.png
│
└─weights
        image.pth
```