# TinyDiT
TinyDiT is an **85 million parameter unconditional image generation model** trained on **21,000+ anime face images**. The model is designed to be lightweight, efficient, and fast while still producing visually appealing anime-style face generations.
The project explores compact diffusion transformer architectures capable of generating high-quality images with relatively low computational requirements.
## Model Details
* **Model Name:** TinyDiT
* **Architecture:** Diffusion Transformer (DiT-inspired)
* **Parameters:** 85M
* **Task:** Unconditional Image Generation
* **Dataset Size:** 21,000+ anime face images
* **VAE:** Lightweight 13M parameter VAE
* **Generation Type:** Anime face generation from random noise (no text conditioning)
## Dataset
TinyDiT was trained on a curated anime face dataset containing over 21k images.
**Dataset Repository:** `YOUR_DATASET_REPO_ID`
Replace the placeholder above with your actual Hugging Face dataset repository ID.
## VAE
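The model uses a compact **13M parameter Variational Autoencoder (VAE)** to encode images into latents and decode latents back into images. Because the diffusion transformer operates on these compressed latents rather than on raw pixels, both training cost and inference time are reduced.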
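To make the saving concrete, the sketch below compares how many values describe one image in pixel space versus latent space. This README does not state the training resolution, the VAE downsampling factor, or the number of latent channels, so the 256×256 input, factor of 8, and 4 latent channels are purely illustrative assumptions borrowed from common latent diffusion setups, and `latent_shape` and `compression_ratio` are hypothetical helper names.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (C, H, W) of the latent a VAE encoder would produce.

    The defaults are hypothetical, not TinyDiT's documented settings.
    """
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsample factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4, image_channels=3):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(256, 256))       # (4, 32, 32)
    print(compression_ratio(256, 256))  # 48.0
```

Under these assumed settings the transformer would see 48 times fewer values per image than a pixel-space model; the real ratio depends on the actual VAE configuration.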
## Features
* Compact 85M parameter architecture
* Fast and lightweight image generation
* Anime-style face synthesis
* Efficient latent diffusion training
* Suitable for low-resource GPUs and experimentation
## Example Generated Image
Below is a sample image generated by TinyDiT:
<p align="center">
<img src="generated_sample.png" width="256"/>
</p>
The model produces soft anime-style portraits with coherent facial structure and color consistency despite its relatively small size.
## Usage
```python
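import torch
from diffusers import DiffusionPipeline

# Replace the placeholder with the actual model repository ID.
pipe = DiffusionPipeline.from_pretrained("YOUR_USERNAME/tinydit")
# Use the GPU when one is available, otherwise fall back to the CPU.
pipe.to("cuda" if torch.cuda.is_available() else "cpu")

# Unconditional generation: the pipeline takes no prompt.
image = pipe().images[0]
image.save("tinydit_sample.png")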
```
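Once the pipeline is loaded, the parameter counts quoted in this README (85M for the transformer, 13M for the VAE) can be checked directly. The sketch below is one way to do that, assuming the checkpoint loads as a standard `DiffusionPipeline` whose `components` dictionary holds the transformer and VAE as `torch.nn.Module` objects. The component names are not documented here, so the helpers (`count_parameters` and `summarize_pipeline`, both hypothetical names) simply report every component that has parameters.

```python
def count_parameters(module) -> int:
    """Total number of scalar parameters in a torch.nn.Module-like object."""
    return sum(p.numel() for p in module.parameters())


def summarize_pipeline(pipe) -> dict:
    """Map each pipeline component that has parameters to its parameter count.

    Components without a `parameters` method (for example a scheduler) are skipped.
    """
    return {
        name: count_parameters(component)
        for name, component in pipe.components.items()
        if hasattr(component, "parameters")
    }


def format_summary(summary: dict) -> str:
    """Render the counts in millions, one component per line."""
    return "\n".join(f"{name}: {count / 1e6:.1f}M" for name, count in summary.items())
```

Calling `print(format_summary(summarize_pipeline(pipe)))` on the pipeline loaded above should list one line per component.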
## Training
TinyDiT was trained using latent diffusion techniques on anime face images with a lightweight transformer backbone.
### Training Highlights
* 21k+ anime face dataset
* Latent-space diffusion training
* Compact transformer architecture
* Memory-efficient VAE
* Optimized for smaller GPUs
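This README does not describe the noise schedule or the training objective, so the following is only a generic sketch of the forward (noising) step used in DDPM-style latent diffusion, not TinyDiT's actual training code. It assumes a linear beta schedule with 1000 steps and treats the latent as a flat list of floats for simplicity; `linear_betas`, `alpha_bars`, and `add_noise` are hypothetical helper names.

```python
import math
import random


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed DDPM-style schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the share of signal variance kept at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out


def add_noise(latent, t, abar, rng=random):
    """Sample x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps; return (x_t, eps)."""
    eps = [rng.gauss(0.0, 1.0) for _ in latent]
    signal, noise = math.sqrt(abar[t]), math.sqrt(1.0 - abar[t])
    x_t = [signal * x + noise * e for x, e in zip(latent, eps)]
    return x_t, eps


if __name__ == "__main__":
    abar = alpha_bars(linear_betas())
    x0 = [0.5, -1.0, 0.25]  # stand-in for a flattened VAE latent
    x_t, eps = add_noise(x0, t=500, abar=abar, rng=random.Random(0))
    print(x_t)
```

In a standard noise-prediction setup, the transformer would receive `x_t` and `t` and be trained to predict `eps`, typically with a mean-squared-error loss. Whether TinyDiT uses this objective is not stated here.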
## Limitations
* Trained only on anime face data
* Unconditional generation only
* Limited diversity compared to larger diffusion models
* Lower image sharpness at higher resolutions
* May occasionally generate blurry or distorted outputs
## Future Improvements
* Text-conditioned generation
* Larger and more diverse datasets
* Higher-resolution image synthesis
* Improved sampling methods
* Better facial detail consistency
## License
Please specify the appropriate license for this repository.
## Acknowledgements
Inspired by DiT architectures, latent diffusion models, and the open-source generative AI community.