---
license: apache-2.0
datasets:
- huggan/anime-faces
pipeline_tag: unconditional-image-generation
---

# TinyDiT

TinyDiT is an **85 million parameter unconditional image generation model** trained on **21,000+ anime face images**.

## Model Details

* **Model Name:** TinyDiT
* **Architecture:** Diffusion Transformer (DiT-inspired)
* **Parameters:** 85M
* **Task:** Unconditional Image Generation
* **Dataset Size:** 21,000+ anime face images
* **VAE:** Lightweight 13M parameter VAE
* **Generation Type:** Anime face generation from random noise (no text conditioning)
* **Image Size:** 64x64px
* **GitHub Repo:** https://github.com/Nitesh1405/TinyDiT/tree/main

## Dataset

TinyDiT was trained on a curated anime face dataset containing over 21k images.

**Dataset Repository:** `huggan/anime-faces`

## VAE

The model uses a compact **13M parameter Variational Autoencoder (VAE)** for latent-space encoding and decoding. 


## Example Generated Images

Below are sample images generated by TinyDiT:

<p align="center" style="display: flex;">
  <img src="images/sample.webp" width="64"/>
  <img src="images/sample2.webp" width="64"/>
  <img src="images/sample3.webp" width="64"/>
  <img src="images/sample4.webp" width="64"/>
  <img src="images/sample5.webp" width="64"/>
  <img src="images/sample6.webp" width="64"/>
  <img src="images/sample7.webp" width="64"/>
  <img src="images/sample8.webp" width="64"/>
  <img src="images/sample9.webp" width="64"/>
  <img src="images/sample10.webp" width="64"/>
  <img src="images/sample11.webp" width="64"/>
</p>

<p align="center" style="display: flex;">
  <img src="images/sample22.webp" width="64"/>
  <img src="images/sample12.webp" width="64"/>
  <img src="images/sample13.webp" width="64"/>
  <img src="images/sample14.webp" width="64"/>
  <img src="images/sample15.webp" width="64"/>
  <img src="images/sample16.webp" width="64"/>
  <img src="images/sample17.webp" width="64"/>
  <img src="images/sample18.webp" width="64"/>
  <img src="images/sample19.webp" width="64"/>
  <img src="images/sample20.webp" width="64"/>
  <img src="images/sample21.webp" width="64"/>
</p>

## Usage

* **Hugging Face Space:** https://huggingface.co/spaces/nitesh501/TinyDiT

```bash
git clone https://github.com/Nitesh1405/TinyDiT.git && cd TinyDiT

pip install -r requirements.txt

python app.py
# The model downloads automatically on first run if wget is installed.
# Otherwise, download it manually from https://huggingface.co/nitesh501/tinydit
# and place it in the TinyDiT folder.
```
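
If `wget` is not available, the manual download step can also be scripted. The snippet below is a minimal sketch, not part of the TinyDiT repository: it assumes the `huggingface_hub` package is installed (`pip install huggingface_hub`) and that `app.py` looks for the weights in the repository root under the same file names used in the model repo. The helper name `download_weights` is hypothetical.

```python
from pathlib import Path

# Model repository named in this README.
REPO_ID = "nitesh501/tinydit"


def download_weights(target_dir="."):
    """Fetch the files of the model repo into target_dir and return that path.

    Assumption: app.py expects the weights directly inside the cloned
    TinyDiT folder, so run this from there (or pass its path).
    """
    # Imported lazily so the module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download

    local_path = snapshot_download(
        repo_id=REPO_ID,
        local_dir=str(target_dir),
        # Skip the model card and sample images; only the weights are needed.
        ignore_patterns=["*.md", "images/*"],
    )
    return Path(local_path)
```

Calling `download_weights()` from inside the cloned `TinyDiT` folder before `python app.py` should leave the weights where the app can find them, provided the assumption about the expected location holds.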


## Limitations

* Trained only on anime face data
* Unconditional generation only
* Limited diversity compared to larger diffusion models
* May occasionally generate blurry or distorted outputs


## Acknowledgements

Inspired by DiT architectures, latent diffusion models, and the open-source generative AI community.