devbnamdar committed on
Commit 222bd7f · verified · 1 Parent(s): 54928c2

Upload README.md

Files changed (1)
  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
---
license: cc-by-nc-4.0
tags:
- autoencoder
- vae
- diffusion
- image-generation
- pytorch
datasets:
- imagenet-1k
- celeba
language:
- en
---

# 🎨 Custom 8-Channel VAE (f8)

This is a custom-trained Variational Autoencoder (VAE) with an **8-channel latent space** and an **8× spatial downsampling factor (f8)**. It was trained from scratch on a combination of the ImageNet and CelebA datasets, with the goal of detailed image reconstruction and robust latent representations.

Although it was originally developed as the latent backbone for the [NovaFace-DiT](https://huggingface.co/devbnamdar/NovaFace-DiT) model, this VAE is independent of it and can serve as a drop-in component for custom Latent Diffusion Model (LDM) or Flow Matching architectures.

<div align="center">
<img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/vae_reconstruction.png" width="90%" alt="VAE Reconstruction on Unseen Data" />
<br/>
<em>Top row: Original Images (Unseen data). Bottom row: 8-Channel VAE Reconstructions.</em>
</div>

## 📊 Model Details

- **Model Type:** Variational Autoencoder (VAE)
- **Latent Channels:** 8
- **Downsample Factor:** 8 (f8)
- **Parameters:** ~100 Million
- **Training Datasets:** ImageNet-1k (~1.3M images) + CelebA
- **Max Supported Resolution:** 1024x1024
- **License:** Creative Commons BY-NC 4.0 (Non-commercial)

## 🏗️ Architecture Configuration

If you are initializing this model in PyTorch using the official codebase, the architecture parameters are as follows:

```python
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8
}
```
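
As a rough illustration of what this configuration implies, the sketch below derives the latent tensor shape for a given input size. It assumes the common LDM-style convention in which every level except the last halves the spatial resolution, so the four entries in `channel_multipliers` give a factor of 2^3 = 8, which matches the stated f8. The helpers `latent_shape` and `level_widths` are hypothetical and are not part of the official codebase.

```python
# Hypothetical helpers for illustration; not part of the official codebase.
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8,
}


def latent_shape(config, height, width):
    """Return (channels, height, width) of the latent for one input image.

    Assumes every level except the last downsamples by 2 (an LDM-style
    convention that the model card does not explicitly confirm).
    """
    factor = 2 ** (len(config['channel_multipliers']) - 1)
    if height % factor or width % factor:
        raise ValueError(f"image size must be divisible by {factor}")
    return (config['z_channels'], height // factor, width // factor)


def level_widths(config):
    """Feature-map channel count at each encoder level."""
    return [config['base_channels'] * m for m in config['channel_multipliers']]


print(latent_shape(model_architecture_config, 256, 256))    # (8, 32, 32)
print(latent_shape(model_architecture_config, 1024, 1024))  # (8, 128, 128)
print(level_widths(model_architecture_config))              # [128, 256, 512, 512]
```

Under these assumptions, a 1024x1024 RGB image maps to an 8x128x128 latent, which is the tensor a downstream diffusion or flow-matching model would operate on.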

## 🚀 How to Use

The weights provided here (`Nova_ae_f8.safetensors`) are intended to be loaded into the custom VAE architecture defined in our GitHub repository.

🔗 **Official GitHub Repository (Code & UI):** [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)

**Using with NovaFace-DiT:**
1. Download the `.safetensors` file from this repository.
2. Place it in the `vae_models/` directory of your cloned GitHub project.
3. Update the `vae_path` in `config.py` (or select it in the Gradio UI).
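
Steps 1 and 2 above can be scripted. The following is a minimal sketch using only the Python standard library. The function name `install_vae_weights` and its arguments are hypothetical, and it assumes the weights file has already been downloaded locally and that `vae_models/` sits at the root of the cloned project. Step 3 is left manual because the layout of `config.py` is not described here.

```python
import shutil
from pathlib import Path


def install_vae_weights(downloaded_file, project_root):
    """Copy a downloaded weights file into <project_root>/vae_models/.

    Hypothetical helper: returns the destination path, which is the value
    to use for `vae_path` in config.py (or to select in the Gradio UI).
    """
    src = Path(downloaded_file)
    if src.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got {src.name}")
    dest_dir = Path(project_root) / "vae_models"
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    dest = dest_dir / src.name
    shutil.copy2(src, dest)  # copy the file, preserving its metadata
    return dest
```

For example, `install_vae_weights("Nova_ae_f8.safetensors", "MM-DiT-From-Scratch")` would copy the file to `MM-DiT-From-Scratch/vae_models/Nova_ae_f8.safetensors`, assuming the weights file is in the current working directory.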

## 📄 Citation

If you use this model in your research, please cite:

```bibtex
@misc{namdar2026mmdit,
  author = {Namdar, Bunyamin},
  title = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
}
```