---
license: cc-by-nc-4.0
tags:
- autoencoder
- vae
- diffusion
- image-generation
- pytorch
datasets:
- imagenet-1k
- celeba
language:
- en
---

# 🎨 Custom 8-Channel VAE (f8)

This is a custom-trained Variational Autoencoder (VAE) with an **8-channel latent space** and an **f8 downsampling factor**, meaning each spatial dimension is reduced by a factor of 8. It was trained from scratch on a combination of the ImageNet and CelebA datasets, with the goal of detailed image reconstruction and robust latent representations.

Although it was originally developed as the latent backbone for the [NovaFace-DiT](https://huggingface.co/devbnamdar/NovaFace-DiT) model, this VAE is entirely independent and can be used as a drop-in component for any custom Latent Diffusion Model (LDM) or Flow Matching architecture that operates on 8-channel latents.

<div align="center">
  <img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/vae_reconstruction.png" width="90%" alt="VAE Reconstruction on Unseen Data" />
  <br/>
  <em>Top row: Original Images (Unseen data). Bottom row: 8-Channel VAE Reconstructions.</em>
</div>

## 📊 Model Details

- **Model Type:** Variational Autoencoder (VAE)
- **Latent Channels:** 8
- **Downsample Factor:** 8 (f8)
- **Parameters:** ~100 Million
- **Training Datasets:** ImageNet-1k (~1.3M images) + CelebA
- **Max Supported Resolution:** 1024×1024
- **License:** Creative Commons BY-NC 4.0 (Non-commercial)
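
As a quick sanity check on these numbers, the latent size follows from plain arithmetic: 8 channels, with height and width each divided by 8. The sketch below is illustrative only. `latent_shape` and `compression_ratio` are hypothetical helper names, not part of the repository, and the code assumes the input sides are multiples of 8 (how the actual encoder treats other sizes is not specified here).

```python
def latent_shape(height: int, width: int, z_channels: int = 8, factor: int = 8):
    """Return (channels, height, width) of the latent for one input image.

    Assumption: both sides are exact multiples of `factor`.
    """
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (z_channels, height // factor, width // factor)


def compression_ratio(height: int, width: int, in_channels: int = 3,
                      z_channels: int = 8, factor: int = 8) -> float:
    """Ratio of input values to latent values (element count, not bytes)."""
    c, h, w = latent_shape(height, width, z_channels, factor)
    return (in_channels * height * width) / (c * h * w)


print(latent_shape(1024, 1024))       # (8, 128, 128)
print(compression_ratio(1024, 1024))  # 24.0
```

Because the factor and channel count are fixed, the element-count ratio is the same at every valid resolution: 3 × 64 / 8 = 24.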

## 🏗️ Architecture Configuration

If you are initializing this model in PyTorch using the official codebase, the architecture parameters are as follows:

```python
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8
}
```
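
The configuration above can be cross-checked against the stated f8 factor. The following is a minimal sketch, not code from the repository: `level_widths` and `implied_downsample_factor` are hypothetical helpers, and the factor calculation assumes the encoder halves the resolution once between consecutive levels, as is common in latent-diffusion-style autoencoders. The official codebase is the authority on where downsampling actually happens.

```python
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8,
}


def level_widths(cfg: dict) -> list:
    """Feature-map channel count at each resolution level."""
    return [cfg['base_channels'] * m for m in cfg['channel_multipliers']]


def implied_downsample_factor(cfg: dict) -> int:
    """Assumption: one 2x downsample between each pair of adjacent levels."""
    return 2 ** (len(cfg['channel_multipliers']) - 1)


def validate(cfg: dict) -> None:
    """Check that both per-level lists describe the same number of levels."""
    if len(cfg['channel_multipliers']) != len(cfg['num_residual_blocks_per_level']):
        raise ValueError("per-level lists must have the same length")


validate(model_architecture_config)
print(level_widths(model_architecture_config))               # [128, 256, 512, 512]
print(implied_downsample_factor(model_architecture_config))  # 8
```

Under that assumption, four levels give three downsampling steps, which matches the f8 factor listed in the model details.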

## 🚀 How to Use

The weights provided here (`Nova_ae_f8.safetensors`) are intended to be loaded into the custom VAE architecture defined in our GitHub repository.

🔗 **Official GitHub Repository (Code & UI):** [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)

**Using with NovaFace-DiT:**
1. Download the `.safetensors` file from this repository.
2. Place it in the `vae_models/` directory of your cloned GitHub project.
3. Update the `vae_path` in `config.py` (or select it in the Gradio UI).
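
Before wiring the file into the project, it can be useful to confirm that the download is intact and roughly matches the stated parameter count. The sketch below reads only the safetensors header (an 8-byte little-endian length followed by a JSON table of tensor names, dtypes, and shapes) using the Python standard library. It does not load the weights into the model, which still requires the VAE class from the GitHub repository. The helper names are hypothetical, and the path assumes the file was placed as described in step 2.

```python
import json
import struct
from math import prod
from pathlib import Path


def read_safetensors_header(path) -> dict:
    """Return the tensor table (name -> dtype/shape/offsets) of a safetensors file."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned little-endian 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Optional free-form metadata entry; it is not a tensor.
    header.pop("__metadata__", None)
    return header


def count_parameters(header: dict) -> int:
    """Total number of scalar values across all tensors listed in the header."""
    return sum(prod(entry["shape"]) for entry in header.values())


# Assumed location, following step 2 above; skipped if the file is absent.
weights = Path("vae_models/Nova_ae_f8.safetensors")
if weights.exists():
    table = read_safetensors_header(weights)
    print(f"{len(table)} tensors, {count_parameters(table):,} parameters")
```

A total in the neighborhood of 100 million would be consistent with the model details above. For actual loading, the `safetensors` Python package is the usual route.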

## 📄 Citation

If you use this model in your research, please cite:

```bibtex
@misc{namdar2026mmdit,
  author       = {Namdar, Bunyamin},
  title        = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
  year         = {2026},
  publisher    = {GitHub},
  url          = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
}
```