---
license: cc-by-nc-4.0
tags:
- autoencoder
- vae
- diffusion
- image-generation
- pytorch
datasets:
- imagenet-1k
- celeba
language:
- en
---
# 🎨 Custom 8-Channel VAE (f8)
This is a custom-trained Variational Autoencoder (VAE) with an **8-channel latent space** and an **8x spatial downsampling factor (f8)**. It was trained from scratch on a combination of the ImageNet and CelebA datasets, with the goal of detailed image reconstruction and robust latent representations.
While originally developed as the latent backbone for the [NovaFace-DiT](https://huggingface.co/devbnamdar/NovaFace-DiT) model, this VAE is entirely independent and can be used as a drop-in component for any custom Latent Diffusion Model (LDM) or Flow Matching architecture.
<div align="center">
<img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/vae_reconstruction.png" width="90%" alt="VAE Reconstruction on Unseen Data" />
<br/>
<em>Top row: Original Images (Unseen data). Bottom row: 8-Channel VAE Reconstructions.</em>
</div>
## 📊 Model Details
- **Model Type:** Variational Autoencoder (VAE)
- **Latent Channels:** 8
- **Downsample Factor:** 8 (f8)
- **Parameters:** ~100 Million
- **Training Datasets:** ImageNet-1k (~1.3M images) + CelebA
- **Max Supported Resolution:** 1024x1024
- **License:** Creative Commons BY-NC 4.0 (Non-commercial)
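To make the numbers above concrete, the following is a minimal sketch of the latent shape arithmetic implied by 8 latent channels and an f8 downsampling factor. The function names are hypothetical and not part of the official codebase, and the divisibility check is an assumption about how the encoder handles image sizes.

```python
def latent_shape(height, width, z_channels=8, downsample_factor=8):
    """Return (channels, height, width) of the latent for one image."""
    # Assumption: image sides must be multiples of the downsample factor.
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("image sides must be divisible by the downsample factor")
    return (z_channels, height // downsample_factor, width // downsample_factor)


def compression_ratio(height, width, in_channels=3, z_channels=8, downsample_factor=8):
    """Ratio of input values (RGB pixels) to latent values."""
    c, h, w = latent_shape(height, width, z_channels, downsample_factor)
    return (in_channels * height * width) / (c * h * w)


for side in (256, 512, 1024):
    # e.g. a 256x256 RGB image maps to an 8x32x32 latent
    print(side, latent_shape(side, side), compression_ratio(side, side))
```

Under these assumptions the ratio is the same at every resolution: 3 x 8 x 8 input values per 8 latent values, which is 24 to 1.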
## 🏗️ Architecture Configuration
If you are initializing this model in PyTorch using the official codebase, the architecture parameters are as follows:
```python
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8,
}
```
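As a rough illustration of how to read this configuration, the sketch below derives the per-level channel widths and the overall downsampling factor. It assumes the common LDM-style encoder layout, where each level's width is `base_channels` times its multiplier and a stride-2 downsample sits between consecutive levels. The actual implementation in the repository may differ, and the helper names here are hypothetical.

```python
config = {
    "base_channels": 128,
    "channel_multipliers": [1, 2, 4, 4],
    "num_residual_blocks_per_level": [2, 2, 2, 4],
}


def level_widths(cfg):
    """Channel width at each resolution level of the encoder."""
    return [cfg["base_channels"] * m for m in cfg["channel_multipliers"]]


def downsample_factor(cfg):
    """Total spatial reduction across the encoder."""
    # Assumption: one stride-2 downsample between consecutive levels,
    # and none after the last level.
    return 2 ** (len(cfg["channel_multipliers"]) - 1)


print(level_widths(config))
print(downsample_factor(config))
```

With four levels this layout gives three downsampling steps, which matches the f8 factor stated in the model details.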
## 🚀 How to Use
The weights provided here (`Nova_ae_f8.safetensors`) are intended to be loaded into the custom VAE architecture defined in our GitHub repository.
🔗 **Official GitHub Repository (Code & UI):** [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)
**Using with NovaFace-DiT:**
1. Download the `.safetensors` file from this repository.
2. Place it in the `vae_models/` directory of your cloned GitHub project.
3. Update the `vae_path` in `config.py` (or select it in the Gradio UI).
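Before wiring the file into the project, it can help to confirm that the download is intact. The sketch below reads only the header of a `.safetensors` file using the Python standard library, relying on the documented file layout (an 8-byte little-endian header length followed by a JSON header). The helper names and the example path are illustrative assumptions, not part of the official codebase.

```python
import json
import os
import struct


def read_safetensors_header(path):
    """Return the tensor index of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def summarize(path):
    """Return (number of tensors, total number of parameters)."""
    header = read_safetensors_header(path)
    total = 0
    for info in header.values():
        count = 1
        for dim in info["shape"]:
            count *= dim
        total += count
    return len(header), total


# Hypothetical location, following the steps above; adjust to your checkout.
path = os.path.join("vae_models", "Nova_ae_f8.safetensors")
if os.path.exists(path):
    num_tensors, num_params = summarize(path)
    print(f"{num_tensors} tensors, {num_params / 1e6:.1f}M parameters")
```

If the file is complete, the reported parameter count should be in the neighborhood of the ~100 million listed in the model details. To load the actual tensors into a PyTorch module, the `safetensors` package provides `safetensors.torch.load_file`.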
## 📄 Citation
If you use this model in your research, please cite:
```bibtex
@misc{namdar2026mmdit,
  author    = {Namdar, Bunyamin},
  title     = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
  year      = {2026},
  publisher = {GitHub},
  url       = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
}
```