| --- |
| license: cc-by-nc-4.0 |
| tags: |
| - autoencoder |
| - vae |
| - diffusion |
| - image-generation |
| - pytorch |
| datasets: |
| - imagenet-1k |
| - celeba |
| language: |
| - en |
| --- |
| |
| # 🎨 Custom 8-Channel VAE (f8) |
|
|
| This is a custom-trained Variational Autoencoder (VAE) with an **8-channel latent space** and an **8x spatial downsampling factor (f8)**. It was trained from scratch on the ImageNet and CelebA datasets, with the goal of detailed image reconstruction and a robust latent representation. |
|
|
| While originally developed as the latent backbone for the [NovaFace-DiT](https://huggingface.co/devbnamdar/NovaFace-DiT) model, this VAE is entirely independent and can be used as a drop-in component for any custom Latent Diffusion Model (LDM) or Flow Matching architecture. |
|
|
| <div align="center"> |
| <img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/vae_reconstruction.png" width="90%" alt="VAE Reconstruction on Unseen Data" /> |
| <br/> |
| <em>Top row: Original Images (Unseen data). Bottom row: 8-Channel VAE Reconstructions.</em> |
| </div> |
|
|
| ## 📊 Model Details |
|
|
| - **Model Type:** Variational Autoencoder (VAE) |
| - **Latent Channels:** 8 |
| - **Downsample Factor:** 8 (f8) |
| - **Parameters:** ~100 Million |
| - **Training Datasets:** ImageNet-1k (~1.3M images) + CelebA |
| - **Max Supported Resolution:** 1024x1024 |
| - **License:** Creative Commons BY-NC 4.0 (Non-commercial) |
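
To make the latent geometry concrete, the sketch below works out the latent tensor shape for a given input resolution from the two numbers listed above (8 latent channels, downsample factor 8). The helper names `latent_shape` and `compression_ratio` are hypothetical and not part of the official codebase, and the sketch assumes the input height and width are multiples of 8.

```python
def latent_shape(height, width, latent_channels=8, downsample_factor=8):
    """Return (C, H, W) of the latent for an image of the given size.

    Hypothetical helper for illustration only; assumes height and width
    are divisible by the downsample factor.
    """
    if height % downsample_factor or width % downsample_factor:
        raise ValueError("height and width must be multiples of the downsample factor")
    return (latent_channels, height // downsample_factor, width // downsample_factor)


def compression_ratio(height, width, latent_channels=8, downsample_factor=8):
    """Ratio of RGB input values (3 * H * W) to latent values (C * H/f * W/f)."""
    c, h, w = latent_shape(height, width, latent_channels, downsample_factor)
    return (3 * height * width) / (c * h * w)


print(latent_shape(256, 256))       # (8, 32, 32)
print(latent_shape(1024, 1024))     # (8, 128, 128)
print(compression_ratio(256, 256))  # 24.0
```

Under these assumptions the ratio is 24 at any resolution, since 3 * 8 * 8 / 8 = 24.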
|
|
| ## 🏗️ Architecture Configuration |
|
|
| If you initialize this model in PyTorch with the official codebase, use the following architecture parameters: |
|
|
| ```python |
| model_architecture_config = { |
|     'in_channels': 3,    # RGB input |
|     'out_channels': 3,   # RGB reconstruction |
|     'base_channels': 128, |
|     'channel_multipliers': [1, 2, 4, 4],            # one entry per resolution level |
|     'num_residual_blocks_per_level': [2, 2, 2, 4],  # one entry per resolution level |
|     'z_channels': 8,     # latent channels |
| } |
| ``` |
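
As a rough sanity check on the configuration above, the sketch below derives the per-level channel widths and the downsample factor the configuration implies. It assumes the common LDM-style layout in which every resolution level except the last is followed by a single 2x downsample. The official codebase may be organized differently, so treat the factor computation as an assumption that happens to agree with the stated f8.

```python
# Configuration copied from the model card.
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8,
}

cfg = model_architecture_config
multipliers = cfg['channel_multipliers']
blocks = cfg['num_residual_blocks_per_level']

# Each resolution level needs its own residual-block count.
assert len(multipliers) == len(blocks)

# Feature width at each resolution level: base_channels * multiplier.
level_widths = [cfg['base_channels'] * m for m in multipliers]

# ASSUMPTION: one 2x downsample between consecutive levels, none after the last.
implied_downsample_factor = 2 ** (len(multipliers) - 1)

print(level_widths)               # [128, 256, 512, 512]
print(implied_downsample_factor)  # 8
```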
|
|
| ## 🚀 How to Use |
|
|
| The weights provided here (`Nova_ae_f8.safetensors`) are intended to be loaded into the custom VAE architecture defined in our GitHub repository. |
|
|
| 🔗 **Official GitHub Repository (Code & UI):** [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch) |
|
|
| **Using with NovaFace-DiT:** |
| 1. Download the `.safetensors` file from this repository. |
| 2. Place it in the `vae_models/` directory of your cloned GitHub project. |
| 3. Update the `vae_path` in `config.py` (or select it in the Gradio UI). |
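
Before wiring the file into the project, it can help to confirm that the download is intact. The sketch below reads only the JSON header of a `.safetensors` file using the Python standard library, relying on the published file layout (an 8-byte little-endian header length followed by a UTF-8 JSON header). The function names are hypothetical, and this is an inspection aid, not the loading routine used by the official repository.

```python
import json
import struct
from math import prod


def read_safetensors_header(path):
    """Return {tensor_name: {"dtype", "shape", "data_offsets"}} for a .safetensors file."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as an unsigned little-endian 64-bit int.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)
    return header


def count_elements(header):
    """Total number of tensor elements described by the header."""
    return sum(prod(entry["shape"]) for entry in header.values())
```

For example, `count_elements(read_safetensors_header("vae_models/Nova_ae_f8.safetensors"))` would be expected to land roughly near the ~100 million parameters listed in the model details, assuming the file stores little beyond the model weights.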
|
|
| ## 📄 Citation |
|
|
| If you use this model in your research, please cite: |
|
|
| ```bibtex |
| @misc{namdar2026mmdit, |
| author = {Namdar, Bunyamin}, |
| title = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset}, |
| year = {2026}, |
| publisher = {GitHub}, |
| url = {https://github.com/devbnamdar/MM-DiT-From-Scratch} |
| } |
| ``` |
|
|