devbnamdar/Nova_ae_f8


devbnamdar committed on May 18

Commit 222bd7f · verified · 1 Parent(s): 54928c2

Upload README.md

Files changed (1)

README.md +76 -0

README.md ADDED

	@@ -0,0 +1,76 @@

+---
+license: cc-by-nc-4.0
+tags:
+- autoencoder
+- vae
+- diffusion
+- image-generation
+- pytorch
+datasets:
+- imagenet-1k
+- celeba
+language:
+- en
+---
+# 🎨 Custom 8-Channel VAE (f8)
+This is a custom-trained Variational Autoencoder (VAE) with an **8-channel latent space** and a **downsampling factor of 8 (f8)**. It was trained from scratch on a combination of the ImageNet and CelebA datasets, with the goal of detailed image reconstruction and robust latent representations.
+Although it was originally developed as the latent backbone for the [NovaFace-DiT](https://huggingface.co/devbnamdar/NovaFace-DiT) model, this VAE does not depend on it and can serve as the latent encoder/decoder for any custom Latent Diffusion Model (LDM) or Flow Matching architecture that operates on 8-channel latents.
+<div align="center">
+  <img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/vae_reconstruction.png" width="90%" alt="VAE Reconstruction on Unseen Data" />
+  <br/>
+  <em>Top row: Original Images (Unseen data). Bottom row: 8-Channel VAE Reconstructions.</em>
+</div>
+## 📊 Model Details
+- **Model Type:** Variational Autoencoder (VAE)
+- **Latent Channels:** 8
+- **Downsample Factor:** 8 (f8)
+- **Parameters:** ~100 million
+- **Training Datasets:** ImageNet-1k (~1.3M images) + CelebA
+- **Max Supported Resolution:** 1024x1024
+- **License:** Creative Commons BY-NC 4.0 (Non-commercial)
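
Editor's note: the latent channel count and the f8 factor listed above fix the latent tensor shape for a given input size. The sketch below works through that arithmetic. The helper names `latent_shape` and `compression_ratio` are hypothetical and not part of the repository, and the sketch assumes input sides are exact multiples of 8, since the model card does not say how other sizes are handled.

```python
def latent_shape(height, width, z_channels=8, downsample=8):
    """Return (channels, height, width) of the latent for one image.

    ASSUMPTION: height and width are exact multiples of the downsample
    factor; the model card does not describe padding or cropping.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsample factor")
    return (z_channels, height // downsample, width // downsample)


def compression_ratio(height, width, in_channels=3, z_channels=8, downsample=8):
    """Ratio of input values (RGB pixels) to latent values."""
    c, h, w = latent_shape(height, width, z_channels, downsample)
    return (in_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    print(latent_shape(256, 256))          # (8, 32, 32)
    print(latent_shape(1024, 1024))        # (8, 128, 128)
    print(compression_ratio(1024, 1024))   # 24.0
```

Under these assumptions, each spatial side shrinks by 8 and the channel count grows from 3 to 8, so the latent holds 24 times fewer values than the RGB input regardless of resolution.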
+## 🏗️ Architecture Configuration
+If you are initializing this model in PyTorch using the official codebase, the architecture parameters are as follows:
+```python
+model_architecture_config = {
+    'in_channels': 3,
+    'out_channels': 3,
+    'base_channels': 128,
+    'channel_multipliers': [1, 2, 4, 4],
+    'num_residual_blocks_per_level': [2, 2, 2, 4],
+    'z_channels': 8
+}
+```
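
Editor's note: the configuration above can be read level by level. The sketch below derives the per-level channel widths from `base_channels` and `channel_multipliers`, and checks that four levels are consistent with an f8 model. It assumes the common LDM-style layout in which a 2x downsample sits between consecutive levels and none follows the last level; the repository code is the authority on the actual layout. The helper names `describe_levels` and `implied_downsample` are hypothetical.

```python
# Copied from the README's architecture configuration.
model_architecture_config = {
    'in_channels': 3,
    'out_channels': 3,
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8,
}


def describe_levels(cfg):
    """Return a list of (channel_width, num_residual_blocks) per level."""
    widths = [cfg['base_channels'] * m for m in cfg['channel_multipliers']]
    blocks = cfg['num_residual_blocks_per_level']
    if len(widths) != len(blocks):
        raise ValueError("multipliers and block counts must have the same length")
    return list(zip(widths, blocks))


def implied_downsample(cfg):
    """Spatial downsampling factor implied by the number of levels.

    ASSUMPTION: one 2x downsample between consecutive levels and none
    after the last level, as in common LDM-style autoencoders.
    """
    return 2 ** (len(cfg['channel_multipliers']) - 1)


if __name__ == "__main__":
    for i, (width, n_blocks) in enumerate(describe_levels(model_architecture_config)):
        print(f"level {i}: {width} channels, {n_blocks} residual blocks")
    print("implied downsample factor:", implied_downsample(model_architecture_config))
```

With this reading, the four levels have widths 128, 256, 512, and 512, and the three transitions between them give a factor of 2^3 = 8, which matches the stated f8.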
+## 🚀 How to Use
+The weights provided here (`Nova_ae_f8.safetensors`) are intended to be loaded into the custom VAE architecture defined in our GitHub repository.
+🔗 **Official GitHub Repository (Code & UI):** [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)
+**Using with NovaFace-DiT:**
+1. Download the `.safetensors` file from this repository.
+2. Place it in the `vae_models/` directory of your cloned GitHub project.
+3. Update the `vae_path` in `config.py` (or select it in the Gradio UI).
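
Editor's note: steps 1 and 2 above can be scripted. The sketch below copies an already-downloaded weights file into the `vae_models/` directory of a cloned project and returns the resulting path, which is the value step 3 asks you to set as `vae_path`. The function name `place_weights` is hypothetical, and the sketch assumes `vae_models/` sits at the project root; it uses only the Python standard library and does not load the weights.

```python
import shutil
from pathlib import Path


def place_weights(downloaded_file, project_root):
    """Copy a .safetensors file into <project_root>/vae_models/.

    ASSUMPTION: `vae_models/` lives at the root of the cloned project.
    Returns the destination path, which can be used as `vae_path`.
    """
    src = Path(downloaded_file)
    if src.suffix != ".safetensors":
        raise ValueError(f"expected a .safetensors file, got {src.name!r}")
    if not src.is_file():
        raise FileNotFoundError(src)

    target_dir = Path(project_root) / "vae_models"
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing

    dest = target_dir / src.name
    shutil.copy2(src, dest)  # copy the file and keep its timestamps
    return dest
```

For example, `place_weights("Nova_ae_f8.safetensors", "MM-DiT-From-Scratch")` would return the path `MM-DiT-From-Scratch/vae_models/Nova_ae_f8.safetensors`, assuming the file was downloaded to the current directory and the project was cloned next to it.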
+## 📄 Citation
+If you use this model in your research, please cite:
+```bibtex
+@misc{namdar2026mmdit,
+  author       = {Namdar, Bunyamin},
+  title        = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
+  year         = {2026},
+  publisher    = {GitHub},
+  url          = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
+}
+```