---
license: cc-by-nc-4.0
tags:
- autoencoder
- vae
- diffusion
- image-generation
- pytorch
datasets:
- imagenet-1k
- celeba
language:
- en
---

# 🎨 Custom 8-Channel VAE (f8)

This is a custom-trained Variational Autoencoder (VAE) with an 8-channel latent space and an f8 downsampling factor. It was trained from scratch on ImageNet and CelebA combined, with the aim of preserving fine detail in reconstructions while producing latents that are well suited to training generative models.

Although it was originally developed as the latent backbone for the [NovaFace-DiT](https://huggingface.co/devbnamdar/NovaFace-DiT) model, this VAE does not depend on it and can serve as the autoencoder stage of any custom Latent Diffusion Model (LDM) or Flow Matching architecture, as long as that model is built to operate on 8-channel, f8 latents.

<div align="center">
<img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/vae_reconstruction.png" width="90%" alt="VAE Reconstruction on Unseen Data" />
<br/>
<em>Top row: original images (unseen data). Bottom row: 8-channel VAE reconstructions.</em>
</div>

## 📊 Model Details

- Model Type: Variational Autoencoder (VAE)
- Latent Channels: 8
- Downsample Factor: 8 (f8)
- Parameters: ~100 million
- Training Datasets: ImageNet-1k (~1.3M images) + CelebA
- Max Supported Resolution: 1024x1024
- License: Creative Commons BY-NC 4.0 (non-commercial)
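The latent channel count and downsample factor above fix the tensor shape a diffusion model has to be built for. A minimal sketch of that arithmetic follows, assuming the encoder divides height and width by exactly 8; the sketch rejects sizes that are not multiples of 8, whereas the real encoder may pad or crop instead. `LATENT_CHANNELS`, `DOWNSAMPLE`, `latent_shape`, and `compression_ratio` are illustrative names, not part of the official codebase.

```python
# Sketch only: latent shape for this VAE, assuming the encoder divides
# height and width by the downsample factor (f8). Names are hypothetical.
LATENT_CHANNELS = 8  # "Latent Channels: 8"
DOWNSAMPLE = 8       # "Downsample Factor: 8 (f8)"


def latent_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, height, width) of the latent for an RGB input."""
    # Assumption: inputs must be divisible by 8; the real model may pad instead.
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError(f"height and width must be multiples of {DOWNSAMPLE}")
    return (LATENT_CHANNELS, height // DOWNSAMPLE, width // DOWNSAMPLE)


def compression_ratio(height: int, width: int) -> float:
    """Number of RGB input values (3 * H * W) per latent value."""
    c, h, w = latent_shape(height, width)
    return (3 * height * width) / (c * h * w)


for res in (256, 512, 1024):
    print(res, latent_shape(res, res), compression_ratio(res, res))
```

At the maximum resolution of 1024x1024 this gives an 8x128x128 latent. The ratio 3·H·W / (8·(H/8)·(W/8)) = 24 does not depend on resolution, so every latent value summarizes 24 input values.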

## 🏗️ Architecture Configuration

If you are initializing this model in PyTorch with the official codebase, use the following architecture parameters:

```python
model_architecture_config = {
    'in_channels': 3,    # RGB input
    'out_channels': 3,   # RGB reconstruction
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
    'z_channels': 8,     # latent channels (matches "Latent Channels: 8")
}
```
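The config can be cross-checked against the f8 factor listed under Model Details. The sketch below assumes the common LDM-style encoder layout, with one 2x spatial downsampling step between consecutive levels and none after the last level; this card does not confirm that layout, so treat it as an assumption. Under it, four channel multipliers imply 2^3 = 8. The helper names are hypothetical.

```python
# Sketch only: derive per-level widths and the implied downsample factor.
# Assumes one 2x downsample between consecutive levels (LDM-style layout);
# the repository's encoder may be organized differently.
config = {  # subset of model_architecture_config, values copied from above
    'base_channels': 128,
    'channel_multipliers': [1, 2, 4, 4],
    'num_residual_blocks_per_level': [2, 2, 2, 4],
}


def level_widths(cfg: dict) -> list[int]:
    """Feature-map channel count at each resolution level."""
    return [cfg['base_channels'] * m for m in cfg['channel_multipliers']]


def implied_downsample_factor(cfg: dict) -> int:
    """2 ** (number of levels - 1), under the layout assumption above."""
    return 2 ** (len(cfg['channel_multipliers']) - 1)


# Every level needs both a multiplier and a residual-block count.
assert len(config['channel_multipliers']) == len(config['num_residual_blocks_per_level'])

print(level_widths(config))               # [128, 256, 512, 512]
print(implied_downsample_factor(config))  # 8
```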

## 🚀 How to Use

The weights provided here (`Nova_ae_f8.safetensors`) are intended to be loaded into the custom VAE architecture defined in our GitHub repository.

🔗 Official GitHub Repository (Code & UI): [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)
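Before wiring the weights into the codebase, you can inspect the file (tensor names, shapes, and the roughly 100M parameter count) without PyTorch. This sketch reads only the header of the `.safetensors` file, relying on the published safetensors layout: an 8-byte little-endian unsigned header length, then a UTF-8 JSON header mapping tensor names to `dtype`, `shape`, and `data_offsets`, plus an optional `__metadata__` entry. The function names are illustrative and not part of the repository.

```python
import json
import struct
from math import prod


def read_safetensors_header(path: str) -> dict:
    """Parse the JSON header of a .safetensors file without loading tensor data."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Drop the optional free-form metadata so only tensor entries remain.
    header.pop("__metadata__", None)
    return header


def count_parameters(header: dict) -> int:
    """Total number of elements across all tensors listed in the header."""
    # prod([]) == 1, which is correct for scalar tensors.
    return sum(prod(entry["shape"]) for entry in header.values())


# Usage (adjust the path to wherever you saved the file):
# header = read_safetensors_header("Nova_ae_f8.safetensors")
# print(len(header), "tensors,", count_parameters(header), "parameters")
```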

Using with NovaFace-DiT:
1. Download the `.safetensors` file from this repository.
2. Place it in the `vae_models/` directory of your cloned GitHub project.
3. Set `vae_path` in `config.py` to the file's location, or select the file in the Gradio UI.
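Steps 1 and 2 can also be scripted. A minimal sketch using `hf_hub_download` from the `huggingface_hub` library, assuming the repository id `devbnamdar/Nova_ae_f8` (this model page) and that the script runs from the root of the cloned project; `vae_target_dir` and `download_vae` are hypothetical helper names.

```python
from pathlib import Path

REPO_ID = "devbnamdar/Nova_ae_f8"    # assumed: this model repository
FILENAME = "Nova_ae_f8.safetensors"  # weight file named in this card


def vae_target_dir(project_root: str = ".") -> Path:
    """The vae_models/ folder of the cloned project (step 2)."""
    return Path(project_root) / "vae_models"


def download_vae(project_root: str = ".") -> Path:
    """Download the weights into <project_root>/vae_models/ and return the file path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    local_file = hf_hub_download(
        repo_id=REPO_ID,
        filename=FILENAME,
        local_dir=vae_target_dir(project_root),
    )
    return Path(local_file)
```

After downloading, point `vae_path` in `config.py` at the returned file (for example `vae_models/Nova_ae_f8.safetensors`); the exact form of that setting depends on the repository's `config.py`.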

## 📄 Citation

If you use this model in your research, please cite:

```bibtex
@misc{namdar2026mmdit,
  author = {Namdar, Bunyamin},
  title = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
  year = {2026},
  publisher = {GitHub},
  url = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
}
```