---
license: cc-by-nc-sa-4.0
tags:
- text-to-image
- diffusion
- mm-dit
- stable-diffusion-3
- face-generation
- ffhq
- pytorch
datasets:
- ffhq
language:
- en
---
# 🌟 NovaFace-DiT (512x512)
**NovaFace-DiT** is a Multimodal Diffusion Transformer (MM-DiT) trained from scratch for high-fidelity human face synthesis. It is trained with a Rectified Flow (flow matching) objective, and its architecture is closely modeled on Stable Diffusion 3.
NovaFace-DiT was trained on a single consumer-grade GPU using the 70,000 images of FFHQ, showing that a custom MM-DiT can produce high-fidelity faces under tight compute and data budgets.
<table style="border: none; background-color: transparent;">
<tr>
<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample4.png" alt="Generated Face 1" /></td>
<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample5.png" alt="Generated Face 2" /></td>
<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample6.png" alt="Generated Face 3" /></td>
<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample7.png" alt="Generated Face 4" /></td>
</tr>
</table>
<br>
<div align="center">
<em>High-fidelity samples generated by NovaFace-DiT using complex text prompts.</em>
</div>
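Rectified Flow, mentioned above, trains the network to predict a constant velocity along a straight line between a clean latent and Gaussian noise. The NumPy sketch below illustrates the idea; it is not NovaFace-DiT's training code. It assumes the common convention where t=0 is data and t=1 is noise, and it swaps the trained transformer for an "oracle" velocity that knows the single target. The helper names (`interpolate`, `velocity_target`, `euler_sample`, `oracle`) are hypothetical.

```python
import numpy as np


def interpolate(x0: np.ndarray, noise: np.ndarray, t: float) -> np.ndarray:
    """Point on the straight path from data (t=0) to noise (t=1)."""
    return (1.0 - t) * x0 + t * noise


def velocity_target(x0: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Regression target: the time derivative of interpolate(), constant along the path."""
    return noise - x0


def euler_sample(velocity_fn, noise: np.ndarray, num_steps: int) -> np.ndarray:
    """Integrate dx/dt = velocity_fn(x, t) from t=1 (noise) down to t=0 (data)."""
    x = noise.copy()
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        # t_next < t_cur, so each step moves x from noise toward data.
        x = x + (t_next - t_cur) * velocity_fn(x, t_cur)
    return x


# Toy check with an oracle in place of the trained network (hypothetical):
# for one known target x0, the exact velocity at a point x on the path is (x - x0) / t.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 64, 64))  # stand-in for a clean 8-channel latent
noise = rng.standard_normal(x0.shape)


def oracle(x: np.ndarray, t: float) -> np.ndarray:
    return (x - x0) / t


x_hat = euler_sample(oracle, noise, num_steps=4)
print(np.allclose(x_hat, x0))  # True: Euler integration is exact on a perfectly straight path
```

In actual training, the transformer would instead receive `interpolate(x0, noise, t)`, the timestep, and the T5 text embedding, and be regressed onto `velocity_target(x0, noise)`, typically with a mean-squared error; how NovaFace-DiT samples t and weights its loss is not specified on this card.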
## 📊 Model Details
- **Model Type:** Text-to-Image Diffusion Transformer (MM-DiT)
- **Parameters:** ~260 Million
- **Text Encoder:** T5-Base (768-dim)
- **Latent Space:** Custom 8-channel VAE (f8, i.e. 8x spatial downsampling: a 512x512 image becomes an 8x64x64 latent)
- **Training Dataset:** [FFHQ (Flickr-Faces-HQ)](https://github.com/NVlabs/ffhq-dataset)
- **Resolution:** 512x512
- **License:** Creative Commons BY-NC-SA 4.0 (Non-commercial)
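The f8 downsampling factor and the 512x512 resolution determine how large the latent is and how many image tokens the transformer attends over. The sketch below computes both; the helper names are hypothetical, and the patch size of 2 is an assumption (Stable Diffusion 3 uses 2x2 patches), not a documented NovaFace-DiT setting.

```python
def latent_shape(image_size: int, downsample: int = 8, channels: int = 8) -> tuple:
    """Shape (C, H, W) of the VAE latent for a square RGB image."""
    if image_size % downsample != 0:
        raise ValueError(f"image size {image_size} is not divisible by {downsample}")
    side = image_size // downsample
    return (channels, side, side)


def num_image_tokens(latent_side: int, patch_size: int = 2) -> int:
    """Number of patch tokens for a square latent.

    patch_size=2 is an assumption (SD3-style patchify), not a confirmed setting.
    """
    if latent_side % patch_size != 0:
        raise ValueError(f"latent side {latent_side} is not divisible by {patch_size}")
    return (latent_side // patch_size) ** 2


def compression_ratio(image_size: int, image_channels: int = 3) -> float:
    """How many pixel values map onto each latent value."""
    c, h, w = latent_shape(image_size)
    return (image_size * image_size * image_channels) / (c * h * w)


shape = latent_shape(512)
print(shape)                       # (8, 64, 64)
print(num_image_tokens(shape[1]))  # 1024
print(compression_ratio(512))      # 24.0
```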
## ⚡ Requirements & Custom VAE
NovaFace-DiT operates in an 8-channel latent space and **requires** our custom-trained autoencoder (VAE) to decode images. Standard SDXL (4-channel) and SD3 (16-channel) VAEs are not compatible.
👉 **[Download the Custom 8-Channel VAE here](https://huggingface.co/devbnamdar/Custom-VAE-8ch-f8)** *(required for image generation)*
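A mismatched VAE typically fails only at decode time with a tensor shape error. The sketch below checks compatibility up front. It assumes, hypothetically, that the VAE ships a diffusers-style `config.json` with a `latent_channels` field; check the VAE repository for its actual file layout. `check_vae_config` and `MODEL_LATENT_CHANNELS` are illustrative names.

```python
import json

# NovaFace-DiT's denoiser consumes and produces 8-channel latents.
MODEL_LATENT_CHANNELS = 8


def check_vae_config(config_text: str, expected: int = MODEL_LATENT_CHANNELS) -> int:
    """Return the VAE's latent channel count, or raise if it does not match the model.

    Assumes a diffusers-style config with a "latent_channels" key (hypothetical
    for the custom VAE; verify against the files in its repository).
    """
    config = json.loads(config_text)
    channels = config.get("latent_channels")
    if channels != expected:
        raise ValueError(
            f"VAE latent_channels={channels}, but NovaFace-DiT expects {expected}"
        )
    return channels


# SDXL's VAE uses 4 latent channels, so it is rejected.
try:
    check_vae_config('{"latent_channels": 4}')
except ValueError as err:
    print(err)  # VAE latent_channels=4, but NovaFace-DiT expects 8

print(check_vae_config('{"latent_channels": 8}'))  # 8
```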
## 🚀 How to Use (Code & UI)
This repository contains **only the model weights (`.safetensors`)**. To generate images, inspect the architecture, or resume training, use the official GitHub repository, which provides the model code, a Gradio UI, and the full training pipeline.
🔗 **Official GitHub Repository:** [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)
**Quick Setup:**
1. Clone the GitHub repository.
2. Download `NovaFace-DiT.safetensors` from this Hugging Face page and place it in your local `checkpoints/` directory.
3. Download the Custom VAE from [its separate repository](https://huggingface.co/devbnamdar/Custom-VAE-8ch-f8) and place it in your local `vae_models/` directory.
4. Launch the Gradio app:
```bash
python gradio_ui/app.py
```
5. In the Gradio UI, go to the **"⚙️ Settings"** tab, enter the path to your downloaded model (e.g., `checkpoints/NovaFace-DiT.safetensors`) in the **"Base Model Path"** field, and click **"Load Models to GPU"**.
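Because a `.safetensors` file begins with an 8-byte little-endian header length followed by a JSON header listing each tensor's dtype and shape, you can sanity-check a downloaded checkpoint without PyTorch. This standard-library sketch relies only on that published layout; `count_safetensors_params` is a hypothetical helper name.

```python
import json
import struct
from math import prod


def count_safetensors_params(path: str) -> int:
    """Count parameters in a .safetensors file by reading only its JSON header."""
    with open(path, "rb") as f:
        # First 8 bytes: unsigned 64-bit little-endian length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # "__metadata__" is an optional string map, not a tensor entry.
    # prod([]) == 1, so scalar tensors are counted correctly.
    return sum(
        prod(info["shape"]) for name, info in header.items() if name != "__metadata__"
    )
```

For example, `count_safetensors_params("checkpoints/NovaFace-DiT.safetensors") / 1e6` gives the parameter count in millions for comparison with the ~260 million figure in Model Details.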
## ⚠️ Limitations and Bias
- **Domain Specific:** This model was trained exclusively on the FFHQ dataset. It is highly specialized in generating human portraits (shoulders and above). It is not designed to generate landscapes, animals, or full-body shots.
- **Text Rendering:** The model does not generate legible text or complex typography.
- **Bias:** As the model is trained on FFHQ, it may inherit demographic or lighting biases present in the original dataset.
## 📄 Citation
If you use this model or the accompanying codebase in your research or projects, please cite:
```bibtex
@misc{namdar2026mmdit,
author = {Namdar, Bunyamin},
title = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
year = {2026},
publisher = {GitHub},
url = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
}
```