Update README.md

c9782d4 verified 9 days ago

4.42 kB

	---
	license: cc-by-nc-sa-4.0
	tags:
	- text-to-image
	- diffusion
	- mm-dit
	- stable-diffusion-3
	- face-generation
	- ffhq
	- pytorch
	datasets:
	- ffhq
	language:
	- en
	---

	# 🌟 NovaFace-DiT (512x512)

	NovaFace-DiT is a Multimodal Diffusion Transformer (MM-DiT) model trained entirely from scratch for high-fidelity human face synthesis. It leverages the powerful Rectified Flow Matching technique and is deeply inspired by the Stable Diffusion 3 architecture.

	Despite being trained on a highly constrained hardware setup (a single consumer-grade GPU) and a highly curated dataset (70,000 images from FFHQ), NovaFace-DiT demonstrates the incredible efficiency and scaling capability of the custom MM-DiT architecture.

	<table style="border: none; background-color: transparent;">
	<tr>
	<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample4.png" alt="Generated Face 1" /></td>
	<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample5.png" alt="Generated Face 2" /></td>
	<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample6.png" alt="Generated Face 3" /></td>
	<td style="border: none; background-color: transparent; padding: 2px;"><img src="https://raw.githubusercontent.com/devbnamdar/MM-DiT-From-Scratch/main/assets/sample7.png" alt="Generated Face 4" /></td>
	</tr>
	</table>
	<br>
	<div align="center">
	<em>High-fidelity samples generated by NovaFace-DiT using complex text prompts.</em>
	</div>

	## 📊 Model Details

	- Model Type: Text-to-Image Diffusion Transformer (MM-DiT)
	- Parameters: ~260 Million
	- Text Encoder: T5-Base (768-dim)
	- Latent Space: Custom 8-channel VAE (f8)
	- Training Dataset: [FFHQ (Flickr-Faces-HQ)](https://github.com/NVlabs/ffhq-dataset)
	- Resolution: 512x512
	- License: Creative Commons BY-NC-SA 4.0 (Non-commercial)

	## ⚡ Requirements & Custom VAE

	NovaFace-DiT operates in an optimized 8-channel latent space and requires our custom-trained Autoencoder (VAE) to decode images properly. Standard SDXL or SD3 VAEs are not compatible.

	👉 [Download the Custom 8-Channel VAE here](https://huggingface.co/devbnamdar/Custom-VAE-8ch-f8) (Note: Please download this VAE to generate images)

	## 🚀 How to Use (Code & UI)

	This repository contains only the model weights (`.safetensors`). To actually generate images, inspect the architecture, or resume training, please visit our official GitHub repository which contains a full production-ready Gradio UI and training pipeline.

	🔗 Official GitHub Repository: [devbnamdar/MM-DiT-From-Scratch](https://github.com/devbnamdar/MM-DiT-From-Scratch)

	Quick Setup:
	1. Clone the GitHub repository.
	2. Download the `NovaFace-DiT.safetensors` from this Hugging Face page and place it in your local `checkpoints/` directory.
	3. Download the Custom VAE from [its separate repository](https://huggingface.co/devbnamdar/Custom-VAE-8ch-f8) and place it in your local `vae_models/` directory.
	4. Launch the Gradio app:
	```bash
	python gradio_ui/app.py
	```
	5. In the Gradio UI, go to the "⚙️ Settings" tab, enter the path to your downloaded model (e.g., `checkpoints/NovaFace-DiT.safetensors`) in the "Base Model Path" field, and click "Load Models to GPU".

	## ⚠️ Limitations and Bias

	- Domain Specific: This model was trained exclusively on the FFHQ dataset. It is highly specialized in generating human portraits (shoulders and above). It is not designed to generate landscapes, animals, or full-body shots.
	- Text Rendering: The model does not generate legible text or complex typography.
	- Bias: As the model is trained on FFHQ, it may inherit demographic or lighting biases present in the original dataset.

	## 📄 Citation

	If you use this model or the accompanying codebase in your research or projects, please cite:

	```bibtex
	@misc{namdar2026mmdit,
	author = {Namdar, Bunyamin},
	title = {MM-DiT From Scratch: High-Fidelity Diffusion Training on Limited Dataset},
	year = {2026},
	publisher = {GitHub},
	url = {https://github.com/devbnamdar/MM-DiT-From-Scratch}
	}
	```