
	---
	license: creativeml-openrail-m
	base_model:
	- SG161222/Realistic_Vision_V4.0_noVAE
	tags:
	- text-to-image
	- stable-diffusion
	- ip-adapter
	- face-id
	- identity-preservation
	- portrait
	- rishabh-in-code
	library_name: diffusers
	pipeline_tag: text-to-image
	---

	# TrueFace-Adapter: High-Fidelity Identity Preservation
	![Model License](https://img.shields.io/badge/License-Non--Commercial-red.svg)
	![Base Model](https://img.shields.io/badge/Base%20Model-Realistic%20Vision%20V4.0-blue.svg)

	---
	## Introduction
	This is a fine-tuned version of the IP-Adapter-FaceID-PlusV2 model for Stable Diffusion 1.5. It was trained to prioritize high-fidelity identity preservation while maintaining compositional realism across diverse prompts.

	The model uses FaceID embeddings extracted with the InsightFace `buffalo_l` model and injects them into the UNet cross-attention layers to condition image generation.

	* Base Diffusion Model: `SG161222/Realistic_Vision_V4.0_noVAE`
	* VAE: `stabilityai/sd-vae-ft-mse`
	* Image Encoder: `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`
	* Dataset: images sampled from `bitmind/celeb-a-hq`.
	* Optimization: Joint optimization utilizing standard Diffusion Loss paired with Identity Loss (ArcFace Cosine Similarity).
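The joint objective in the last bullet can be sketched as a weighted sum of the two terms. This is a minimal, framework-free illustration and not the training code used for this model: the function names and the weight `id_weight` are hypothetical, and a real training loop would compute both terms on differentiable tensors.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def diffusion_loss(noise_pred, noise_target):
    """Mean squared error between predicted and true noise."""
    squared = [(p - t) ** 2 for p, t in zip(noise_pred, noise_target)]
    return sum(squared) / len(squared)


def joint_loss(noise_pred, noise_target, emb_generated, emb_reference, id_weight=0.1):
    """Diffusion loss plus an identity penalty of (1 - cosine similarity).

    id_weight is a hypothetical value, not the one used for this model.
    """
    identity_loss = 1.0 - cosine_similarity(emb_generated, emb_reference)
    return diffusion_loss(noise_pred, noise_target) + id_weight * identity_loss
```

With this form, the identity term vanishes when the generated and reference face embeddings point in the same direction, so the objective then reduces to the standard diffusion loss.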

	## Evaluation Metrics
	The model was evaluated against the generic zero-shot IP-Adapter baseline by generating several stylistic variations (cinematic lighting, charcoal sketch, outdoor lighting, etc.) from a range of seed images.

	| Metric | Baseline (Zero-Shot) | Fine-Tuned (This Model) | Note |
	|---|---|---|---|
	| Identity Score (Higher is better) | 0.8327 | 0.8754 | Significant improvement in facial structure retention. |
	| FID Score (Lower is better) | 259.27 | 283.11 | Standard distributional gap trade-off when forcing strict identity constraints. |

	Note: In one-to-one sample comparisons, individual Identity Scores with the fine-tuned model reached as high as 0.9680, and its per-sample FID was also lower than the baseline's (421.97 vs. 448.15).
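As a rough illustration of how an Identity Score of this kind can be computed, the sketch below averages the cosine similarity between one reference face embedding and the embeddings of several generated images. It assumes the score is ArcFace cosine similarity, matching the identity loss described earlier; the exact evaluation script is not part of this card, and the function names are hypothetical.

```python
import math


def _unit(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in vector]


def identity_score(reference_embedding, generated_embeddings):
    """Mean cosine similarity between a reference face embedding and
    the face embeddings of several generated images."""
    if not generated_embeddings:
        raise ValueError("need at least one generated embedding")
    ref = _unit(reference_embedding)
    scores = []
    for embedding in generated_embeddings:
        gen = _unit(embedding)
        scores.append(sum(r * g for r, g in zip(ref, gen)))
    return sum(scores) / len(scores)
```

In practice the inputs would be the embeddings InsightFace returns for the seed image and for each generated image. Those are already normalized, so the explicit normalization here is only a safeguard.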


	## Generalization to Unseen Data (CelebA-HQ)

	To check that TrueFace-Adapter does not overfit to the training data, we tested it on unseen subjects from the CelebA-HQ dataset across 5 distinct prompts (Cinematic, Smiling, Sunglasses, Studio, Charcoal Sketch).

	Reference Subject (Unseen Data):
	![Original](celeb_original.png)

	Baseline (Standard IP-Adapter Zero-Shot):
	Notice the loss of the square jawline, the alteration of the eye shape, and the complete loss of identity in the sketch (far right).
	![Baseline](celeb_baseline.png)

	TrueFace-Adapter (Ours):
	The fine-tuned model preserves the subject's deep-set eyes and specific jaw structure, and maintains a high-fidelity likeness even in the charcoal sketch medium.
	![Finetuned](celeb_finetuned.png)


	## Usage

	To use this model, first extract the face embedding and the aligned face image with `insightface`. The `ip_adapter` module imported below comes from the IP-Adapter repository (tencent-ailab/IP-Adapter).

	```python
	import cv2
	import torch
	from insightface.app import FaceAnalysis
	from insightface.utils import face_align
	from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
	from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

	# 1. Setup Face Extraction (falls back to CPU if CUDA is unavailable)
	app = FaceAnalysis(name="buffalo_l", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
	app.prepare(ctx_id=0, det_size=(640, 640))

	image = cv2.imread("your_seed_image.jpg")
	if image is None:
	    raise FileNotFoundError("Could not read your_seed_image.jpg")
	faces = app.get(image)
	if not faces:
	    raise ValueError("No face detected in the seed image")
	# Identity embedding (1 x 512) and a 224x224 aligned face crop
	faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
	face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)

	# 2. Setup Pipeline
	device = "cuda"
	base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
	vae_model_path = "stabilityai/sd-vae-ft-mse"
	image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
	ip_ckpt = "ip-adapter-faceid-plusv2_sd15-finetuned_RishabhInCode.bin"  # This repo's file

	noise_scheduler = DDIMScheduler(
	    num_train_timesteps=1000,
	    beta_start=0.00085,
	    beta_end=0.012,
	    beta_schedule="scaled_linear",
	    clip_sample=False,
	    set_alpha_to_one=False,
	    steps_offset=1,
	)
	vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
	pipe = StableDiffusionPipeline.from_pretrained(
	    base_model_path,
	    torch_dtype=torch.float16,
	    scheduler=noise_scheduler,
	    vae=vae,
	    safety_checker=None,
	).to(device)

	# 3. Load IP-Adapter with Custom Fine-Tuned Weights
	ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

	# 4. Generate
	prompt = "a cinematic portrait of the person in cyberpunk lighting"
	images = ip_model.generate(
	    prompt=prompt,
	    face_image=face_image,
	    faceid_embeds=faceid_embeds,
	    shortcut=True,  # enables the PlusV2 code path
	    s_scale=1.0,  # weight of the face-structure (CLIP image) features
	    num_samples=1,
	    width=512,
	    height=768,
	    num_inference_steps=30,
	)
	images[0].save("output.png")
	```
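The usage example takes `faces[0]`, which is whichever detection InsightFace returns first. When the seed image contains several people, one possible approach is to pick the largest detected face instead. The helper below is a hypothetical sketch, not part of this repository; it assumes each detection exposes `bbox` as `(x1, y1, x2, y2)`, as InsightFace detections do, and uses stand-in objects so it runs without the model.

```python
from types import SimpleNamespace


def largest_face(faces):
    """Return the detection with the largest bounding-box area."""
    if not faces:
        raise ValueError("no face detected in the image")

    def area(face):
        x1, y1, x2, y2 = face.bbox
        return (x2 - x1) * (y2 - y1)

    return max(faces, key=area)


# Stand-in detections for illustration; real ones come from app.get(image).
demo_faces = [
    SimpleNamespace(bbox=(0, 0, 10, 10)),
    SimpleNamespace(bbox=(5, 5, 105, 85)),
]
primary = largest_face(demo_faces)
```

With real detections, `primary.normed_embedding` and `primary.kps` would replace the `faces[0]` lookups in the extraction step.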

	## Technical Lineage & Credits

	This project is a specialized refinement of several foundational works in the Generative AI ecosystem.

	### Base Architecture
	* Diffusion Model: [Realistic Vision V4.0](https://huggingface.co/SG161222/Realistic_Vision_V4.0_noVAE) by SG161222.
	* Adapter Framework: [IP-Adapter-FaceID-PlusV2](https://huggingface.co/h94/IP-Adapter-FaceID-PlusV2) by Tencent AI Lab.

	### Component Acknowledgments
	* Face Embedding: Extracted with [InsightFace](https://github.com/deepinsight/insightface) (`buffalo_l`); its ArcFace embeddings also drive the identity loss.
	* Image Encoding: [CLIP-ViT-H-14-laion2B](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) for structural consistency.
	* Fine-Tuning Data: Curated samples from the [CelebA-HQ Dataset](https://github.com/tkarras/progressive_growing_of_gans).

	## License & Ethical Use
	TrueFace-Adapter is released under a Non-Commercial Research License.
	1. License Inheritance: This model inherits the non-commercial research restriction that InsightFace places on its pretrained models.
	2. Ethical Guidelines: This model is intended for artistic expression and identity-consistent portrait generation. Users are prohibited from using this tool to generate non-consensual deepfakes or misleading media.


	## Citation

	If you use this fine-tuned model in your research or projects, please cite it as:
	```bibtex
	@misc{rishabhincode2026trueface,
	  author = {RishabhInCode},
	  title = {TrueFace-Adapter: High-Fidelity Identity Preservation},
	  year = {2026},
	  publisher = {Hugging Face},
	  howpublished = {\url{https://huggingface.co/RishabhInCode/TrueFace-Adapter}}
	}
	```