---
license: creativeml-openrail-m
base_model:
- SG161222/Realistic_Vision_V4.0_noVAE
tags:
- text-to-image
- stable-diffusion
- ip-adapter
- face-id
- identity-preservation
- portrait
- rishabh-in-code
library_name: diffusers
pipeline_tag: text-to-image
---

# TrueFace-Adapter: High-Fidelity Identity Preservation
![Model License](https://img.shields.io/badge/License-Non--Commercial-red.svg)
![Base Model](https://img.shields.io/badge/Base%20Model-Realistic%20Vision%20V4.0-blue.svg)

---
## Introduction
This is a fine-tuned version of the **IP-Adapter-FaceID-PlusV2** model for Stable Diffusion 1.5. It was trained to prioritize identity preservation while maintaining compositional realism across diverse prompts.

The model conditions image generation on FaceID embeddings extracted with the InsightFace `buffalo_l` model. These embeddings are injected into the UNet cross-attention layers.

* **Base Diffusion Model:** `SG161222/Realistic_Vision_V4.0_noVAE`
* **VAE:** `stabilityai/sd-vae-ft-mse`
* **Image Encoder:** `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`
* **Dataset:** Images sampled from `bitmind/celeb-a-hq`
* **Optimization:** Joint optimization of the standard diffusion loss and an identity loss (ArcFace cosine similarity)
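
The joint objective listed above can be illustrated with a minimal, framework-free sketch. It assumes the identity term is `1 - cosine similarity` between the ArcFace embeddings of the generated and reference faces, and that the two terms are combined with a scalar weight. The names `joint_loss` and `id_weight`, and the value `0.1`, are hypothetical and do not come from the actual training code, where these quantities would be tensors rather than Python lists.

```python
import math


def mse(pred, target):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def joint_loss(noise_pred, noise, emb_generated, emb_reference, id_weight=0.1):
    """Diffusion (noise-prediction) loss plus a weighted identity loss.

    id_weight is a hypothetical hyperparameter, not a value from this model card.
    """
    diffusion_loss = mse(noise_pred, noise)
    identity_loss = 1.0 - cosine_similarity(emb_generated, emb_reference)
    return diffusion_loss + id_weight * identity_loss


# Toy values: perfect noise prediction, orthogonal identity embeddings.
print(joint_loss([0.5, -0.5], [0.5, -0.5], [1.0, 0.0], [0.0, 1.0]))
```

With this formulation, the identity term vanishes when the two embeddings point in the same direction, so the weight controls how strongly likeness is traded against the denoising objective.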

## Evaluation Metrics
The model was evaluated against the zero-shot IP-Adapter baseline. Testing involved generating multiple stylistic variations (cinematic lighting, charcoal sketch, outdoor lighting, etc.) from several seed images.

| Metric | Baseline (Zero-Shot) | Fine-Tuned (This Model) | Note |
|---|---|---|---|
| **Identity Score** (Higher is better) | 0.8327 | **0.8754** | Improved retention of facial structure. |
| **FID Score** (Lower is better) | **259.27** | 283.11 | Expected trade-off in distributional fidelity when enforcing strict identity constraints. |

*Note: In 1-to-1 sample comparisons, the fine-tuned model reached Identity Scores as high as **0.9680** and a lower per-sample FID than the baseline (421.97 vs. 448.15).*
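
To put the aggregate numbers in the table on a common scale, the short calculation below expresses each metric as a relative change from the baseline. It uses only the four values reported in the table; the helper name `relative_change` is illustrative.

```python
def relative_change(baseline, finetuned):
    """Fractional change of the fine-tuned value relative to the baseline."""
    return (finetuned - baseline) / baseline


# Aggregate values from the evaluation table.
identity_change = relative_change(0.8327, 0.8754)
fid_change = relative_change(259.27, 283.11)

print(f"Identity Score: {identity_change:+.1%}")  # higher is better
print(f"FID:            {fid_change:+.1%}")       # lower is better
```

This works out to roughly a 5% gain in Identity Score against a roughly 9% increase in FID.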


## Generalization to Unseen Data (CelebA-HQ)

To check that TrueFace-Adapter does not simply overfit to the training data, we tested it on unseen subjects from the CelebA-HQ dataset with 5 distinct prompts (Cinematic, Smiling, Sunglasses, Studio, Charcoal Sketch).

**Reference Subject (Unseen Data):**
![Original](celeb_original.png)

**Baseline (Standard IP-Adapter Zero-Shot):**
*Notice the loss of the square jawline, the alteration of the eye shape, and the complete loss of identity in the sketch (far right).*
![Baseline](celeb_baseline.png)

**TrueFace-Adapter (Ours):**
*The fine-tuned model preserves the subject's deep-set eyes and jaw structure, and maintains a high-fidelity likeness even in the charcoal sketch medium.*
![Finetuned](celeb_finetuned.png)
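
A comparison like the one above can also be summarized numerically per prompt. The sketch below assumes the identity score is the cosine similarity between the reference embedding and the embedding of each generated image, and that both are already L2-normalized (as InsightFace's `normed_embedding` is), so a dot product suffices. The function names and the toy 2-D vectors are hypothetical stand-ins for real 512-D embeddings.

```python
from statistics import mean


def identity_score(ref_embedding, gen_embedding):
    """Dot product of two L2-normalized embeddings, i.e. their cosine similarity."""
    return sum(r * g for r, g in zip(ref_embedding, gen_embedding))


def score_by_prompt(ref_embedding, generated):
    """Map each prompt label to its identity score, plus the mean across prompts."""
    scores = {
        label: identity_score(ref_embedding, emb)
        for label, emb in generated.items()
    }
    scores["mean"] = mean(scores.values())
    return scores


# Toy 2-D unit vectors standing in for 512-D embeddings (hypothetical values).
reference = [1.0, 0.0]
generated = {
    "cinematic": [1.0, 0.0],
    "charcoal sketch": [0.6, 0.8],
}
print(score_by_prompt(reference, generated))
```

Reporting the score per prompt, rather than only the mean, makes it easier to see which styles (for example, the sketch) lose the most likeness.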


## Usage

To use this model, first extract the face embedding and an aligned face crop with `insightface`, then pass both to the adapter. The `ip_adapter` module imported below comes from the IP-Adapter repository.

```python
import cv2
import torch
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

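# 1. Setup Face Extraction
# List the CPU provider as a fallback in case CUDA is unavailable.
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("your_seed_image.jpg")  # returns None if the file cannot be read
if image is None:
    raise FileNotFoundError("Could not read your_seed_image.jpg")
faces = app.get(image)
if not faces:
    raise ValueError("No face detected in the seed image")
# Identity embedding plus an aligned 224x224 crop for the CLIP image encoder
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)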

# 2. Setup Pipeline
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plusv2_sd15-finetuned_RishabhInCode.bin" # This repo's file

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    safety_checker=None
).to(device)

# 3. Load IP-Adapter with Custom Fine-Tuned Weights
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# 4. Generate
prompt = "a cinematic portrait of the person in cyberpunk lighting"
images = ip_model.generate(
    prompt=prompt,
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=True,  # enables the PlusV2 path
    s_scale=1.0,    # weight of the face-structure branch
    num_samples=1,
    width=512,
    height=768,
    num_inference_steps=30
)
images[0].save("output.png")
```
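
The snippet above takes `faces[0]`, which may not be the intended subject when the seed image contains several people. One possible approach, sketched here, is to pick the detection with the largest bounding box. It assumes each detection exposes a `bbox` attribute laid out as `(x1, y1, x2, y2)`, as InsightFace detections do; the helper name `largest_face` and the stand-in objects are hypothetical.

```python
from types import SimpleNamespace


def largest_face(faces):
    """Return the detected face with the largest bounding-box area."""
    if not faces:
        raise ValueError("No face detected in the seed image")

    def area(face):
        x1, y1, x2, y2 = face.bbox[:4]
        return (x2 - x1) * (y2 - y1)

    return max(faces, key=area)


# Stand-in objects for illustration; real detections come from app.get(image).
faces = [
    SimpleNamespace(bbox=[10, 10, 50, 60], name="background"),
    SimpleNamespace(bbox=[100, 80, 300, 340], name="subject"),
]
print(largest_face(faces).name)
```

In the usage code, `face = largest_face(app.get(image))` could then replace the two `faces[0]` lookups. Whether the largest face is the right choice depends on the seed image, so this is a heuristic rather than a rule.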

## Technical Lineage & Credits

This project is a specialized refinement of several foundational works in the Generative AI ecosystem.

### Base Architecture
* **Diffusion Model:** [Realistic Vision V4.0](https://huggingface.co/SG161222/Realistic_Vision_V4.0_noVAE) by SG161222.
* **Adapter Framework:** [IP-Adapter-FaceID-PlusV2](https://huggingface.co/h94/IP-Adapter-FaceID-PlusV2) by Tencent AI Lab.

### Component Acknowledgments
* **Face Embedding:** Developed using [InsightFace](https://github.com/deepinsight/insightface) (buffalo_l), utilizing the ArcFace identity loss function.
* **Image Encoding:** [CLIP-ViT-H-14-laion2B](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) for structural consistency.
* **Fine-Tuning Data:** Curated samples from the [CelebA-HQ Dataset](https://github.com/tkarras/progressive_growing_of_gans).

## License & Ethical Use
**TrueFace-Adapter** is released under a **Non-Commercial Research License**. 
1. This model inherits the non-commercial research restriction of the InsightFace pretrained models.
2. **Ethical Guidelines:** This model is intended for artistic expression and identity-consistent portrait generation. Users are prohibited from using this tool to generate non-consensual deepfakes or misleading media.


## Citation

If you use this fine-tuned model in your research or projects, please cite it as:
```bibtex
@misc{rishabhincode2026trueface,
  author = {RishabhInCode},
  title = {TrueFace-Adapter: High-Fidelity Identity Preservation},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/RishabhInCode/TrueFace-Adapter}}
}
```