---
license: creativeml-openrail-m
base_model:
- SG161222/Realistic_Vision_V4.0_noVAE
tags:
- text-to-image
- stable-diffusion
- ip-adapter
- face-id
- identity-preservation
- portrait
- rishabh-in-code
library_name: diffusers
pipeline_tag: text-to-image
---

# TrueFace-Adapter: High-Fidelity Identity Preservation
![Model License](https://img.shields.io/badge/License-Non--Commercial-red.svg)
![Base Model](https://img.shields.io/badge/Base%20Model-Realistic%20Vision%20V4.0-blue.svg)

---
## Introduction
TrueFace-Adapter is a fine-tuned version of the **IP-Adapter-FaceID-PlusV2** model for Stable Diffusion 1.5. It was trained to prioritize high-fidelity identity preservation while maintaining compositional realism across diverse prompts.

The model conditions generation on FaceID embeddings extracted with the InsightFace `buffalo_l` model and injects them into the UNet cross-attention layers.

* **Base Diffusion Model:** `SG161222/Realistic_Vision_V4.0_noVAE`
* **VAE:** `stabilityai/sd-vae-ft-mse`
* **Image Encoder:** `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`
* **Dataset:** Images sampled from `bitmind/celeb-a-hq`
* **Optimization:** Joint optimization of the standard diffusion loss and an identity loss (ArcFace cosine similarity)
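
The joint objective listed above can be sketched roughly as follows. This is only an illustration, not the training code used for this model: it assumes the identity term is `1 - cosine similarity` between ArcFace embeddings of the generated and reference faces, and `id_weight` is a hypothetical balancing factor that does not come from this model card. Plain Python lists stand in for tensors.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def joint_loss(noise_pred, noise_target, emb_generated, emb_reference, id_weight=0.1):
    """Diffusion MSE plus a weighted identity term.

    id_weight is a hypothetical value chosen for illustration only.
    """
    # Standard diffusion loss: mean squared error on the predicted noise.
    diffusion = sum((p - t) ** 2 for p, t in zip(noise_pred, noise_target)) / len(noise_pred)
    # Identity loss (assumed form): 0 when the embeddings point the same way.
    identity = 1.0 - cosine_similarity(emb_generated, emb_reference)
    return diffusion + id_weight * identity
```

In a real training loop the same combination would be computed on `torch` tensors so that gradients flow through both terms.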

## Evaluation Metrics
The model was evaluated against the generic zero-shot IP-Adapter baseline. For each of several seed images, both models generated multiple stylistic variations (cinematic lighting, charcoal sketch, outdoor lighting, etc.).

| Metric | Baseline (Zero-Shot) | Fine-Tuned (This Model) | Note |
|---|---|---|---|
| **Identity Score** (Higher is better) | 0.8327 | **0.8754** | +0.0427 over the baseline; better retention of facial structure. |
| **FID Score** (Lower is better) | **259.27** | 283.11 | Worse than the baseline; the distributional trade-off of enforcing stricter identity constraints. |

*Note: In 1-to-1 sample comparisons, the fine-tuned model reached Identity Scores as high as **0.9680** and a lower sample-specific FID than the baseline (421.97 vs. 448.15).*
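
To make the trade-off in the table above concrete, the short sketch below computes the absolute and relative change of each aggregate metric. It only uses the numbers reported in the table; the dictionary and function names are illustrative.

```python
# Values copied from the table above.
baseline = {"identity": 0.8327, "fid": 259.27}
finetuned = {"identity": 0.8754, "fid": 283.11}

def change(metric):
    """Return (absolute, relative) change of the fine-tuned model versus the baseline."""
    delta = finetuned[metric] - baseline[metric]
    return delta, delta / baseline[metric]

for metric in ("identity", "fid"):
    delta, rel = change(metric)
    print(f"{metric}: {delta:+.4f} ({rel:+.1%})")
```

The Identity Score rises by about 5.1% relative to the baseline, while the aggregate FID rises by about 9.2%.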


## Generalization to Unseen Data (CelebA-HQ)

To check that TrueFace-Adapter does not overfit to the training data, we tested it on unseen subjects from the CelebA-HQ dataset across 5 distinct prompts (Cinematic, Smiling, Sunglasses, Studio, Charcoal Sketch).

**Reference Subject (Unseen Data):**
![Original](celeb_original.png)

**Baseline (Standard IP-Adapter Zero-Shot):**
*Notice the loss of the square jawline, the alteration of the eye shape, and the complete loss of identity in the sketch (far right).*
![Baseline](celeb_baseline.png)

**TrueFace-Adapter (Ours):**
*The fine-tuned model preserves the subject's deep-set eyes and specific jaw structure, and maintains a high-fidelity likeness even in the charcoal sketch medium.*
![Finetuned](celeb_finetuned.png)


## Usage

To use this model, first extract the face embedding and the aligned face image with `insightface`. The `ip_adapter` module imported below comes from the IP-Adapter repository (`tencent-ailab/IP-Adapter`).

```python
import cv2
import torch
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

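# 1. Set up face extraction (falls back to CPU if CUDA is unavailable)
app = FaceAnalysis(name="buffalo_l", providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("your_seed_image.jpg")
if image is None:
    raise FileNotFoundError("Could not read your_seed_image.jpg")
faces = app.get(image)
if not faces:
    raise ValueError("No face detected in the seed image")
# 512-d ArcFace identity embedding, shape (1, 512)
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
# Aligned 224x224 face crop for the CLIP image encoder
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)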

# 2. Setup Pipeline
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plusv2_sd15-finetuned_RishabhInCode.bin" # This repo's file

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    safety_checker=None
).to(device)

# 3. Load IP-Adapter with Custom Fine-Tuned Weights
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# 4. Generate
prompt = "a cinematic portrait of the person in cyberpunk lighting"
images = ip_model.generate(
    prompt=prompt,
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=True,  # enable the PlusV2 shortcut path
    s_scale=1.0,    # weight of the face-structure conditioning
    num_samples=1,
    width=512,
    height=768,
    num_inference_steps=30,
)
images[0].save("output.png")
```
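
The usage snippet above takes `faces[0]`, which may not be the intended subject when a seed image contains several people. A possible helper, sketched under the assumption that each detected face exposes a `bbox` attribute of the form `(x1, y1, x2, y2)` as InsightFace face objects do, picks the face with the largest bounding box. The name `pick_largest_face` and the demo objects are hypothetical.

```python
from types import SimpleNamespace

def pick_largest_face(faces):
    """Return the face whose bounding box covers the largest area."""
    if not faces:
        raise ValueError("No face detected in the seed image")

    def area(face):
        x1, y1, x2, y2 = face.bbox
        return (x2 - x1) * (y2 - y1)

    return max(faces, key=area)

# Stand-in objects for demonstration; real entries come from app.get(image).
demo_faces = [
    SimpleNamespace(bbox=(0, 0, 10, 10)),
    SimpleNamespace(bbox=(5, 5, 105, 85)),
]
largest = pick_largest_face(demo_faces)
```

With the real pipeline, `pick_largest_face(app.get(image))` would replace `faces[0]` when extracting `normed_embedding` and `kps`.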

## Technical Lineage & Credits

This project builds on several existing works in the generative AI ecosystem.

### Base Architecture
* **Diffusion Model:** [Realistic Vision V4.0](https://huggingface.co/SG161222/Realistic_Vision_V4.0_noVAE) by SG161222.
* **Adapter Framework:** [IP-Adapter-FaceID-PlusV2](https://huggingface.co/h94/IP-Adapter-FaceID-PlusV2) by Tencent AI Lab.

### Component Acknowledgments
* **Face Embedding:** Extracted with [InsightFace](https://github.com/deepinsight/insightface) (`buffalo_l`); its ArcFace embeddings are also used for the identity loss.
* **Image Encoding:** [CLIP-ViT-H-14-laion2B](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) for structural consistency.
* **Fine-Tuning Data:** Curated samples from the [CelebA-HQ Dataset](https://github.com/tkarras/progressive_growing_of_gans).

## License & Ethical Use
**TrueFace-Adapter** is released under a **Non-Commercial Research License**.
1. This model inherits the restrictive license of InsightFace, whose pretrained models are available for non-commercial research use only.
2. **Ethical Guidelines:** This model is intended for artistic expression and identity-consistent portrait generation. Users are prohibited from using it to generate non-consensual deepfakes or misleading media.


## Citation

If you use this fine-tuned model in your research or projects, please cite it as:
```bibtex
@misc{rishabhincode2026trueface,
  author = {RishabhInCode},
  title = {TrueFace-Adapter: High-Fidelity Identity Preservation},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/RishabhInCode/TrueFace-Adapter}}
}
```