---
license: creativeml-openrail-m
base_model:
- SG161222/Realistic_Vision_V4.0_noVAE
tags:
- text-to-image
- stable-diffusion
- ip-adapter
- face-id
- identity-preservation
- portrait
- rishabh-in-code
library_name: diffusers
pipeline_tag: text-to-image
---

# TrueFace-Adapter: High-Fidelity Identity Preservation

---
## Introduction
This is a custom fine-tuned version of the **IP-Adapter-FaceID-PlusV2** model for Stable Diffusion 1.5. It was trained specifically to prioritize high-fidelity identity preservation while keeping compositions realistic across highly diverse prompts.

The model conditions image generation on FaceID embeddings extracted with the InsightFace `buffalo_l` model; these embeddings are projected into image tokens that enter the UNet through its cross-attention layers.
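
For context, the original IP-Adapter formulation injects image features through *decoupled cross-attention*: each UNet cross-attention layer attends separately to the text tokens $\mathbf{c}_t$ and to the image tokens $\mathbf{c}_i$ (here derived from the FaceID embedding) and sums the two results with a weight $\lambda$. The equation below restates that general mechanism from the IP-Adapter paper; it is a sketch of how conditioning enters the UNet, not a description of anything specific to this fine-tune.

$$
\mathbf{Z}^{\text{new}} = \operatorname{Softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}^{\top}}{\sqrt{d}}\right)\mathbf{V} + \lambda\,\operatorname{Softmax}\!\left(\frac{\mathbf{Q}\mathbf{K}'^{\top}}{\sqrt{d}}\right)\mathbf{V}'
$$

Here $\mathbf{Q} = \mathbf{Z}\mathbf{W}_q$ comes from the UNet hidden states $\mathbf{Z}$, $\mathbf{K} = \mathbf{c}_t\mathbf{W}_k$ and $\mathbf{V} = \mathbf{c}_t\mathbf{W}_v$ from the text tokens, and $\mathbf{K}' = \mathbf{c}_i\mathbf{W}'_k$ and $\mathbf{V}' = \mathbf{c}_i\mathbf{W}'_v$ from the image tokens through the adapter's additional key/value projections. In the `ip_adapter` code, $\lambda$ corresponds to the `scale` argument of `generate`.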

* **Base Diffusion Model:** `SG161222/Realistic_Vision_V4.0_noVAE`
* **VAE:** `stabilityai/sd-vae-ft-mse`
* **Image Encoder:** `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`
* **Dataset:** Images sampled from `bitmind/celeb-a-hq`.
* **Optimization:** Joint optimization of the standard diffusion (noise-prediction) loss and an identity loss based on ArcFace cosine similarity.
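
The joint objective can be sketched as $\mathcal{L} = \mathcal{L}_{\text{diff}} + \lambda_{\text{id}}\,\bigl(1 - \cos(e_{\text{gen}}, e_{\text{ref}})\bigr)$. The snippet below is a minimal, framework-free illustration under stated assumptions, not the training code used for this model: the helper names are hypothetical, the weight `id_weight` ($\lambda_{\text{id}}$) is a placeholder because the card does not report one, and in real training `gen_embed` would come from running ArcFace on an image decoded from the model's prediction, a step omitted here. Plain Python lists stand in for the tensors a real training loop would use.

```python
import math
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Diffusion loss: mean squared error between predicted and true noise."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two face embeddings (ArcFace in this card)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def identity_loss(gen_embed: Sequence[float], ref_embed: Sequence[float]) -> float:
    """0 when the embeddings point the same way, up to 2 when they are opposite."""
    return 1.0 - cosine_similarity(gen_embed, ref_embed)


def joint_loss(noise_pred, noise, gen_embed, ref_embed, id_weight: float = 0.1) -> float:
    """Diffusion loss plus a weighted identity loss.

    id_weight is a hypothetical value; the card does not state the weighting used.
    """
    return mse(noise_pred, noise) + id_weight * identity_loss(gen_embed, ref_embed)
```

Raising `id_weight` pushes the optimizer harder toward identity matching, which is one plausible source of the FID trade-off reported in the evaluation.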

## Evaluation Metrics
The model was evaluated against the generic zero-shot IP-Adapter baseline. Testing involved generating multiple stylistic variations (cinematic lighting, charcoal sketch, outdoor lighting, etc.) for several seed images.

| Metric | Baseline (Zero-Shot) | Fine-Tuned (This Model) | Note |
|---|---|---|---|
| **Identity Score** (higher is better) | 0.8327 | **0.8754** | +0.0427 absolute (about 5.1% relative); better retention of facial structure. |
| **FID Score** (lower is better) | **259.27** | 283.11 | +23.84 (worse); a distributional-realism trade-off from enforcing strict identity constraints. |

*Note: In 1-to-1 sample comparisons, this fine-tuned model reached Identity Scores as high as **0.9680** on individual samples, with better sample-specific realism than the baseline (FID 421.97 vs. 448.15).*
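
The card does not spell out how the Identity Score is computed. Assuming it is the mean ArcFace cosine similarity between the reference face and each generated face (consistent with the ArcFace-based identity loss used in training), a minimal sketch looks like this. `embed_face` expects a prepared InsightFace `FaceAnalysis` instance such as `app` from the Usage section; all helper names here are hypothetical.

```python
import math
from statistics import mean
from typing import Iterable, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm


def embed_face(app, bgr_image) -> Sequence[float]:
    """ArcFace embedding of the first face found by an InsightFace FaceAnalysis app."""
    faces = app.get(bgr_image)
    if not faces:
        raise ValueError("no face detected")
    return faces[0].normed_embedding


def mean_identity_score(
    ref_embed: Sequence[float], generated_embeds: Iterable[Sequence[float]]
) -> float:
    """Average similarity between one reference face and several generated faces,
    e.g. one generation per prompt/style for the same seed image."""
    return mean(cosine_similarity(ref_embed, g) for g in generated_embeds)
```

Generated PIL images must be converted to BGR arrays (for example with `cv2.cvtColor(numpy.array(img), cv2.COLOR_RGB2BGR)`) before they are passed to `embed_face`.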

## Generalization to Unseen Data (CelebA-HQ)

To check that TrueFace-Adapter does not overfit to its training data, we tested it on unseen subjects from the CelebA-HQ dataset with 5 distinct prompts (Cinematic, Smiling, Sunglasses, Studio, Charcoal Sketch).
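
Testing on *unseen subjects* requires that no identity appears in both the training split and the evaluation split. A minimal sketch of a subject-disjoint split follows. It assumes each image comes with a subject (identity) ID, which the card does not describe, and `split_by_subject` is a hypothetical helper, not part of this repository.

```python
import random
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple

Sample = Tuple[str, Hashable]  # (image_path, subject_id)


def split_by_subject(
    samples: Sequence[Sample], test_fraction: float = 0.1, seed: int = 0
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so that every subject lands in exactly one split."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    # Group image paths by identity.
    by_subject: Dict[Hashable, List[str]] = defaultdict(list)
    for path, subject in samples:
        by_subject[subject].append(path)
    # Sort first so the shuffle depends only on the seed, not on input order.
    subjects = sorted(by_subject, key=str)
    random.Random(seed).shuffle(subjects)
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [(p, s) for p, s in samples if s not in test_subjects]
    test = [(p, s) for p, s in samples if s in test_subjects]
    return train, test
```

Splitting by subject rather than by image matters because a random per-image split would let photos of the same person leak into both sets, inflating identity scores on the "unseen" data.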

**Reference Subject (Unseen Data):**

**Baseline (Standard IP-Adapter Zero-Shot):**
*Notice the loss of the square jawline, the alteration of the eye shape, and the complete loss of identity in the sketch (far right).*

**TrueFace-Adapter (Ours):**
*The fine-tuned model preserves the subject's deep-set eyes and specific jaw structure, and maintains a high-fidelity likeness even in the charcoal sketch medium.*

## Usage

To use this model, first extract the face embedding and an aligned face crop with `insightface`. The `IPAdapterFaceIDPlus` class is imported from the `ip_adapter` package of the official IP-Adapter repository (`tencent-ailab/IP-Adapter`).

```python
import cv2
import torch
from huggingface_hub import hf_hub_download
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

# 1. Setup Face Extraction
app = FaceAnalysis(name="buffalo_l", providers=["CUDAExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("your_seed_image.jpg")  # BGR array, as InsightFace expects
if image is None:
    raise FileNotFoundError("could not read your_seed_image.jpg")
faces = app.get(image)
if not faces:
    raise ValueError("no face detected in the seed image")
# Normalized ArcFace identity embedding plus a 224x224 aligned face crop
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)

# 2. Setup Pipeline
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
# Fine-tuned weights from this repository
ip_ckpt = hf_hub_download(
    repo_id="RishabhInCode/TrueFace-Adapter",
    filename="ip-adapter-faceid-plusv2_sd15-finetuned_RishabhInCode.bin",
)

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    safety_checker=None,
).to(device)

# 3. Load IP-Adapter with Custom Fine-Tuned Weights
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# 4. Generate
prompt = "a cinematic portrait of the person in cyberpunk lighting"
images = ip_model.generate(
    prompt=prompt,
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=True,  # PlusV2 checkpoints use the shortcut path
    s_scale=1.0,  # weight of the CLIP face-structure features
    num_samples=1,
    width=512,
    height=768,
    num_inference_steps=30,
    seed=42,  # fixed seed for reproducible output
)
images[0].save("output.png")
```
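
Because `s_scale` controls how strongly the CLIP face-structure features are applied, it can help to compare a few values with the seed held fixed. As we understand the reference `ip_adapter` implementation, with `shortcut=True` those structure features are added on top of the projected FaceID tokens, scaled by `s_scale`. The sketch below is a hypothetical helper (not part of this repository or of `ip_adapter`) that reuses `ip_model`, `face_image`, `faceid_embeds`, and `prompt` from the example above; the scale values are arbitrary starting points, not tuned recommendations.

```python
from typing import Any, Dict, Iterable


def sweep_structure_scale(
    ip_model: Any,
    face_image: Any,
    faceid_embeds: Any,
    prompt: str,
    scales: Iterable[float] = (0.5, 1.0, 1.5),
    seed: int = 42,
) -> Dict[float, Any]:
    """Generate one image per s_scale value with a fixed seed.

    ip_model is expected to be an IPAdapterFaceIDPlus instance; only its
    generate method is called, so any object with that method works.
    """
    results: Dict[float, Any] = {}
    for s in scales:
        images = ip_model.generate(
            prompt=prompt,
            face_image=face_image,
            faceid_embeds=faceid_embeds,
            shortcut=True,  # same PlusV2 setting as the main example
            s_scale=s,  # strength of the face-structure features
            num_samples=1,
            seed=seed,  # identical seed, so only s_scale varies
            width=512,
            height=768,
            num_inference_steps=30,
        )
        results[s] = images[0]
    return results
```

For example, `for s, img in sweep_structure_scale(ip_model, face_image, faceid_embeds, prompt).items(): img.save(f"output_s{s}.png")` writes one file per scale for side-by-side comparison.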

## Technical Lineage & Credits

This project is a specialized refinement of several foundational works in the generative AI ecosystem.

### Base Architecture
* **Diffusion Model:** [Realistic Vision V4.0](https://huggingface.co/SG161222/Realistic_Vision_V4.0_noVAE) by SG161222.
* **Adapter Framework:** [IP-Adapter-FaceID-PlusV2](https://huggingface.co/h94/IP-Adapter-FaceID-PlusV2) by Tencent AI Lab.

### Component Acknowledgments
* **Face Embedding:** [InsightFace](https://github.com/deepinsight/insightface) (`buffalo_l`), whose ArcFace recognition model supplies both the conditioning embeddings and the identity loss.
* **Image Encoding:** [CLIP-ViT-H-14-laion2B](https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K) for structural consistency.
* **Fine-Tuning Data:** Curated samples from the [CelebA-HQ Dataset](https://github.com/tkarras/progressive_growing_of_gans).

## License & Ethical Use
**TrueFace-Adapter** is released under a **Non-Commercial Research License**.
1. **License Inheritance:** This model inherits the restrictive license of InsightFace's pretrained models, which are provided for non-commercial research use only.
2. **Ethical Guidelines:** This model is intended for artistic expression and identity-consistent portrait generation. Users must not use it to generate non-consensual deepfakes or misleading media.

## Citation

If you use this fine-tuned model in your research or projects, please cite it as:
```bibtex
@misc{rishabhincode2026trueface,
  author       = {RishabhInCode},
  title        = {TrueFace-Adapter: High-Fidelity Identity Preservation},
  year         = {2026},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/RishabhInCode/TrueFace-Adapter}}
}
```