---
license: fair-noncommercial-research-license
datasets:
- bitmind/celeb-a-hq
base_model:
- SG161222/Realistic_Vision_V4.0_noVAE
tags:
- text-to-image
- stable-diffusion
- diffusers
- ip-adapter
- face-id
- custom-finetune
language:
- en
library_name: diffusers
---

# IP-Adapter-FaceID-PlusV2-Finetuned (RishabhInCode)

## Introduction

This is a custom fine-tuned version of the **IP-Adapter-FaceID-PlusV2** model for Stable Diffusion 1.5, trained to prioritize high-fidelity identity preservation while maintaining compositional realism across diverse prompts.

The model conditions image generation on FaceID embeddings extracted with the InsightFace `buffalo_l` model, injecting them into the UNet cross-attention layers.
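The decoupled cross-attention idea behind this conditioning can be sketched as follows. This is an illustrative toy module, not the actual `ip_adapter` implementation; the class name, dimensions, and `scale` parameter are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceIDCrossAttention(nn.Module):
    """Toy sketch: face-ID tokens get their own key/value projections,
    and their attention output is added on top of the text
    cross-attention output, scaled by `scale`."""

    def __init__(self, dim=320, id_dim=768):
        super().__init__()
        self.to_k_id = nn.Linear(id_dim, dim, bias=False)
        self.to_v_id = nn.Linear(id_dim, dim, bias=False)

    def forward(self, hidden_states, text_attn_out, id_tokens, scale=1.0):
        # Queries come from the UNet hidden states; keys/values from ID tokens.
        q = hidden_states
        k = self.to_k_id(id_tokens)
        v = self.to_v_id(id_tokens)
        attn = F.softmax(q @ k.transpose(-1, -2) / q.shape[-1] ** 0.5, dim=-1)
        return text_attn_out + scale * (attn @ v)
```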

* **Base Diffusion Model:** `SG161222/Realistic_Vision_V4.0_noVAE`
* **VAE:** `stabilityai/sd-vae-ft-mse`
* **Image Encoder:** `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`
* **Dataset:** images sampled from `bitmind/celeb-a-hq`
* **Optimization:** joint optimization of the standard diffusion loss and an identity loss (ArcFace cosine similarity)
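The joint objective in the last bullet can be sketched as a weighted sum of the two terms. This is a minimal illustration, not the actual training code; the function signature and `lambda_id` weight are assumptions:

```python
import torch
import torch.nn.functional as F

def joint_loss(noise_pred, noise_target, id_embed_gen, id_embed_ref, lambda_id=0.1):
    """Standard diffusion MSE loss plus an ArcFace-style identity term
    (1 - cosine similarity between generated and reference face embeddings).
    lambda_id is an illustrative weight, not the value used in training."""
    diffusion_loss = F.mse_loss(noise_pred, noise_target)
    identity_loss = 1.0 - F.cosine_similarity(id_embed_gen, id_embed_ref, dim=-1).mean()
    return diffusion_loss + lambda_id * identity_loss
```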

## Evaluation Metrics

The model was evaluated against the generic zero-shot IP-Adapter baseline by generating multiple stylistic variations (cinematic lighting, charcoal sketch, outdoor lighting, etc.) across a range of seed images.

| Metric | Baseline (Zero-Shot) | Fine-Tuned (This Model) | Note |
|---|---|---|---|
| **Identity Score** (higher is better) | 0.8327 | **0.8754** | Improved facial structure retention. |
| **FID Score** (lower is better) | **259.27** | 283.11 | Expected distributional trade-off when enforcing strict identity constraints. |

*Note: In 1-to-1 sample comparisons, the fine-tuned model reached Identity Scores as high as **0.9680**, with better sample-specific FID (421.97 vs. 448.15 for the baseline).*
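An identity score of this kind is typically the cosine similarity between ArcFace embeddings of the reference and generated faces. A minimal sketch of that computation (the function name is illustrative, not from the evaluation code):

```python
import numpy as np

def identity_score(emb_ref, emb_gen):
    """Cosine similarity between two face embeddings (e.g. 512-d
    InsightFace ArcFace vectors); 1.0 means identical direction."""
    a = emb_ref / np.linalg.norm(emb_ref)
    b = emb_gen / np.linalg.norm(emb_gen)
    return float(np.dot(a, b))
```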

## Usage

To use this model, first extract the face embedding and the aligned face image with `insightface`.

```python
import cv2
import torch
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

# 1. Set up face extraction
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("your_seed_image.jpg")
faces = app.get(image)
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)

# 2. Set up the pipeline
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plusv2_sd15-finetuned_RishabhInCode.bin"  # the fine-tuned checkpoint from this repo

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    safety_checker=None,
).to(device)

# 3. Load the IP-Adapter with the fine-tuned weights
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# 4. Generate
prompt = "a cinematic portrait of the person in cyberpunk lighting"
images = ip_model.generate(
    prompt=prompt,
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=True,  # enables the PlusV2 conditioning path
    s_scale=1.0,    # strength of the face-structure conditioning
    num_samples=1,
    width=512,
    height=768,
    num_inference_steps=30,
)
images[0].save("output.png")
```