---
license: fair-noncommercial-research-license
datasets:
- bitmind/celeb-a-hq
base_model:
- SG161222/Realistic_Vision_V4.0_noVAE
tags:
- text-to-image
- stable-diffusion
- diffusers
- ip-adapter
- face-id
- custom-finetune
language:
- en
library_name: diffusers
---

# IP-Adapter-FaceID-PlusV2-Finetuned (RishabhInCode)

## Introduction
This is a custom fine-tuned version of the **IP-Adapter-FaceID-PlusV2** model for Stable Diffusion 1.5. It was trained to prioritize high-fidelity identity preservation while maintaining compositional realism across highly diverse prompts.

The model conditions image generation on FaceID embeddings extracted with the InsightFace `buffalo_l` model, injecting them into the UNet cross-attention layers.

* **Base Diffusion Model:** `SG161222/Realistic_Vision_V4.0_noVAE`
* **VAE:** `stabilityai/sd-vae-ft-mse`
* **Image Encoder:** `laion/CLIP-ViT-H-14-laion2B-s32B-b79K`
* **Dataset:** images sampled from `bitmind/celeb-a-hq`
* **Optimization:** joint training with the standard diffusion loss and an identity loss (ArcFace cosine similarity)

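The joint objective described above can be sketched roughly as follows. This is an illustrative sketch, not the actual training code: the function name and the `id_weight` balancing coefficient are assumptions.

```python
import torch
import torch.nn.functional as F

def joint_loss(noise_pred, noise_target, id_emb_gen, id_emb_ref, id_weight=0.1):
    """Illustrative joint objective: diffusion loss + ArcFace identity loss.

    `id_weight` is a hypothetical balancing coefficient, not the value
    used to train this model.
    """
    # Standard diffusion objective: MSE between predicted and true noise
    diffusion_loss = F.mse_loss(noise_pred, noise_target)
    # Identity loss: 1 - cosine similarity between ArcFace embeddings
    # of the generated face and the reference face
    identity_loss = 1.0 - F.cosine_similarity(id_emb_gen, id_emb_ref, dim=-1).mean()
    return diffusion_loss + id_weight * identity_loss
```
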
## Evaluation Metrics
The model was evaluated against the generic zero-shot IP-Adapter baseline by generating multiple stylistic variations (cinematic lighting, charcoal sketch, outdoor lighting, etc.) across a range of seed images.

| Metric | Baseline (Zero-Shot) | Fine-Tuned (This Model) | Note |
|---|---|---|---|
| **Identity Score** (higher is better) | 0.8327 | **0.8754** | Significant improvement in facial structure retention. |
| **FID Score** (lower is better) | **259.27** | 283.11 | Standard distributional-gap trade-off when enforcing strict identity constraints. |

*Note: In 1-to-1 sample comparisons, this fine-tuned model pushed individual Identity Scores as high as **0.9680**, while also achieving better sample-specific FID (421.97 vs. 448.15 for the baseline).*

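The Identity Score reported above is the cosine similarity between ArcFace embeddings of the reference and generated faces. A minimal sketch of how such a score can be computed (the helper name is illustrative):

```python
import numpy as np

def identity_score(emb_ref, emb_gen):
    # Cosine similarity between L2-normalized face embeddings:
    # 1.0 means identical identity features, 0.0 means orthogonal.
    a = np.array(emb_ref, dtype=np.float64)
    b = np.array(emb_gen, dtype=np.float64)
    a /= np.linalg.norm(a)
    b /= np.linalg.norm(b)
    return float(np.dot(a, b))
```

In practice the two embeddings would come from running the same ArcFace model (e.g. InsightFace's `buffalo_l`) on the seed image and the generated image.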
## Usage

To use this model, you first need to extract the face embedding and the aligned face image using `insightface`.

```python
import cv2
import torch
from insightface.app import FaceAnalysis
from insightface.utils import face_align
from diffusers import StableDiffusionPipeline, DDIMScheduler, AutoencoderKL
# The `ip_adapter` package ships with the IP-Adapter repository, not PyPI
from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus

# 1. Setup Face Extraction
app = FaceAnalysis(name="buffalo_l", providers=['CUDAExecutionProvider'])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("your_seed_image.jpg")
faces = app.get(image)
faceid_embeds = torch.from_numpy(faces[0].normed_embedding).unsqueeze(0)
face_image = face_align.norm_crop(image, landmark=faces[0].kps, image_size=224)

# 2. Setup Pipeline
device = "cuda"
base_model_path = "SG161222/Realistic_Vision_V4.0_noVAE"
vae_model_path = "stabilityai/sd-vae-ft-mse"
image_encoder_path = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
ip_ckpt = "ip-adapter-faceid-plusv2_sd15-finetuned_RishabhInCode.bin"  # This repo's file

noise_scheduler = DDIMScheduler(
    num_train_timesteps=1000,
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    steps_offset=1,
)
vae = AutoencoderKL.from_pretrained(vae_model_path).to(dtype=torch.float16)
pipe = StableDiffusionPipeline.from_pretrained(
    base_model_path,
    torch_dtype=torch.float16,
    scheduler=noise_scheduler,
    vae=vae,
    safety_checker=None,
).to(device)

# 3. Load IP-Adapter with Custom Fine-Tuned Weights
ip_model = IPAdapterFaceIDPlus(pipe, image_encoder_path, ip_ckpt, device)

# 4. Generate
prompt = "a cinematic portrait of the person in cyberpunk lighting"
images = ip_model.generate(
    prompt=prompt,
    face_image=face_image,
    faceid_embeds=faceid_embeds,
    shortcut=True,  # enables the v2 identity shortcut
    s_scale=1.0,
    num_samples=1,
    width=512,
    height=768,
    num_inference_steps=30,
)
images[0].save("output.png")
```
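The snippet above assumes the following environment. The `IPAdapterFaceIDPlus` class is not distributed on PyPI; it lives in the `ip_adapter` module of the IP-Adapter repository (package versions are intentionally left unpinned here):

```shell
# Core dependencies for the usage snippet above
pip install diffusers transformers accelerate insightface onnxruntime-gpu opencv-python
# The `ip_adapter` module comes from the IP-Adapter repository
git clone https://github.com/tencent-ailab/IP-Adapter.git
```

Run the usage script from inside the cloned `IP-Adapter` directory (or add that directory to `PYTHONPATH`) so that `from ip_adapter.ip_adapter_faceid import IPAdapterFaceIDPlus` resolves.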