Rogquim Alissonerdx committed
Commit 8e0a240 · 0 Parent(s)

Duplicate from Alissonerdx/BFS-Best-Face-Swap-Video

Co-authored-by: Alisson Pereira Anjos <Alissonerdx@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,38 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
+ ltx-2/ filter=lfs diff=lfs merge=lfs -text
+ examples/ filter=lfs diff=lfs merge=lfs -text
+ workflows/workflow_head_swap_drag_and_drop.png filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,500 @@
+ ---
+ license: other
+ license_name: ltx-2-community-license-agreement
+ tags:
+ - ltx-2
+ - ic-lora
+ - head-swap
+ - video-to-video
+ - image-to-video
+ - bfs
+ - lora
+ base_model:
+ - Lightricks/LTX-2.3
+ library_name: diffusers
+ pipeline_tag: image-to-video
+ ---
+
+ ## ⚠️ Ethical Use & Disclaimer
+
+ This model is a technical tool designed for **Digital Identity Research, Professional VFX Workflows, and Cinematic Prototyping**.
+
+ By downloading or using this LoRA, you acknowledge and agree to the following:
+
+ * **Intended Use:** Designed for filmmakers, VFX artists, and researchers exploring high-fidelity video identity transformation.
+ * **Consent & Rights:** You must have explicit legal consent and all necessary rights from any individual whose likeness is being processed.
+ * **Legal Compliance:** You are fully responsible for complying with all local and international laws regarding synthetic media.
+ * **Liability Waiver:** This model is provided *"as is."* **As the creator (Alissonerdx), I assume no responsibility for misuse.** Any legal, ethical, or social consequences are solely the responsibility of the end user.
+
+ ---
+
+ # 📺 Video Examples
+
+ ## V1 Examples
+
+ Generated using the **Frame 0 Anchoring Technique**.
+ All examples follow the guide video's motion while preserving the identity provided in the first frame.
+
+ | Example 1 | Example 2 |
+ | --- | --- |
+ | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/1.mp4" controls autoplay loop muted></video> | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/2.mp4" controls autoplay loop muted></video> |
+
+ | Example 3 | Example 4 |
+ | --- | --- |
+ | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/3.mp4" controls autoplay loop muted></video> | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/4.mp4" controls autoplay loop muted></video> |
+
+ | Example 5 |
+ | --- |
+ | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/5.mp4" controls autoplay loop muted></video> |
+
+ ## V3 Examples
+
+ If you want to see the full setup in practice, watch here:
+
+ [https://www.youtube.com/watch?v=HBp03iu7wLA](https://www.youtube.com/watch?v=HBp03iu7wLA)
+
+ The following examples demonstrate the new **persistent-template workflow** used in V3:
+
+ | Example 6 | Example 7 |
+ | --- | --- |
+ | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/6.mp4" controls autoplay loop muted></video> | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/7.mp4" controls autoplay loop muted></video> |
+
+ | Example 8 |
+ | --- |
+ | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/8.mp4" controls autoplay loop muted></video> |
+
+ The model files for these versions are stored under:
+
+ ```txt
+ ltx-2.3/...
+ ```
+
+ ---
+
+ # 🛠 Technical Background (V1)
+
+ To achieve this level of identity transfer, I **heavily modified the official LTX-2 training scripts**.
+
+ ### Key Improvements
+
+ * **Novel Conditioning Injection:** Custom latent injection methods for reference identity stabilization.
+ * **Noise Distribution Overhaul:** Implemented a **custom High-Noise Power Law timestep distribution**, forcing the model to prioritize target identity reconstruction over guide-video context (see the sketch after this list).
+ * **Training Compute:** 60+ hours of training on **NVIDIA RTX PRO 6000 Blackwell GPUs**, iterating through 300GB+ of experimental checkpoints.
+
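+ The exact distribution is not published here, but the core idea is easy to sketch: bias the training timesteps toward the high-noise end of the schedule so the model learns to reconstruct identity from near-pure noise. Below is a minimal, hypothetical PyTorch sketch of one such power-law bias; the function name and the shape parameter `alpha` are illustrative assumptions, not the actual training code.
+
+ ```python
+ import torch
+
+ def sample_power_law_timesteps(batch_size: int,
+                                num_train_timesteps: int = 1000,
+                                alpha: float = 3.0) -> torch.Tensor:
+     """Sample training timesteps biased toward high noise (illustrative sketch).
+
+     Mapping u ~ U(0, 1) through u ** (1 / alpha) concentrates probability
+     mass near 1.0, so most sampled timesteps land in the high-noise region.
+     """
+     u = torch.rand(batch_size)
+     skewed = u ** (1.0 / alpha)  # alpha > 1 biases samples toward high noise
+     return (skewed * (num_train_timesteps - 1)).long()
+ ```
+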
+ ---
+
+ # 📊 Dataset Specifications
+
+ ## V1 Dataset
+
+ * **300 high-quality head swap video pairs**
+ * Trained on **512x512 buckets**
+ * Primarily **landscape format**
+ * Optimized for **close-up framing**
+
+ Wide shots may reduce identity fidelity.
+
+ ---
+
+ # 💡 Inference Guide (V1)
+
+ ## 🔴 CRITICAL — Frame 0 Requirement
+
+ This version was trained to use **Frame 0 as the identity anchor**, so you must prepare the first frame correctly.
+
+ ### Recommended Workflow
+
+ 1. Perform a high-quality head swap on Frame 0.
+ 2. Use that processed frame as the conditioning input.
+ 3. Run the full video generation.
+
+ For best results, prepare Frame 0 using my previous **BFS Image Models**. A scripted version of this flow is sketched below.
+
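+ Outside ComfyUI, the same flow can be scripted. This is a hedged sketch only: it assumes your diffusers build ships an LTX image-to-video pipeline compatible with the base weights listed in this card's metadata, and it shows only the Frame 0 identity anchoring, not the full guide-video conditioning that the ComfyUI workflow performs.
+
+ ```python
+ import torch
+ from diffusers import LTXImageToVideoPipeline  # assumption: matches your installed build
+ from diffusers.utils import export_to_video, load_image
+
+ # Base checkpoint name taken from this card's metadata; verify it matches your pipeline.
+ pipe = LTXImageToVideoPipeline.from_pretrained(
+     "Lightricks/LTX-2.3", torch_dtype=torch.bfloat16
+ ).to("cuda")
+ pipe.load_lora_weights(
+     "Alissonerdx/BFS-Best-Face-Swap-Video",
+     weight_name="ltx-2/head_swap_v1_13500_first_frame.safetensors",
+ )
+
+ # Steps 1-2: a Frame 0 that was already head-swapped with a BFS image model.
+ frame0 = load_image("frame0_swapped.png")
+
+ # Step 3: run the full video generation with the trigger prompt.
+ video = pipe(image=frame0, prompt="head swap",
+              num_frames=121, num_inference_steps=30).frames[0]
+ export_to_video(video, "output.mp4", fps=24)
+ ```
+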
115
+ ---
116
+
117
+ ## Optimization
118
+
119
+ ### LoRA Strength
120
+
121
+ * **1.0** → Best motion fidelity
122
+ * **>1.0** → Stronger identity and hair capture, but may distort original motion
123
+
124
+ ### Multi-Pass Workflows
125
+
126
+ You can experiment with multiple passes using different strengths.
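+
+ In a scripted diffusers setup, per-pass strength can be adjusted through the adapter API. A small sketch, assuming the pipeline from the Frame 0 example above and an adapter name chosen purely for illustration:
+
+ ```python
+ # Load the LoRA once under an explicit (illustrative) adapter name.
+ pipe.load_lora_weights(
+     "Alissonerdx/BFS-Best-Face-Swap-Video",
+     weight_name="ltx-2/head_swap_v1_13500_first_frame.safetensors",
+     adapter_name="head_swap",
+ )
+
+ pipe.set_adapters(["head_swap"], adapter_weights=[1.0])  # pass 1: best motion fidelity
+ # ...generate, inspect, then optionally rerun with a stronger identity push:
+ pipe.set_adapters(["head_swap"], adapter_weights=[1.2])  # pass 2: stronger identity/hair
+ ```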
+
+ ### Prompting
+
+ Detailed prompts currently have **no effect**.
+
+ The trigger remains:
+
+ ```txt
+ head swap
+ ```
+
+ ---
+
+ # ⚠️ Known Issues (V1 – Alpha)
+
+ * **Identity Leakage:** Hair from the guide video may reappear.
+ * **Hard Cuts:** Jump cuts can reset identity.
+ * **Portrait Format:** Performance is significantly better in landscape.
+
+ ---
+
+ # 🚀 Version 2 – Major Update
+
+ V2 introduces a **complete redesign of the conditioning strategy and masking logic**, significantly improving identity robustness and reducing leakage.
+
+ ---
+
+ ## 🔹 Multiple Conditioning Modes (Using First Frame)
+
+ V2 supports multiple identity injection approaches:
+
+ ### 1️⃣ Direct Photo Conditioning
+
+ Use a clean photo of the new face as the reference input.
+
+ This method works and can produce strong results. However, because the model must internally reconcile differences in lighting, perspective, depth, and occlusion, it may struggle to integrate the new identity into the guide video, which can reduce stability or identity consistency in some cases.
+
+ ### 2️⃣ First-Frame Head Swap (Recommended)
+
+ Applying a proper head swap on Frame 0 still produces **extremely strong and reliable results**.
+
+ Because the first frame is already structurally correct (pose, lighting, depth, and occlusions), the model has significantly less work to do. Instead of forcing alignment from a static photo, it simply propagates and stabilizes the identity through time.
+
+ This approach generally:
+
+ * Produces higher identity fidelity
+ * Reduces deformation
+ * Minimizes integration artifacts
+ * Improves overall temporal stability
+
+ ### 3️⃣ Automatic Magazine-Style Overlay
+
+ The new face is automatically cut out and positioned over the guide face using mask alignment, simulating a magazine-cutout-style overlay.
+
+ ### 4️⃣ Manual Overlay
+
+ Advanced users may manually composite the new face over Frame 0 before running inference.
+
+ ---
+
+ ## 🔹 Facial Motion Behavior (Important Change)
+
+ Unlike V1, **V2 does not follow the original guide face's facial micro-movements.**
+
+ The guide face is fully masked to prevent identity leakage, which makes masking quality critical.
+
+ ### Mask Requirements
+
+ * The guide face must be completely covered.
+ * The mask color must be a **magenta tone** (see the sketch after this list).
+ * Any visible guide identity may leak into the final output.
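+
+ A minimal sketch of a square magenta mask, assuming you already have a face bounding box from a detector of your choice; the exact magenta value is an assumption, since the requirement is only "a magenta tone":
+
+ ```python
+ import cv2
+ import numpy as np
+
+ MAGENTA_BGR = (255, 0, 255)  # assumed tone
+
+ def mask_guide_face(frame: np.ndarray, box: tuple[int, int, int, int],
+                     pad: int = 16) -> np.ndarray:
+     """Cover the guide face with a solid magenta square (square-mask style)."""
+     x, y, w, h = box
+     out = frame.copy()
+     # Padding adds spatial context, in line with the square-mask notes below.
+     cv2.rectangle(out, (x - pad, y - pad), (x + w + pad, y + h + pad),
+                   MAGENTA_BGR, thickness=-1)
+     return out
+ ```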
+
+ ---
+
+ ## 🔹 Mask Types
+
+ Users may alternate between:
+
+ ### ▪ Square Masks
+
+ * More stable identity
+ * Better consistency
+ * Often produce stronger overall results
+ * May generate slightly oversized heads due to spatial padding
+
+ In most scenarios, square masks tend to perform better because they provide additional spatial context for the model to reconstruct structure and hair.
+
+ ### ▪ Tight / Adjusted Masks
+
+ * More natural head proportions
+ * May deform if the guide head shape differs significantly
+ * Sensitive to long-hair mismatches
+
+ If the original guide has long hair and the new identity does not, the risk of deformation increases.
+
+ ---
+
+ ## 🔹 Dataset & Training Improvements (V2)
+
+ * **800+ video pairs**
+ * Trained at **768 resolution**
+ * **768** is the recommended inference resolution
+ * Improved hair stability
+ * Reduced identity leakage compared to V1
+ * More robust identity transfer under motion
+
+ ---
+
+ ## 🔹 First Pass vs Second Pass
+
+ You may:
+
+ * Run a single pass at 768 (recommended)
+ * Run a downscaled first pass plus a second upscale pass
+
+ ⚠️ Important: a second pass may alter the identity established in the first pass and reduce consistency in some cases.
+
+ ---
+
+ ## 🔹 Trigger
+
+ The trigger remains:
+
+ ```txt
+ head swap
+ ```
+
+ ---
+
+ # 🚀 Version 3 – Persistent Template Workflow
+
+ V3 introduces a new **persistent-template conditioning workflow**.
+
+ Unlike previous versions, which relied primarily on the identity being established from **Frame 0 only**, V3 uses a **custom guide-video construction step** that keeps the new face visible throughout the entire guide sequence.
+
+ This results in a much stronger and more persistent identity signal during inference.
+
+ ---
+
+ # 🙏 Acknowledgements
+
+ Special thanks to **facy.ai** for sponsoring the GPU used to train this model.
+
+ If you want to check out their platform, you can use my referral link:
+
+ [https://facy.ai/a/headswap](https://facy.ai/a/headswap)
+
+ ---
+
+ ## 🔹 How V3 Works
+
+ V3 uses a custom node from **ComfyUI-BFSNodes** to prepare the guide video before inference.
+
+ Repository:
+
+ ```txt
+ https://github.com/alisson-anjos/ComfyUI-BFSNodes
+ ```
+
+ Workflow file:
+
+ ```txt
+ workflows/workflow_ltx2_head_swap_drag_and_drop_v3.0.json
+ ```
+
+ The guide-video preparation process works like this:
+
+ 1. Start from the original guide video
+ 2. Add a **vertical green chroma-key strip** on the side
+ 3. Place the **reference face image** inside that strip
+ 4. Apply this composition to **every frame** of the original video
+ 5. Use this new composite video as the actual inference guide
+
+ This means the new identity remains **fully visible in all frames** of the guide video, instead of appearing only in Frame 0 as in previous versions.
+
+ That is the main reason V3 can achieve better consistency than earlier versions. A sketch of this preparation step follows.
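+
+ The BFS node performs this preparation automatically, but the operation is simple to illustrate. A hedged OpenCV sketch of the five steps above; the strip width, face placement, and codec are illustrative assumptions, not the node's exact parameters:
+
+ ```python
+ import cv2
+ import numpy as np
+
+ GREEN_BGR = (0, 255, 0)  # chroma-key green
+ STRIP_FRAC = 0.25        # assumed strip width as a fraction of frame width
+
+ def build_composite_guide(guide_path: str, face_path: str, out_path: str) -> None:
+     """Attach a green side strip holding the reference face to every frame."""
+     cap = cv2.VideoCapture(guide_path)                       # step 1: original guide
+     fps = cap.get(cv2.CAP_PROP_FPS)
+     w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
+     h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
+     strip_w = int(w * STRIP_FRAC)
+
+     face = cv2.imread(face_path)                             # step 3: reference face
+     scale = min(strip_w / face.shape[1], h / face.shape[0])
+     face = cv2.resize(face, (int(face.shape[1] * scale), int(face.shape[0] * scale)))
+
+     writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
+                              fps, (w + strip_w, h))
+     while True:                                              # step 4: every frame
+         ok, frame = cap.read()
+         if not ok:
+             break
+         strip = np.full((h, strip_w, 3), GREEN_BGR, dtype=np.uint8)  # step 2: strip
+         y0 = (h - face.shape[0]) // 2
+         x0 = (strip_w - face.shape[1]) // 2
+         strip[y0:y0 + face.shape[0], x0:x0 + face.shape[1]] = face
+         writer.write(np.hstack([strip, frame]))
+     cap.release()
+     writer.release()                                         # step 5: new inference guide
+ ```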
+
+ ---
+
+ ## 🔹 Why V3 Is Different
+
+ Because the identity reference stays visible during the full guide sequence, V3 gives the model a much more stable conditioning signal across time.
+
+ In practice, this can improve:
+
+ * Identity consistency
+ * Temporal stability
+ * Resistance to identity drift
+ * Facial motion continuity
+ * Lip-sync behavior
+ * Expressive facial movement preservation
+
+ This version is especially useful for shots where the face remains visible for longer periods, or where dialogue, mouth movement, and facial acting matter more.
+
+ V3 is not just a refinement of the first-frame method. It changes the conditioning logic by giving the model access to a persistent identity template across the entire inference sequence.
+
+ ---
+
+ ## 🔹 Final Output Behavior
+
+ Even though the guide video used during inference contains the **vertical chroma-key side strip**, the **final generated result does not include that strip**.
+
+ The generated video is returned in the **original resolution and framing** of the source guide video.
+
+ So in practice:
+
+ * The green side strip exists only in the internal guide/template video
+ * It is used only to improve inference conditioning
+ * It does not appear in the final output
+
+ ---
+
+ ## 🔹 Prompting for V3
+
+ For V3, you can also pass the composite guide video into a vision-capable model to extract a structured prompt.
+
+ This is useful because the composite video contains two separate information sources:
+
+ * the **reference identity** inside the side strip
+ * the **performance and scene information** in the main video area
+
+ This keeps the identity and action descriptions cleanly separated.
+
+ ### Recommended Prompt Template
+
+ ```txt
+ Analyze this composite video.
+
+ The video contains:
+ 1. a side chroma-key panel with a reference face image
+ 2. a main performance video showing the body, clothing, movement, hand actions, objects, framing, and environment
+
+ Your task is to extract:
+ - the target face identity from the side panel
+ - the performance/action from the main video
+
+ Critical rules:
+ - The side-panel face is the only valid source for identity traits and head-level accessories.
+ - Ignore the visible face and head appearance in the main video completely.
+ - Do not describe any face, hair, hairstyle, hair color, eye color, makeup, facial features, facial expression, attractiveness, headwear, hood, hat, or accessories from the main video.
+ - In the ACTION section, describe the performer only as "a person" and focus only on body movement, clothing, hand actions, objects, framing, and environment.
+ - Do not mention the chroma panel, green background, split layout, or editing structure.
+ - Be factual and non-creative.
+ - Do not guess uncertain details. If a detail is not clearly visible, omit it.
+
+ Return exactly in this format:
+ head_swap:
+
+ FACE:
+ A brief but detailed objective identity description from the side-panel face only. Include, when clearly visible: apparent gender, apparent ethnicity, skin tone or complexion, approximate age range, head shape, hair or baldness pattern, hair color, eye color, facial hair, visible skin details, headwear or head covering, visible facial accessories, and any especially distinctive facial trait. Prioritize the eyes when they are a strong defining feature.
+
+ ACTION:
+ A concise performance description from the main video. Include only: visible clothing, body position, movement, hand actions, objects being shown or handled, camera-facing behavior, framing, and environment. Do not include any face or head appearance from the main video.
+
+ Good example:
+ FACE:
+ Female, fair skin, approximately 20-30 years old, oval head shape, long wavy vivid blue-violet hair, bright golden-amber eyes with dark defined pupils, no facial hair, smooth skin, and pink flower hair accessories as a distinctive head adornment.
+
+ ACTION:
+ A person in a dark top faces the camera indoors, holds a package of false eyelashes close to the lens, peels one lash from the backing, brings it near the eye area, and examines it while making small hand movements.
+
+ Bad example:
+ ACTION:
+ A person with long curly blonde braids holds a pair of false eyelashes...
+ ```
+
+ ### How to Use
+
+ 1. Generate the V3 composite guide video using the node
+ 2. Pass that composite video into a vision-capable model
+ 3. Extract the structured **FACE** and **ACTION** prompt
+ 4. Use that output as the base prompt for the V3 workflow
+
+ ---
+
+ ## 🔹 Captions / Descriptions for V3
+
+ If you want automatic captions or prompt extraction from video, you can also use my Ollama nodes.
+
+ Repository:
+
+ ```txt
+ https://github.com/alisson-anjos/ComfyUI-Ollama-Describer
+ ```
+
+ A useful node for this workflow is **Ollama Video Describer**.
+
+ It can help generate structured descriptions from the composite guide video and make it easier to build the final prompt for V3; a hedged sketch of this kind of call follows.
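+
+ For reference, the kind of call the node wraps can be approximated with the `ollama` Python client. A hypothetical sketch: the model name, frame sampling, and the saved copy of the prompt template are all assumptions, since Ollama vision models take images rather than raw video:
+
+ ```python
+ import cv2
+ import ollama
+
+ def describe_composite(video_path: str, model: str = "llava:13b",
+                        n_frames: int = 4) -> str:
+     """Sample frames from the composite guide and ask a vision model for FACE/ACTION."""
+     cap = cv2.VideoCapture(video_path)
+     total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
+     images = []
+     for i in range(n_frames):
+         cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // n_frames)
+         ok, frame = cap.read()
+         if ok:
+             images.append(cv2.imencode(".png", frame)[1].tobytes())
+     cap.release()
+
+     # Assumes the template above was saved locally as v3_prompt_template.txt.
+     with open("v3_prompt_template.txt") as f:
+         prompt = f.read()
+     reply = ollama.chat(model=model,
+                         messages=[{"role": "user", "content": prompt, "images": images}])
+     return reply["message"]["content"]
+ ```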
+
+ ---
+
+ ## 🔹 V3 Trigger
+
+ The trigger format for V3 is:
+
+ ```txt
+ head_swap:
+ FACE:
+ ....
+
+ ACTION:
+ ....
+ ```
+
+ ---
+
+ # 🔴 Critical Success Factor (V2 / V3)
+
+ Mask and preparation quality still matter enormously.
+
+ Even with improved conditioning, final quality depends on:
+
+ * Proper face coverage
+ * Clean compositing
+ * Strong alignment
+ * Good source and reference quality
+
+ If any portion of the original guide identity remains visible where it should not, the model may still reintroduce unwanted traits.
+
+ Take time to refine your inputs. Better preparation consistently produces better output than simply increasing LoRA strength.
+
+ ---
+
+ ## 🔧 Advanced Technique: Combine with LTX-2 Inpainting
+
+ Advanced users can experiment with combining this LoRA with the native **LTX-2 inpainting workflow**.
+
+ This can help:
+
+ * Refine problematic areas
+ * Correct small deformation zones
+ * Improve edge blending
+ * Recover detail in hair or jaw regions
+
+ When combined properly, inpainting can significantly enhance final output quality, especially in challenging frames.
+
+ ---
+
+ ## 🔹 Recommendation
+
+ I strongly recommend testing **both LoRAs** and comparing the final behavior.
+
+ Depending on the guide clip, framing, facial motion, and the kind of result you want, some users may prefer the look or motion style of one version over the other.
+
+ In general:
+
+ * **V2** may still be preferred for some first-frame-driven workflows
+ * **V3** is better when you want a stronger persistent identity signal, better consistency, and better facial/lip motion continuity
+
+ The best version often depends on the shot and on personal preference.
+
+ ---
+
+ # 💙 Support
+
+ Maintaining R&D and renting Blackwell GPUs is expensive.
+
+ If this project helps you, consider supporting the development of:
+
+ * V3 improvements
+ * Advanced conditioning pipelines
+ * SAM 3 integration
+ * Full reference-photo-only workflows
+
+ Support here:
+
+ [https://buymeacoffee.com/nrdx](https://buymeacoffee.com/nrdx)
examples/1.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb5008a5750697b2ba6a105813455861c8f84949762a9a919c4f423b60bfc124
+ size 30056875
examples/2.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fd30caefa2bbefe1bdc8927855bf472dce67bdb83e25eaa46255f073f4f79885
+ size 31230709
examples/3.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:545e6962243768f4a448f92e8ae72037d3f36a5d26eb2684595e79b045dbe500
+ size 66240119
examples/4.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1166856da2aee972247473feca89a3d2f31a90d8bbeeb882de3c5664f07fa1e2
+ size 38867419
examples/5.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7437dfd7d03ec41af05d1fa71b2d3b1fe107f4fa87b8e38ebc66b573ec16c66d
+ size 38173397
examples/6.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:55b9eef31500f4a392e1f203f098e9408fe7e47887f6a39ebee03b722d44ab7b
+ size 1688996
examples/7.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7a00b82f46b13fcf1eaa632a922c1b221316b3593387eb151ffe10e9613fe86f
+ size 1750633
examples/8.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:adee6e86236ce5c5c2e1d3f12811488875b2b0232658cb7e1d5cd90ffc53f270
+ size 3308271
ltx-2.3/head_swap_v3_rank_64.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a25d4fd9622d9d81f897744d1d7d7b99a17812eeb41905e68aacf5c375892608
+ size 654443424
ltx-2.3/head_swap_v3_rank_adaptive_fro_098.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:78083ede0004c500ac0ae7b8fee43d28a1c3b503bac4a21edfeba7c60da8460d
+ size 1358465856
ltx-2/download-models-head-swap-ltx2-windows.ps1 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:40f0dded61a782ccdc19c30d9ead9bef69a5a53a05a6fc596ee7e02efedbde97
+ size 4146
ltx-2/download-models-head-swap-ltx2.sh ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6427e10f9a4d9141c728ebef8e9c0908700bd94726be53effc85ecb99a78423b
+ size 5119
ltx-2/head_swap_v1_13500_first_frame.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:056373cf73418dac449fecf34a5b749deeb802a1a4a4a9fc1677cd46c2d48864
+ size 1308756368
ltx-2/head_swap_v1_8750_first_and_last_frame.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:48d16179c82629385fb5a812ed33182952e7755a60464ffe645d3417f5d48a71
+ size 1308756368
ltx-2/head_swap_v2_multimodes.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f459e03568447dcc5d6ea7c02466fece5eee3409bb23f07dfc2ecab24ac7a2fa
+ size 1316096704
workflows/workflow_ltx2_head_swap_drag_and_drop.json ADDED
The diff for this file is too large to render.
workflows/workflow_ltx2_head_swap_drag_and_drop_v1.1.json ADDED
The diff for this file is too large to render.
workflows/workflow_ltx2_head_swap_drag_and_drop_v2.0.json ADDED
The diff for this file is too large to render.
workflows/workflow_ltx2_head_swap_drag_and_drop_v3.0.json ADDED
The diff for this file is too large to render.