---
license: other
license_name: ltx-2-community-license-agreement
tags:
- ltx-2
- ic-lora
- head-swap
- video-to-video
- image-to-video
- bfs
- lora
base_model:
- Lightricks/LTX-2
library_name: diffusers
pipeline_tag: image-to-video
---

## ⚠️ Ethical Use & Disclaimer

This model is a technical tool designed for **Digital Identity Research, Professional VFX Workflows, and Cinematic Prototyping**.

By downloading or using this LoRA, you acknowledge and agree to the following:

* **Intended Use:** Designed for filmmakers, VFX artists, and researchers exploring high-fidelity video identity transformation.
* **Consent & Rights:** You must possess explicit legal consent and all necessary rights from any individual whose likeness is being processed.
* **Legal Compliance:** You are fully responsible for complying with all local and international laws regarding synthetic media.
* **Liability Waiver:** This model is provided *"as is."* **As the creator (Alissonerdx), I assume no responsibility for misuse.** Any legal, ethical, or social consequences are solely the responsibility of the end user.

---

# 📺 Video Examples

## V1 Examples

Generated using the **Frame 0 Anchoring Technique**.
All examples follow the guide video motion while preserving the identity provided in the first frame.

| Example 1                                                                                                                                  | Example 2                                                                                                                                  |
| ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/1.mp4" controls autoplay loop muted></video> | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/2.mp4" controls autoplay loop muted></video> |

| Example 3                                                                                                                                  | Example 4                                                                                                                                  |
| ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/3.mp4" controls autoplay loop muted></video> | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/4.mp4" controls autoplay loop muted></video> |

| Example 5                                                                                                                                  |
| ------------------------------------------------------------------------------------------------------------------------------------------ |
| <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/5.mp4" controls autoplay loop muted></video> |

## V3 Examples

If you want to see the full setup in practice, watch here:

[https://www.youtube.com/watch?v=HBp03iu7wLA](https://www.youtube.com/watch?v=HBp03iu7wLA)

The following examples demonstrate the new **persistent-template workflow** used in V3:

| Example 6                                                                                                                                  | Example 7                                                                                                                                  |
| ------------------------------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------ |
| <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/6.mp4" controls autoplay loop muted></video> | <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/7.mp4" controls autoplay loop muted></video> |

| Example 8                                                                                                                                  |
| ------------------------------------------------------------------------------------------------------------------------------------------ |
| <video src="https://huggingface.co/Alissonerdx/BFS-Best-Face-Swap-Video/resolve/main/examples/8.mp4" controls autoplay loop muted></video> |

The image references for the versions are stored under:

```txt
ltx-2.3/...
```

---

# 🛠 Technical Background (V1)

To achieve this level of identity transfer, I **heavily modified the official LTX-2 training scripts**.

### Key Improvements

* **Novel Conditioning Injection:** Custom latent injection methods for reference identity stabilization.
* **Noise Distribution Overhaul:** Implemented a **custom High-Noise Power Law timestep distribution**, forcing the model to prioritize target identity reconstruction over guide-video context.
* **Training Compute:** 60+ hours of training on **NVIDIA RTX PRO 6000 Blackwell GPUs**, iterating through 300GB+ of experimental checkpoints.

---

# 📊 Dataset Specifications

## V1 Dataset

* **300 high-quality head swap video pairs**
* Trained on **512x512 buckets**
* Primarily **landscape format**
* Optimized for **close-up framing**

Wide shots may reduce identity fidelity.

---

# 💡 Inference Guide (V1)

## 🔴 CRITICAL: Frame 0 Requirement

This version was trained to use **Frame 0 as the identity anchor**.

You must prepare the first frame correctly.

### Recommended Workflow

1. Perform a high-quality head swap on Frame 0.
2. Use that processed frame as conditioning input.
3. Run the full video generation.

For best results, prepare Frame 0 using my previous **BFS Image Models**.

---

## Optimization

### LoRA Strength

* **1.0** โ†’ Best motion fidelity
* **>1.0** โ†’ Stronger identity and hair capture, but may distort original motion

### Multi-Pass Workflows

You can experiment with multiple passes using different strengths.

### Prompting

Detailed prompts currently have **no effect**.

The trigger phrase is:

```txt
head swap
```

---

# ⚠️ Known Issues (V1 – Alpha)

* **Identity Leakage:** Hair from the guide video may reappear.
* **Hard Cuts:** Jump cuts can reset identity.
* **Portrait Format:** Performance is significantly better in landscape.

---

# 🚀 Version 2 – Major Update

V2 introduces a **complete redesign of conditioning strategy and masking logic**, significantly improving identity robustness and reducing leakage.

---

## 🔹 Multiple Conditioning Modes (Using First Frame)

V2 supports multiple identity injection approaches:

### 1️⃣ Direct Photo Conditioning

Use a clean photo of the new face as reference input.

This method works and can produce strong results. However, because the model must internally reconcile differences in lighting, perspective, depth, and occlusion, it may struggle to integrate the new identity into the guide video, which in some cases reduces stability or identity consistency.

### 2️⃣ First-Frame Head Swap (Recommended)

Applying a proper head swap on Frame 0 still produces **extremely strong and reliable results**.

Because the first frame is already structurally correct in pose, lighting, depth, and occlusions, the model has significantly less work to do. Instead of forcing alignment from a static photo, it simply propagates and stabilizes the identity through time.

This approach generally:

* Produces higher identity fidelity
* Reduces deformation
* Minimizes integration artifacts
* Improves overall temporal stability

### 3️⃣ Automatic Magazine-Style Overlay

The new face is automatically cut and positioned over the guide face using mask alignment.
This simulates a magazine-cutout-style overlay, but performed automatically based on mask positioning.

### 4️⃣ Manual Overlay

Advanced users may manually composite the new face over Frame 0 before running inference.

---

## 🔹 Facial Motion Behavior (Important Change)

Unlike V1:

**V2 does not follow the original guide face's facial micro-movements.**

The guide face is fully masked to prevent identity leakage.

This makes masking quality critical.

### Mask Requirements

* The guide face must be completely covered.
* Mask color must be a **magenta tone**.
* Any visible guide identity may leak into the final output.
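
To make the coverage requirement concrete, here is a minimal sketch of painting a solid magenta mask over a face bounding box with NumPy. The `mask_guide_face` helper and the exact magenta value are illustrative assumptions; match the tone your actual workflow expects.

```python
import numpy as np

# Assumed magenta tone; verify against the reference workflow.
MAGENTA = np.array([255, 0, 255], dtype=np.uint8)

def mask_guide_face(frame, bbox):
    """Fully cover the guide face region (x0, y0, x1, y1) with magenta."""
    x0, y0, x1, y1 = bbox
    out = frame.copy()
    out[y0:y1, x0:x1] = MAGENTA
    return out
```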

---

## 🔹 Mask Types

Users may alternate between:

### ▪ Square Masks

* More stable identity
* Better consistency
* Often produce stronger overall results
* May generate slightly oversized heads due to spatial padding

In most scenarios, square masks tend to perform better because they provide additional spatial context for the model to reconstruct structure and hair.

### ▪ Tight / Adjusted Masks

* More natural head proportions
* May deform if guide head shape differs significantly
* Sensitive to long-hair mismatches

If the original guide has long hair and the new identity does not, deformation risk increases.
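
One way to get the square-mask behavior from a tight face box is to pad and squarify the box before masking. This helper is hypothetical (the `pad` ratio is an arbitrary assumption), but it shows where the extra spatial context comes from:

```python
def square_mask_bbox(bbox, frame_w, frame_h, pad=0.15):
    """Expand a tight face box (x0, y0, x1, y1) into a padded square,
    clamped to the frame; the padding is the spatial context that helps
    the model rebuild head structure and hair."""
    x0, y0, x1, y1 = bbox
    side = int(max(x1 - x0, y1 - y0) * (1 + 2 * pad))
    cx, cy = (x0 + x1) // 2, (y0 + y1) // 2
    nx0, ny0 = max(0, cx - side // 2), max(0, cy - side // 2)
    return nx0, ny0, min(frame_w, nx0 + side), min(frame_h, ny0 + side)
```

Note that the clamped box is only exactly square when it fits inside the frame.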

---

## 🔹 Dataset & Training Improvements (V2)

* **800+ video pairs**
* Trained at **768 resolution**
* **768** is the recommended inference resolution
* Improved hair stability
* Reduced identity leakage compared to V1
* More robust identity transfer under motion

---

## 🔹 First Pass vs Second Pass

You may:

* Run a single pass at 768 (recommended)
* Or run a downscaled first pass plus a second upscale pass

⚠️ Important:

A second pass may alter identity from the first pass and reduce consistency in some cases.
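
If you do try a two-pass run, the first-pass resolution has to be derived from the source clip. As a sketch (the snap-to-a-multiple-of-32 constraint is an assumption borrowed from typical latent-video pipelines, not something this LoRA specifies):

```python
def first_pass_size(w, h, target_long=512, multiple=32):
    """Scale (w, h) so the longer side is about target_long, rounding
    both sides to a multiple of `multiple` for the latent grid."""
    scale = min(1.0, target_long / max(w, h))
    snap = lambda v: max(multiple, (int(v * scale + 0.5) // multiple) * multiple)
    return snap(w), snap(h)
```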

---

## 🔹 Trigger

Trigger remains:

```txt
head swap
```

---

# 🚀 Version 3 – Persistent Template Workflow

V3 introduces a new **persistent-template conditioning workflow**.

Unlike previous versions, which relied primarily on the identity being established from **Frame 0 only**, V3 uses a **custom guide-video construction step** that keeps the new face visible throughout the entire guide sequence.

This results in a much stronger and more persistent identity signal during inference.

# ๐Ÿ™ Acknowledgements

Special thanks to **facy.ai** for sponsoring the GPU used to train this model.

If you want to check their platform, you can use my referral link:

[https://facy.ai/a/headswap](https://facy.ai/a/headswap)

---

## 🔹 How V3 Works

V3 uses a custom node from **ComfyUI-BFSNodes** to prepare the guide video before inference.

Repository:

```txt
https://github.com/alisson-anjos/ComfyUI-BFSNodes
```

Workflow file:

```txt
workflows/workflow_ltx2_head_swap_drag_and_drop_v3.0
```

The guide-video preparation process works like this:

1. Start from the original guide video
2. Add a **vertical green chroma-key strip** on the side
3. Place the **reference face image** inside that strip
4. Apply this composition to **every frame** of the original video
5. Use this new composite video as the actual inference guide

This means the new identity remains **fully visible during all frames** of the guide video, instead of appearing only in Frame 0 like in previous versions.

That is the main reason V3 can achieve better consistency than earlier versions.
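
The per-frame composition described above can be sketched in plain NumPy. The strip width, green value, and face placement here are illustrative assumptions; the actual ComfyUI-BFSNodes node handles sizing and placement itself.

```python
import numpy as np

GREEN = np.array([0, 255, 0], dtype=np.uint8)  # assumed chroma-key fill

def build_composite_frames(guide_frames, ref_face, strip_w=256):
    """Append a vertical green strip to every guide frame and paste the
    reference face image at the top of that strip."""
    out = []
    for frame in guide_frames:
        h = frame.shape[0]
        strip = np.broadcast_to(GREEN, (h, strip_w, 3)).copy()
        fh = min(ref_face.shape[0], h)
        fw = min(ref_face.shape[1], strip_w)
        strip[:fh, :fw] = ref_face[:fh, :fw]
        out.append(np.concatenate([frame, strip], axis=1))
    return out
```

After generation, the output already matches the original framing, so no crop-back step is needed.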

---

## 🔹 Why V3 Is Different

Because the identity reference stays visible during the full guide sequence, V3 gives the model a much more stable conditioning signal across time.

In practice, this can improve:

* Identity consistency
* Temporal stability
* Resistance to identity drift
* Facial motion continuity
* Lip sync behavior
* Expressive facial movement preservation

This version is especially useful for shots where the face remains visible for longer periods, or where dialogue, mouth movement, and facial acting matter more.

V3 is not just a refinement of the first-frame method. It changes the conditioning logic by giving the model access to a persistent identity template across the entire inference sequence.

---

## 🔹 Final Output Behavior

Even though the guide video used during inference contains the **vertical chroma-key side strip**, the **final generated result does not include that strip**.

The generated video is returned in the **original resolution and framing** of the source guide video.

So in practice:

* The green side strip exists only in the internal guide/template video
* It is used only to improve inference conditioning
* It does not appear in the final output

---

## 🔹 Prompting for V3

For V3, users can also pass the composite guide video into a vision-capable model to extract a structured prompt.

This is useful because the composite video contains two different information sources:

* the **reference identity** inside the side strip
* the **performance and scene information** in the main video area

This helps keep identity and action description separated more cleanly.

### Recommended Prompt Template

```txt
Analyze this composite video.

The video contains:
1. a side chroma-key panel with a reference face image
2. a main performance video showing the body, clothing, movement, hand actions, objects, framing, and environment

Your task is to extract:
- the target face identity from the side panel
- the performance/action from the main video

Critical rules:
- The side-panel face is the only valid source for identity traits and head-level accessories.
- Ignore the visible face and head appearance in the main video completely.
- Do not describe any face, hair, hairstyle, hair color, eye color, makeup, facial features, facial expression, attractiveness, headwear, hood, hat, or accessories from the main video.
- In the ACTION section, describe the performer only as "a person" and focus only on body movement, clothing, hand actions, objects, framing, and environment.
- Do not mention the chroma panel, green background, split layout, or editing structure.
- Be factual and non-creative.
- Do not guess uncertain details. If a detail is not clearly visible, omit it.

Return exactly in this format:
head_swap:

FACE:
A brief but detailed objective identity description from the side-panel face only. Include, when clearly visible: apparent gender, apparent ethnicity, skin tone or complexion, approximate age range, head shape, hair or baldness pattern, hair color, eye color, facial hair, visible skin details, headwear or head covering, visible facial accessories, and any especially distinctive facial trait. Prioritize the eyes when they are a strong defining feature.

ACTION:
A concise performance description from the main video. Include only: visible clothing, body position, movement, hand actions, objects being shown or handled, camera-facing behavior, framing, and environment. Do not include any face or head appearance from the main video.

Good example:
FACE:
Female, fair skin, approximately 20-30 years old, oval head shape, long wavy vivid blue-violet hair, bright golden-amber eyes with dark defined pupils, no facial hair, smooth skin, and pink flower hair accessories as a distinctive head adornment.

ACTION:
A person in a dark top faces the camera indoors, holds a package of false eyelashes close to the lens, peels one lash from the backing, brings it near the eye area, and examines it while making small hand movements.

Bad example:
ACTION:
A person with long curly blonde braids holds a pair of false eyelashes...
```

### How to Use

1. Generate the V3 composite guide video using the node
2. Pass that composite video into a vision-capable model
3. Extract the structured **FACE** and **ACTION** prompt
4. Use that output as the base prompt for the V3 workflow

---

## 🔹 Captions / Descriptions for V3

If you want automatic captions or prompt extraction from video, you can also use my Ollama nodes.

Repository:

```txt
https://github.com/alisson-anjos/ComfyUI-Ollama-Describer
```

A useful node for this workflow is:

**Ollama Video Describer**

This can help generate structured descriptions from the composite guide video and make it easier to build the final prompt for V3.

---

## 🔹 V3 Trigger

The V3 trigger uses a new structured format:

```txt
head_swap:
FACE:
....

ACTION:
....
```
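
A hypothetical helper for assembling that format from separate FACE and ACTION strings (not part of the official workflow; it follows the blank-line layout of the recommended prompt template above):

```python
def build_v3_prompt(face: str, action: str) -> str:
    """Join FACE and ACTION descriptions into the V3 trigger format."""
    return f"head_swap:\n\nFACE:\n{face.strip()}\n\nACTION:\n{action.strip()}"
```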

---

# 🔴 Critical Success Factor (V2 / V3)

Mask and preparation quality still matter enormously.

Even with improved conditioning, final quality depends on:

* Proper face coverage
* Clean compositing
* Strong alignment
* Good source and reference quality

If any portion of the original guide identity remains visible where it should not, the model may still reintroduce unwanted traits.

Take time to refine your inputs. Better preparation consistently produces better output than simply increasing LoRA strength.

---

## 🔧 Advanced Technique: Combine with LTX-2 Inpainting

Advanced users can experiment with combining this LoRA with the native **LTX-2 inpainting workflow**.

This can help:

* Refine problematic areas
* Correct small deformation zones
* Improve edge blending
* Recover detail in hair or jaw regions

When properly combined, inpainting can significantly enhance final output quality, especially in challenging frames.

---

## 🔹 Recommendation

I strongly recommend testing **both LoRAs** and comparing the final behavior.

Depending on the guide clip, framing, facial motion, and the kind of result you want, some users may prefer the look or motion style of one version over the other.

In general:

* **V2** may still be preferred for some first-frame-driven workflows
* **V3** is better when you want a stronger persistent identity signal, better consistency, and better facial/lip motion continuity

The best version will often depend on the shot and on personal preference.

---

# 💙 Support

Maintaining R&D and renting Blackwell GPUs is expensive.

If this project helps you, consider supporting the development of:

* V3 improvements
* Advanced conditioning pipelines
* SAM 3 integration
* Full reference-photo-only workflows

Support here:

[https://buymeacoffee.com/nrdx](https://buymeacoffee.com/nrdx)