linoyts HF Staff committed on
Commit ec575d2 · verified · 1 Parent(s): eca89b4

Upload README.md with huggingface_hub

---
library_name: diffusers
tags:
- ltx
- video-generation
- audio-to-video
- video-conditioning
license: apache-2.0
---

# LTX-2 Audio-to-Video Pipeline with Video Conditioning

A custom diffusers pipeline for LTX-2 that extends audio-to-video generation with **video conditioning** support.

## Features

- Audio-conditioned video generation (lip-sync)
- **Video conditioning** for motion/pose guidance
- Configurable conditioning strength and start frame
- Compatible with LTX-2 LoRAs (face-swap, camera control, etc.)

## Installation

```bash
pip install diffusers transformers torch torchaudio av
```

## Usage

```python
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Load pipeline with custom video conditioning support
pipe = DiffusionPipeline.from_pretrained(
    "Lightricks/LTX-2",
    custom_pipeline="linoyts/ltx2-audio-video-conditioning",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# Optional: Load a LoRA (e.g., face-swap)
# pipe.load_lora_weights("Alissonerdx/BFS-Best-Face-Swap-Video",
#                        weight_name="ltx-2/head_swap_v1_13500_first_frame.safetensors")
# pipe.fuse_lora(lora_scale=1.1)

# Load inputs
image = load_image("input_face.png")

# Generate with video conditioning
video, audio = pipe(
    image=image,                      # Frame 0 appearance
    video="reference_motion.mp4",     # Video for motion conditioning
    video_conditioning_strength=1.0,  # How strongly to follow motion (0-1)
    video_conditioning_frame_idx=1,   # Start video conditioning at frame 1
    audio="audio.wav",                # Audio for lip-sync
    prompt="a person speaking naturally, smooth animation",
    negative_prompt="low quality, blurry, distorted",
    width=512,
    height=768,
    num_frames=121,
    frame_rate=24.0,
    num_inference_steps=40,
    guidance_scale=4.0,
    return_dict=False,
)
```

## Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `image` | PIL.Image | None | Input image for frame 0 conditioning |
| `video` | str/List/Tensor | None | Reference video for motion conditioning |
| `video_conditioning_strength` | float | 1.0 | Strength of video conditioning (0.0-1.0) |
| `video_conditioning_frame_idx` | int | 1 | Frame index where video conditioning starts |
| `audio` | str/Tensor | None | Audio input for lip-sync |

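One intuitive reading of `video_conditioning_strength` is a linear mix between the reference-video signal and noise. The sketch below illustrates that interpretation only; `blend_latents` is a hypothetical helper, not the pipeline's actual internals, which may mix differently.

```python
def blend_latents(video_latent, noise_latent, strength):
    """Hypothetical illustration: linearly mix a reference-video latent
    with noise according to a conditioning strength in [0, 1].
    Not the pipeline's real implementation."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    return [strength * v + (1.0 - strength) * n
            for v, n in zip(video_latent, noise_latent)]
```

At `strength=1.0` the reference video dominates; at `strength=0.0` the frames are free to diverge from it.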
### Video Conditioning Frame Index

- `0`: Video conditioning replaces all frames
- `1` (default): Frame 0 = image, frames 1+ = video motion
- `N`: Frames 0 to N-1 = image/noise, frames N+ = video conditioning
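
The rules above can be sketched as a small helper (`frame_sources` is illustrative only, not part of the pipeline API):

```python
def frame_sources(num_frames, conditioning_frame_idx):
    """Label each frame with the conditioning source implied by
    video_conditioning_frame_idx: frames before the index follow the
    image (or noise), frames at and after it follow the reference video.
    Illustrative helper, not pipeline API."""
    return ["image" if i < conditioning_frame_idx else "video"
            for i in range(num_frames)]
```

For example, the default index of 1 pins frame 0 to the input image and drives every later frame from the reference video.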

## Distilled Model (8-step)

For faster generation with the distilled model:

```python
pipe = DiffusionPipeline.from_pretrained(
    "rootonchair/LTX-2-19b-distilled",
    custom_pipeline="linoyts/ltx2-audio-video-conditioning",
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

DISTILLED_SIGMAS = [1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875]

video, audio = pipe(
    image=image,
    video="reference.mp4",
    audio="audio.wav",
    prompt="...",
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMAS,
    guidance_scale=1.0,
    return_dict=False,
)
```
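
When supplying a custom schedule, a quick sanity check (assuming one sigma per denoising step, as in the call above) is that the list has `num_inference_steps` entries and decreases monotonically:

```python
DISTILLED_SIGMAS = [1.0, 0.99375, 0.9875, 0.98125, 0.975, 0.909375, 0.725, 0.421875]
num_inference_steps = 8

# One sigma per step, strictly decreasing from full noise toward the sample
assert len(DISTILLED_SIGMAS) == num_inference_steps
assert all(a > b for a, b in zip(DISTILLED_SIGMAS, DISTILLED_SIGMAS[1:]))
```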

## License

Apache 2.0