Commit 58eaf77 by linoyts (HF Staff)
1 Parent(s): 7b308f3
Files changed (1)
  1. README.md +158 -0
README.md ADDED
@@ -0,0 +1,158 @@
+ ---
+ library_name: diffusers
+ pipeline_tag: text-to-video
+ base_model: Lightricks/LTX-2.3
+ tags:
+ - video-generation
+ - text-to-video
+ - ltx
+ - ltx-2
+ license: other
+ license_name: ltx-video-2-open-source-license
+ license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
+ ---
+
+ # LTX-2.3 (Diffusers)
+
+ Diffusers-format weights for [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3), a DiT-based foundation model that jointly generates synchronized video and audio.
+
+ A distilled variant (8 steps, CFG=1) is available at [`diffusers/LTX-2.3-Distilled-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Distilled-Diffusers).
+
+ ## Usage
+
+ Requires a recent build of `diffusers` with LTX-2 support, installed from source:
+
+ ```bash
+ pip install -U git+https://github.com/huggingface/diffusers
+ ```
+
+ ### Text-to-video + audio
+
+ ```python
+ import torch
+ from diffusers import LTX2Pipeline
+ from diffusers.pipelines.ltx2.export_utils import encode_video
+ from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
+
+ pipe = LTX2Pipeline.from_pretrained(
+     "diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_model_cpu_offload()
+
+ prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
+ frame_rate = 24.0
+
+ video, audio = pipe(
+     prompt=prompt,
+     negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+     width=768,
+     height=512,
+     num_frames=121,
+     frame_rate=frame_rate,
+     num_inference_steps=30,
+     guidance_scale=3.0,
+     output_type="np",
+     return_dict=False,
+ )
+
+ encode_video(
+     video[0],
+     fps=frame_rate,
+     audio=audio[0].float().cpu(),
+     audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+     output_path="ltx2_t2v.mp4",
+ )
+ ```
+
+ ### First-last-frame-to-video (FLF2V)
+
+ ```python
+ import torch
+ from diffusers import LTX2ConditionPipeline
+ from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
+ from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
+ from diffusers.utils import load_image
+
+ pipe = LTX2ConditionPipeline.from_pretrained(
+     "diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_model_cpu_offload()
+
+ first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
+ last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
+
+ # Pin the first image to the first frame and the last image to the final frame.
+ conditions = [
+     LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
+     LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
+ ]
+
+ prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
+ frame_rate = 24.0
+
+ # With return_dict=False the call returns a tuple, so `video` holds the whole tuple here.
+ video = pipe(
+     conditions=conditions,
+     prompt=prompt,
+     negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+     width=768,
+     height=512,
+     num_frames=121,
+     frame_rate=frame_rate,
+     num_inference_steps=40,
+     guidance_scale=4.0,
+     output_type="np",
+     return_dict=False,
+ )
+ ```
+
+ ### IC-LoRA (camera control)
+
+ ```python
+ import torch
+ from diffusers import LTX2InContextPipeline
+ from diffusers.pipelines.ltx2.export_utils import encode_video
+ from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
+
+ pipe = LTX2InContextPipeline.from_pretrained(
+     "diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_model_cpu_offload()
+ pipe.load_lora_weights(
+     "Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-In",
+     adapter_name="ic_lora",
+     weight_name="ltx-2-19b-lora-camera-control-dolly-in.safetensors",
+ )
+ pipe.set_adapters("ic_lora", 1.0)
+
+ prompt = "A flowing river in a forest"
+ frame_rate = 24.0
+
+ video, audio = pipe(
+     prompt=prompt,
+     negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+     width=768,
+     height=512,
+     num_frames=121,
+     frame_rate=frame_rate,
+     num_inference_steps=30,
+     guidance_scale=3.0,
+     output_type="np",
+     return_dict=False,
+ )
+
+ encode_video(
+     video[0],
+     fps=frame_rate,
+     audio=audio[0].float().cpu(),
+     audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+     output_path="ltx2_ic_lora.mp4",
+ )
+ ```
+
+ ## Notes
+
+ - `width` and `height` must be divisible by 32; `num_frames` must be of the form `8k + 1` for a non-negative integer `k` (for example, 121 = 8 × 15 + 1).
+ - See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
+
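The size constraints in the Notes can be checked before a long generation run starts. The following is a minimal sketch in plain Python; the helper names `snap_dimension`, `snap_num_frames`, and `check_generation_args` are hypothetical and are not part of `diffusers`, and rounding down (rather than up or to nearest) is an assumption made for illustration.

```python
def snap_dimension(value: int, multiple: int = 32) -> int:
    """Round `value` down to a multiple of `multiple`, never below `multiple`."""
    return max(multiple, (value // multiple) * multiple)


def snap_num_frames(value: int) -> int:
    """Round `value` down to the nearest count of the form 8k + 1 (k >= 0)."""
    return max(1, ((value - 1) // 8) * 8 + 1)


def check_generation_args(width: int, height: int, num_frames: int) -> None:
    """Raise ValueError if the arguments break the README's size constraints."""
    if width % 32 != 0 or height % 32 != 0:
        raise ValueError(
            f"width and height must be divisible by 32, got {width}x{height}"
        )
    if num_frames < 1 or (num_frames - 1) % 8 != 0:
        raise ValueError(f"num_frames must equal 8k + 1, got {num_frames}")


# Hypothetical usage: clean up user-supplied values, then validate them.
width = snap_dimension(780)          # 768
height = snap_dimension(512)         # 512
num_frames = snap_num_frames(125)    # 121
check_generation_args(width, height, num_frames)
```

The snapped values could then be passed as `width`, `height`, and `num_frames` to any of the pipeline calls shown in the Usage section. At `frame_rate=24.0`, 121 frames correspond to roughly 5 seconds of video.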
+ ## License
+
+ These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).