linoyts HF Staff committed on
Commit 893527e · 1 Parent(s): 1cda4f1
Files changed (1)
  1. README.md +166 -0
README.md ADDED
@@ -0,0 +1,166 @@
+ ---
+ library_name: diffusers
+ pipeline_tag: text-to-video
+ base_model: Lightricks/LTX-2.3
+ tags:
+ - video-generation
+ - text-to-video
+ - ltx
+ - ltx-2
+ - distilled
+ license: other
+ license_name: ltx-video-2-open-source-license
+ license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
+ ---
+
+ # LTX-2.3 Distilled (Diffusers)
+
+ Diffusers-format weights for the distilled LTX-2.3 model from [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). The distilled model runs in **8 steps with CFG = 1** (`guidance_scale=1.0`), trading some flexibility for substantially faster inference.
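
To make "CFG = 1" concrete: classifier-free guidance is commonly written as `uncond + scale * (cond - uncond)`. The sketch below applies that textbook formula to plain Python lists and shows that a scale of 1 returns the conditional prediction unchanged. The function name `cfg_combine` and the toy numbers are illustrative only and are not part of `diffusers`; whether the pipeline combines predictions in exactly this way is an assumption.

```python
def cfg_combine(uncond, cond, scale):
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


# Toy predictions standing in for model outputs (hypothetical values).
uncond = [0.0, 2.0, -1.0]
cond = [1.0, 4.0, 3.0]

at_one = cfg_combine(uncond, cond, 1.0)    # equals cond element-wise
at_three = cfg_combine(uncond, cond, 3.0)  # pushed further away from uncond
```

With a scale of 1 the unconditional term cancels out, which is consistent with the distilled checkpoint needing only the conditional prediction at each of its 8 steps.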
+
+ The non-distilled base model is at [`diffusers/LTX-2.3-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Diffusers).
+
+ ## Usage
+
+ Requires a recent build of `diffusers` with LTX-2 support:
+
+ ```bash
+ pip install -U git+https://github.com/huggingface/diffusers
+ ```
+
+ The distilled checkpoint uses a fixed sigma schedule. Always pass `sigmas=DISTILLED_SIGMA_VALUES`, `num_inference_steps=8`, and `guidance_scale=1.0`.
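
One way to avoid forgetting any of the three fixed arguments is to build the call arguments through a small helper. The sketch below is a convenience wrapper, not part of `diffusers`: `FIXED_DISTILLED_SETTINGS` and `distilled_call_kwargs` are hypothetical names, and the sigma schedule is passed in as a parameter so the helper itself has no dependencies.

```python
# Settings the README says the distilled checkpoint always needs.
FIXED_DISTILLED_SETTINGS = {"num_inference_steps": 8, "guidance_scale": 1.0}


def distilled_call_kwargs(sigmas, **extra):
    """Merge caller arguments with the fixed distilled settings.

    `sigmas` is expected to be DISTILLED_SIGMA_VALUES from diffusers; it is
    taken as a parameter so this helper can be used and tested on its own.
    """
    fixed_names = set(FIXED_DISTILLED_SETTINGS) | {"sigmas"}
    clashes = sorted(set(extra) & fixed_names)
    if clashes:
        # Refuse silent overrides of values the checkpoint depends on.
        raise ValueError(f"fixed for the distilled checkpoint: {clashes}")
    return {**extra, "sigmas": list(sigmas), **FIXED_DISTILLED_SETTINGS}
```

A call would then look like `pipe(**distilled_call_kwargs(DISTILLED_SIGMA_VALUES, prompt=prompt, width=768, height=512, num_frames=121))`, and passing a different `guidance_scale` raises a `ValueError` instead of quietly changing the schedule.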
+
+ ### Text-to-video + audio
+
+ ```python
+ import torch
+ from diffusers import LTX2Pipeline
+ from diffusers.pipelines.ltx2.export_utils import encode_video
+ from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
+
+ pipe = LTX2Pipeline.from_pretrained(
+     "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
+ )
+ # Keep idle submodules on the CPU to reduce peak GPU memory.
+ pipe.enable_model_cpu_offload()
+
+ prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
+ frame_rate = 24.0
+
+ # The pipeline generates video and audio together.
+ video, audio = pipe(
+     prompt=prompt,
+     negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+     width=768,
+     height=512,
+     num_frames=121,  # 8 * 15 + 1, about 5 seconds at 24 fps
+     frame_rate=frame_rate,
+     num_inference_steps=8,
+     sigmas=DISTILLED_SIGMA_VALUES,
+     guidance_scale=1.0,
+     output_type="np",
+     return_dict=False,
+ )
+
+ # Mux the first generated video and its audio track into one MP4 file.
+ encode_video(
+     video[0],
+     fps=frame_rate,
+     audio=audio[0].float().cpu(),
+     audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+     output_path="ltx2_distilled_t2v.mp4",
+ )
+ ```
+
+ ### First-last-frame-to-video (FLF2V)
+
+ ```python
+ import torch
+ from diffusers import LTX2ConditionPipeline
+ from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
+ from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
+ from diffusers.utils import load_image
+
+ pipe = LTX2ConditionPipeline.from_pretrained(
+     "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_model_cpu_offload()
+
+ first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
+ last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
+
+ # Pin the first image to frame 0 and the last image to the final frame.
+ conditions = [
+     LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
+     LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
+ ]
+
+ prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
+ frame_rate = 24.0
+
+ # With return_dict=False the call returns a plain tuple rather than an output object.
+ video = pipe(
+     conditions=conditions,
+     prompt=prompt,
+     negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+     width=768,
+     height=512,
+     num_frames=121,
+     frame_rate=frame_rate,
+     num_inference_steps=8,
+     sigmas=DISTILLED_SIGMA_VALUES,
+     guidance_scale=1.0,
+     output_type="np",
+     return_dict=False,
+ )
+ ```
+
+ ### HDR generation (IC-LoRA)
+
+ ```python
+ import torch
+ from safetensors import safe_open
+ from diffusers import LTX2HDRPipeline
+ from diffusers.pipelines.ltx2.export_utils import encode_hdr_tensor_to_mp4
+ from diffusers.pipelines.ltx2.pipeline_ltx2_hdr_lora import LTX2HDRReferenceCondition
+ from diffusers.pipelines.ltx2.utils import DISTILLED_SIGMA_VALUES
+ from diffusers.utils import load_video
+
+ pipe = LTX2HDRPipeline.from_pretrained(
+     "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_model_cpu_offload()
+ # Load the HDR IC-LoRA and activate it at full weight.
+ pipe.load_lora_weights(
+     "Lightricks/LTX-2.3-22b-IC-LoRA-HDR",
+     adapter_name="hdr_lora",
+     weight_name="ltx-2.3-22b-ic-lora-hdr-0.9.safetensors",
+ )
+ pipe.set_adapters("hdr_lora", 1.0)
+
+ reference_video = load_video("input.mp4")
+ ref_cond = LTX2HDRReferenceCondition(frames=reference_video, strength=1.0)
+
+ # Read the connector embeddings from a local safetensors file.
+ with safe_open("ltx-2.3-22b-ic-lora-hdr-scene-emb.safetensors", framework="pt", device="cuda") as f:
+     connector_video_embeds = f.get_tensor("video_context")
+     connector_audio_embeds = f.get_tensor("audio_context")
+
+ hdr_video = pipe(
+     reference_conditions=[ref_cond],
+     connector_video_embeds=connector_video_embeds,
+     connector_audio_embeds=connector_audio_embeds,
+     width=768,
+     height=512,
+     num_frames=121,
+     frame_rate=24.0,
+     num_inference_steps=8,
+     sigmas=DISTILLED_SIGMA_VALUES,
+     guidance_scale=1.0,
+     output_type="pt",
+     return_dict=False,
+ )[0]
+
+ encode_hdr_tensor_to_mp4(hdr_video[0], output_mp4="ltx2_hdr.mp4", frame_rate=24.0)
+ ```
+
+ ## Notes
+
+ - `width` and `height` must be divisible by 32; `num_frames` must have the form `8k + 1` for an integer `k` (for example, 121 = 8 × 15 + 1).
+ - See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
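
The size constraints in the notes above can be checked before a slow pipeline call. The following is a minimal sketch in plain Python; `check_video_shape` and `floor_num_frames` are hypothetical helper names, not functions provided by `diffusers`, and treating `num_frames=1` as valid is an assumption.

```python
def check_video_shape(width, height, num_frames):
    """Return a list of problems; an empty list means the shape is valid."""
    problems = []
    if width % 32 != 0:
        problems.append(f"width {width} is not divisible by 32")
    if height % 32 != 0:
        problems.append(f"height {height} is not divisible by 32")
    if num_frames < 1 or (num_frames - 1) % 8 != 0:
        problems.append(f"num_frames {num_frames} is not of the form 8k + 1")
    return problems


def floor_num_frames(num_frames):
    """Largest value of the form 8k + 1 that does not exceed num_frames (minimum 1)."""
    k = max(0, (num_frames - 1) // 8)
    return 8 * k + 1
```

For instance, `check_video_shape(768, 512, 121)` reports no problems, while a request for 120 frames can be rounded down to a valid count with `floor_num_frames(120)`.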
+
+ ## License
+
+ These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).