linoyts HF Staff committed on
Commit 0340278 · verified · 1 Parent(s): f90a633

[README enhancement] add best practice inference example with diffusers


This PR adds a 2-stage inference example using diffusers (for best-quality outputs) and links to the docs for more examples.

Files changed (1)
  1. README.md +90 -1
README.md CHANGED
@@ -96,7 +96,96 @@ To use our model, please follow the instructions in our [ltx-pipelines](https://
 
  ## Diffusers 🧨
 
- LTX-2 is supported in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for image-to-video generation.
+ LTX-2 is supported in the [Diffusers Python library](https://huggingface.co/docs/diffusers/main/en/index) for text- and image-to-video generation.
+ Read more about LTX-2 in diffusers [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2#diffusers.LTX2Pipeline.__call__.example).
+
+ ### Use with diffusers
+ To achieve production-quality generations, we recommend using the two-stage generation pipeline.
+ Example of 2-stage text-to-video inference:
+ ```python
+ import torch
+ from diffusers import FlowMatchEulerDiscreteScheduler
+ from diffusers.pipelines.ltx2 import LTX2Pipeline, LTX2LatentUpsamplePipeline
+ from diffusers.pipelines.ltx2.latent_upsampler import LTX2LatentUpsamplerModel
+ from diffusers.pipelines.ltx2.utils import STAGE_2_DISTILLED_SIGMA_VALUES
+ from diffusers.pipelines.ltx2.export_utils import encode_video
+
+ device = "cuda:0"
+ width = 768
+ height = 512
+
+ pipe = LTX2Pipeline.from_pretrained(
+     "Lightricks/LTX-2", torch_dtype=torch.bfloat16
+ )
+ pipe.enable_sequential_cpu_offload(device=device)
+
+ prompt = "A beautiful sunset over the ocean"
+ negative_prompt = "shaky, glitchy, low quality, worst quality, deformed, distorted, disfigured, motion smear, motion artifacts, fused fingers, bad anatomy, weird hand, ugly, transition, static."
+
+ # Stage 1: default (non-distilled) inference
+ frame_rate = 24.0
+ video_latent, audio_latent = pipe(
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     width=width,
+     height=height,
+     num_frames=121,
+     frame_rate=frame_rate,
+     num_inference_steps=40,
+     sigmas=None,
+     guidance_scale=4.0,
+     output_type="latent",
+     return_dict=False,
+ )
+
+ latent_upsampler = LTX2LatentUpsamplerModel.from_pretrained(
+     "Lightricks/LTX-2",
+     subfolder="latent_upsampler",
+     torch_dtype=torch.bfloat16,
+ )
+ upsample_pipe = LTX2LatentUpsamplePipeline(vae=pipe.vae, latent_upsampler=latent_upsampler)
+ upsample_pipe.enable_model_cpu_offload(device=device)
+ upscaled_video_latent = upsample_pipe(
+     latents=video_latent,
+     output_type="latent",
+     return_dict=False,
+ )[0]
+
+ # Load the Stage 2 distilled LoRA
+ pipe.load_lora_weights(
+     "Lightricks/LTX-2", adapter_name="stage_2_distilled", weight_name="ltx-2-19b-distilled-lora-384.safetensors"
+ )
+ pipe.set_adapters("stage_2_distilled", 1.0)
+ # VAE tiling is usually necessary to avoid OOM errors during VAE decoding
+ pipe.vae.enable_tiling()
+ # Swap the scheduler so the Stage 2 distilled sigmas are used as-is
+ new_scheduler = FlowMatchEulerDiscreteScheduler.from_config(
+     pipe.scheduler.config, use_dynamic_shifting=False, shift_terminal=None
+ )
+ pipe.scheduler = new_scheduler
+ # Stage 2: inference with the distilled LoRA and sigmas
+ video, audio = pipe(
+     latents=upscaled_video_latent,
+     audio_latents=audio_latent,
+     prompt=prompt,
+     negative_prompt=negative_prompt,
+     num_inference_steps=3,
+     noise_scale=STAGE_2_DISTILLED_SIGMA_VALUES[0],  # renoise with the first sigma value: https://github.com/Lightricks/LTX-2/blob/main/packages/ltx-pipelines/src/ltx_pipelines/ti2vid_two_stages.py#L218
+     sigmas=STAGE_2_DISTILLED_SIGMA_VALUES,
+     guidance_scale=1.0,
+     output_type="np",
+     return_dict=False,
+ )
+
+ encode_video(
+     video[0],
+     fps=frame_rate,
+     audio=audio[0].float().cpu(),
+     audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+     output_path="ltx2_lora_distilled_sample.mp4",
+ )
+ ```
+ For more inference examples, including generation with the distilled checkpoint, see [here](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2#diffusers.LTX2Pipeline.__call__.example).
 
  ## General tips:
  * Width & height settings must be divisible by 32. Frame count must be a multiple of 8, plus 1 (i.e. 8k + 1).
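
The divisibility rule above can be sketched as a small pre-flight check (a minimal sketch; `snap_dimensions` is a hypothetical helper for illustration, not part of this PR or of diffusers):

```python
def snap_dimensions(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """Round settings to the nearest valid LTX-2 values:
    width/height to multiples of 32, frame count to 8k + 1."""
    def snap(value: int, multiple: int) -> int:
        # Round to the nearest multiple, never below one multiple
        return max(multiple, round(value / multiple) * multiple)

    width = snap(width, 32)
    height = snap(height, 32)
    # Frame count must be a multiple of 8, plus 1 (9, 17, ..., 121, ...)
    num_frames = max(9, round((num_frames - 1) / 8) * 8 + 1)
    return width, height, num_frames

print(snap_dimensions(768, 512, 121))  # already valid -> (768, 512, 121)
print(snap_dimensions(770, 500, 120))  # snapped to (768, 512, 121)
```

Validating settings up front like this avoids shape errors deep inside the pipeline call.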