add README
README.md
ADDED
@@ -0,0 +1,166 @@
---
library_name: diffusers
pipeline_tag: text-to-video
base_model: Lightricks/LTX-2.3
tags:
- video-generation
- text-to-video
- ltx
- ltx-2
- distilled
license: other
license_name: ltx-video-2-open-source-license
license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
---

# LTX-2.3 Distilled (Diffusers)

Diffusers-format weights for the distilled LTX-2.3 model from [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). Runs in **8 steps with CFG = 1**, trading some flexibility for substantially faster inference.

The non-distilled base model is at [`diffusers/LTX-2.3-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Diffusers).

## Usage

Requires a recent build of `diffusers` with LTX-2 support:

```bash
pip install -U git+https://github.com/huggingface/diffusers
```

The distilled checkpoint uses a fixed sigma schedule. Always pass `sigmas=DISTILLED_SIGMA_VALUES`, `num_inference_steps=8`, and `guidance_scale=1.0`.

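Because these three settings never change for the distilled checkpoint, it can help to keep them in one place. The following is a minimal sketch, not part of `diffusers`: `distilled_call_kwargs` is a hypothetical helper, and the sigma list in the demo is a placeholder rather than the real schedule.

```python
def distilled_call_kwargs(sigmas, **extra):
    """Merge caller kwargs with the three settings the distilled checkpoint expects.

    Hypothetical helper for illustration; it is not part of diffusers.
    """
    fixed = {
        "sigmas": sigmas,
        "num_inference_steps": 8,
        "guidance_scale": 1.0,
    }
    # Refuse silent overrides of the fixed sampling settings.
    clashes = sorted(fixed.keys() & extra.keys())
    if clashes:
        raise ValueError(f"fixed for the distilled checkpoint, do not override: {clashes}")
    return {**extra, **fixed}


# Placeholder schedule for illustration only. In real use, pass
# DISTILLED_SIGMA_VALUES from diffusers.pipelines.ltx2.utils.
placeholder_sigmas = [1.0, 0.5, 0.0]
kwargs = distilled_call_kwargs(placeholder_sigmas, width=768, height=512)
print(kwargs)
```

With a loaded pipeline, the result would be unpacked into the call, for example `pipe(prompt=prompt, **distilled_call_kwargs(DISTILLED_SIGMA_VALUES, width=768, height=512))`.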
### Text-to-video + audio

```python
import torch
from diffusers import LTX2Pipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES

pipe = LTX2Pipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
frame_rate = 24.0

video, audio = pipe(
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)

encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_distilled_t2v.mp4",
)
```

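The example above requests 121 frames at 24 fps, which is roughly five seconds of video. To target a different length, the frame count has to stay of the form `8k + 1`. A small sketch of that arithmetic follows; `frames_for_duration` and `clip_seconds` are hypothetical helper names, and rounding to the nearest valid count is an assumption about what is wanted.

```python
def frames_for_duration(seconds: float, frame_rate: float = 24.0) -> int:
    """Return the frame count of the form 8k + 1 closest to seconds * frame_rate."""
    target = seconds * frame_rate
    # Round (target - 1) / 8 half-up to the nearest non-negative integer k.
    k = max(0, int((target - 1) / 8 + 0.5))
    return 8 * k + 1


def clip_seconds(num_frames: int, frame_rate: float = 24.0) -> float:
    """Playback length in seconds of num_frames shown at frame_rate."""
    return num_frames / frame_rate


five_second_frames = frames_for_duration(5.0)
print(five_second_frames, round(clip_seconds(five_second_frames), 3))
```

The returned value can be passed as `num_frames`, with the same `frame_rate` given to both the pipeline call and the encoder.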
### First-last-frame-to-video (FLF2V)

```python
import torch
from diffusers import LTX2ConditionPipeline
from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
from diffusers.utils import load_image

pipe = LTX2ConditionPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")

conditions = [
    LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
    LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
]

prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
frame_rate = 24.0

video = pipe(
    conditions=conditions,
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)
```

### HDR generation (IC-LoRA)

```python
import torch
from safetensors import safe_open
from diffusers import LTX2HDRPipeline
from diffusers.pipelines.ltx2.export_utils import encode_hdr_tensor_to_mp4
from diffusers.pipelines.ltx2.pipeline_ltx2_hdr_lora import LTX2HDRReferenceCondition
from diffusers.pipelines.ltx2.utils import DISTILLED_SIGMA_VALUES
from diffusers.utils import load_video

pipe = LTX2HDRPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.load_lora_weights(
    "Lightricks/LTX-2.3-22b-IC-LoRA-HDR",
    adapter_name="hdr_lora",
    weight_name="ltx-2.3-22b-ic-lora-hdr-0.9.safetensors",
)
pipe.set_adapters("hdr_lora", 1.0)

reference_video = load_video("input.mp4")
ref_cond = LTX2HDRReferenceCondition(frames=reference_video, strength=1.0)

with safe_open("ltx-2.3-22b-ic-lora-hdr-scene-emb.safetensors", framework="pt", device="cuda") as f:
    connector_video_embeds = f.get_tensor("video_context")
    connector_audio_embeds = f.get_tensor("audio_context")

hdr_video = pipe(
    reference_conditions=[ref_cond],
    connector_video_embeds=connector_video_embeds,
    connector_audio_embeds=connector_audio_embeds,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=24.0,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="pt",
    return_dict=False,
)[0]

encode_hdr_tensor_to_mp4(hdr_video[0], output_mp4="ltx2_hdr.mp4", frame_rate=24.0)
```

## Notes

- `width` and `height` must be divisible by 32; `num_frames` must be of the form `8k + 1` for an integer `k` (for example, 121 = 8 × 15 + 1).
- See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.

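The shape constraints in the first note can be checked before a pipeline call. This is a minimal sketch using plain Python; `check_ltx2_shape` is a hypothetical helper, not a `diffusers` function, and it encodes only the two rules stated above.

```python
def check_ltx2_shape(width: int, height: int, num_frames: int) -> list[str]:
    """Return a list of constraint violations; an empty list means the shape is valid."""
    problems = []
    if width % 32 != 0:
        problems.append(f"width={width} is not divisible by 32")
    if height % 32 != 0:
        problems.append(f"height={height} is not divisible by 32")
    # num_frames must equal 8k + 1, i.e. leave remainder 1 when divided by 8.
    if num_frames < 1 or num_frames % 8 != 1:
        problems.append(f"num_frames={num_frames} is not of the form 8k + 1")
    return problems


ok = check_ltx2_shape(768, 512, 121)
bad = check_ltx2_shape(770, 512, 120)
print(ok)
print(bad)
```

The settings used in the examples (768 × 512, 121 frames) pass the check; 770 × 512 with 120 frames reports two violations.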
## License

These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).