---
library_name: diffusers
pipeline_tag: image-to-video
base_model: Lightricks/LTX-2.3
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-2-3
- ltx-video
- ltxv
- lightricks
license: other
license_name: ltx-video-2-open-source-license
license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
---
# LTX-2.3 Distilled (Diffusers)
Diffusers-format weights for the distilled LTX-2.3 model from [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). It runs in **8 steps with CFG = 1** (classifier-free guidance effectively off), giving up the freedom to tune the step count and guidance scale in exchange for substantially faster inference.
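Why CFG = 1 helps speed: under standard classifier-free guidance the prediction is `uncond + s * (cond - uncond)`, which collapses to the conditional prediction when `s = 1`, so typical Diffusers pipelines skip the unconditional pass. The sketch below counts denoiser forward passes under that assumption. It ignores any extra guidance passes (such as the multimodal guidance covered in the Diffusers docs), and the 40-step, scale-4.0 comparison is an illustrative setting, not a measured configuration of the base model.

```python
def cfg_combine(uncond: float, cond: float, guidance_scale: float) -> float:
    """Standard classifier-free guidance: move from the unconditional
    prediction toward the conditional one by `guidance_scale`."""
    return uncond + guidance_scale * (cond - uncond)


def denoiser_calls(num_inference_steps: int, guidance_scale: float) -> int:
    """Forward passes for plain CFG: one per step when guidance is off
    (scale <= 1), two per step (conditional + unconditional) otherwise."""
    passes_per_step = 2 if guidance_scale > 1.0 else 1
    return num_inference_steps * passes_per_step


# With guidance_scale = 1.0 the combination is just the conditional prediction.
print(cfg_combine(uncond=0.25, cond=0.75, guidance_scale=1.0))  # 0.75
print(denoiser_calls(8, 1.0), denoiser_calls(40, 4.0))  # 8 80
```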
The non-distilled base model is at [`diffusers/LTX-2.3-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Diffusers).
## Usage
Requires a recent build of `diffusers` with LTX-2 support:
```bash
pip install -U git+https://github.com/huggingface/diffusers
```
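To confirm that the installed build exposes the pipeline classes used on this card, you can check that their names resolve in the `diffusers` namespace. This is a convenience sketch, not an official version check: it only tests that the names resolve, not that PyTorch or a GPU is usable.

```python
import importlib

# Pipeline classes used in the examples on this card.
REQUIRED_CLASSES = ("LTX2Pipeline", "LTX2ConditionPipeline", "LTX2HDRPipeline")


def has_ltx2_support(required=REQUIRED_CLASSES) -> bool:
    """Return True if `diffusers` imports and exposes every class in `required`."""
    try:
        diffusers = importlib.import_module("diffusers")
    except Exception:  # not installed, or a broken install
        return False
    for name in required:
        try:
            getattr(diffusers, name)
        except Exception:  # lazy imports can fail with errors other than AttributeError
            return False
    return True


if __name__ == "__main__":
    print("LTX-2 pipelines available:", has_ltx2_support())
```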
The distilled checkpoint uses a fixed sigma schedule. Always pass `sigmas=DISTILLED_SIGMA_VALUES`, `num_inference_steps=8`, and `guidance_scale=1.0`.
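Since every call repeats the same step count, schedule, and guidance scale, it can help to build those arguments once. The helper below is hypothetical (it is not part of Diffusers). It takes the sigma list as a parameter so it runs without `diffusers` installed, and its ordering check is an assumption that noise levels never increase from one step to the next, not a documented guarantee.

```python
from typing import Sequence


def distilled_call_kwargs(sigmas: Sequence[float], *, width: int = 768, height: int = 512,
                          num_frames: int = 121, frame_rate: float = 24.0) -> dict:
    """Hypothetical helper: the fixed arguments every distilled call should share."""
    sigmas = [float(s) for s in sigmas]
    # Assumption: a denoising schedule never moves to a higher noise level.
    if any(later > earlier for earlier, later in zip(sigmas, sigmas[1:])):
        raise ValueError("expected a non-increasing sigma schedule")
    return {
        "width": width,
        "height": height,
        "num_frames": num_frames,
        "frame_rate": frame_rate,
        "num_inference_steps": 8,
        "sigmas": sigmas,
        "guidance_scale": 1.0,
    }
```

A call would then look like `pipe(prompt=prompt, output_type="np", return_dict=False, **distilled_call_kwargs(DISTILLED_SIGMA_VALUES))`.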
### Text-to-video + audio
```python
import torch
from diffusers import LTX2Pipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
pipe = LTX2Pipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
frame_rate = 24.0
video, audio = pipe(
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)
encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_distilled_t2v.mp4",
)
```
### First-last-frame-to-video (FLF2V)
```python
import torch
from diffusers import LTX2ConditionPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
from diffusers.utils import load_image
pipe = LTX2ConditionPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
# Pin the first image to the first output frame and the last image to the final frame.
conditions = [
    LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
    LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
]
prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
frame_rate = 24.0
# With return_dict=False the pipeline returns a (video, audio) tuple.
video, audio = pipe(
    conditions=conditions,
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)
encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_distilled_flf2v.mp4",
)
```
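In the FLF2V conditions, `index=0` pins the first image and `index=-1` the last. The sketch below assumes `index` counts output frames and follows Python-style negative indexing; it illustrates that reading and is not the pipeline's own code, so check the `LTX2VideoCondition` docstring for the exact semantics.

```python
def resolve_frame_index(index: int, num_frames: int) -> int:
    """Map a possibly negative frame index to an absolute position (Python-style)."""
    if not -num_frames <= index < num_frames:
        raise IndexError(f"index {index} out of range for {num_frames} frames")
    return index % num_frames


# With num_frames=121, the first/last conditions land on frames 0 and 120.
print(resolve_frame_index(0, 121), resolve_frame_index(-1, 121))  # 0 120
```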
### HDR generation (IC-LoRA)
```python
import torch
from safetensors import safe_open
from diffusers import LTX2HDRPipeline
from diffusers.pipelines.ltx2.export_utils import encode_hdr_tensor_to_mp4
from diffusers.pipelines.ltx2.pipeline_ltx2_hdr_lora import LTX2HDRReferenceCondition
from diffusers.pipelines.ltx2.utils import DISTILLED_SIGMA_VALUES
from diffusers.utils import load_video
pipe = LTX2HDRPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.load_lora_weights(
    "Lightricks/LTX-2.3-22b-IC-LoRA-HDR",
    adapter_name="hdr_lora",
    weight_name="ltx-2.3-22b-ic-lora-hdr-0.9.safetensors",
)
pipe.set_adapters("hdr_lora", 1.0)
# Path (or URL) of the SDR reference video to convert.
reference_video = load_video("input.mp4")
ref_cond = LTX2HDRReferenceCondition(frames=reference_video, strength=1.0)
# Precomputed scene embeddings, passed instead of a text prompt. safe_open reads a
# local file, so download it first (assumed to ship with the HDR LoRA repository).
with safe_open("ltx-2.3-22b-ic-lora-hdr-scene-emb.safetensors", framework="pt", device="cuda") as f:
    connector_video_embeds = f.get_tensor("video_context")
    connector_audio_embeds = f.get_tensor("audio_context")
hdr_video = pipe(
    reference_conditions=[ref_cond],
    connector_video_embeds=connector_video_embeds,
    connector_audio_embeds=connector_audio_embeds,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=24.0,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="pt",
    return_dict=False,
)[0]
encode_hdr_tensor_to_mp4(hdr_video[0], output_mp4="ltx2_hdr.mp4", frame_rate=24.0)
```
## Notes
- `width` and `height` must be divisible by 32; `num_frames` must be of the form `8k + 1` (for example, 121 = 8 × 15 + 1).
- See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
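A small pre-flight check for the size constraints can save a failed run. The helpers below are a sketch: `snap_num_frames` rounds a target duration to the nearest valid frame count, and `latent_grid` assumes the video VAE compresses 32× spatially and 8× temporally (keeping the first frame separate), the reading under which the divisibility rules are natural. Treat the latent shape as illustrative rather than the pipeline's exact shape logic.

```python
def check_dims(width: int, height: int, num_frames: int) -> None:
    """Raise ValueError if the sizes break the documented constraints."""
    if width % 32 or height % 32:
        raise ValueError(f"width and height must be divisible by 32, got {width}x{height}")
    if num_frames < 1 or (num_frames - 1) % 8:
        raise ValueError(f"num_frames must be 8k + 1, got {num_frames}")


def snap_num_frames(duration_s: float, frame_rate: float) -> int:
    """Nearest valid frame count (8k + 1) for a target clip length in seconds."""
    k = max(0, round((duration_s * frame_rate - 1) / 8))
    return 8 * k + 1


def latent_grid(width: int, height: int, num_frames: int) -> tuple:
    """Assumed latent (frames, height, width) for a 32x spatial / 8x temporal VAE."""
    check_dims(width, height, num_frames)
    return ((num_frames - 1) // 8 + 1, height // 32, width // 32)


print(snap_num_frames(5.0, 24.0))  # 121
print(latent_grid(768, 512, 121))  # (16, 16, 24)
```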
## License
These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).