---
library_name: diffusers
pipeline_tag: image-to-video
base_model: Lightricks/LTX-2.3
tags:
- image-to-video
- text-to-video
- video-to-video
- image-text-to-video
- audio-to-video
- text-to-audio
- video-to-audio
- audio-to-audio
- text-to-audio-video
- image-to-audio-video
- image-text-to-audio-video
- ltx-2
- ltx-2-3
- ltx-video
- ltxv
- lightricks
license: other
license_name: ltx-video-2-open-source-license
license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
---

# LTX-2.3 Distilled (Diffusers)

Diffusers-format weights for the distilled LTX-2.3 model from [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3). Runs in **8 steps with CFG = 1**, trading some flexibility for substantially faster inference.

The non-distilled base model is at [`diffusers/LTX-2.3-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Diffusers).

## Usage

Requires a recent build of `diffusers` with LTX-2 support:

```bash
pip install -U git+https://github.com/huggingface/diffusers
```

The distilled checkpoint uses a fixed sigma schedule. Always pass `sigmas=DISTILLED_SIGMA_VALUES`, `num_inference_steps=8`, and `guidance_scale=1.0`.
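
One way to avoid forgetting one of these three settings is to bundle them in a small helper and unpack the result into each pipeline call. The sketch below is illustrative only: `distilled_call_kwargs` is a hypothetical name, not part of `diffusers`, and the sigma list is supplied by the caller so the helper assumes nothing about the contents of `DISTILLED_SIGMA_VALUES`.

```python
from typing import Any, Dict, Sequence

# Settings the distilled checkpoint expects, as stated above.
DISTILLED_STEPS = 8
DISTILLED_GUIDANCE_SCALE = 1.0


def distilled_call_kwargs(sigmas: Sequence[float], /, **extra: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge the fixed distilled settings with other call arguments.

    Raises ValueError if the caller tries to override one of the fixed settings.
    """
    fixed = {
        "sigmas": list(sigmas),
        "num_inference_steps": DISTILLED_STEPS,
        "guidance_scale": DISTILLED_GUIDANCE_SCALE,
    }
    clashes = sorted(set(fixed) & set(extra))
    if clashes:
        raise ValueError(f"fixed for the distilled checkpoint, do not override: {clashes}")
    return {**extra, **fixed}
```

Under these assumptions, a call would look like `pipe(**distilled_call_kwargs(DISTILLED_SIGMA_VALUES, prompt=prompt, width=768, height=512, num_frames=121))`.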

### Text-to-video + audio

```python
import torch
from diffusers import LTX2Pipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES

pipe = LTX2Pipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
# Keep idle sub-models on the CPU to lower peak GPU memory.
pipe.enable_model_cpu_offload()

prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
frame_rate = 24.0

video, audio = pipe(
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    # Fixed settings for the distilled checkpoint.
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)

# Mux the first generated video with its audio track into one MP4 file.
encode_video(
    video[0],
    fps=frame_rate,
    audio=audio[0].float().cpu(),
    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
    output_path="ltx2_distilled_t2v.mp4",
)
```

### First-last-frame-to-video (FLF2V)

```python
import torch
from diffusers import LTX2ConditionPipeline
from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT, DISTILLED_SIGMA_VALUES
from diffusers.utils import load_image

pipe = LTX2ConditionPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")

conditions = [
    LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
    LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
]

prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
frame_rate = 24.0

video = pipe(
    conditions=conditions,
    prompt=prompt,
    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=frame_rate,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="np",
    return_dict=False,
)
```

### HDR generation (IC-LoRA)

```python
import torch
from safetensors import safe_open
from diffusers import LTX2HDRPipeline
from diffusers.pipelines.ltx2.export_utils import encode_hdr_tensor_to_mp4
from diffusers.pipelines.ltx2.pipeline_ltx2_hdr_lora import LTX2HDRReferenceCondition
from diffusers.pipelines.ltx2.utils import DISTILLED_SIGMA_VALUES
from diffusers.utils import load_video

pipe = LTX2HDRPipeline.from_pretrained(
    "diffusers/LTX-2.3-Distilled-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

pipe.load_lora_weights(
    "Lightricks/LTX-2.3-22b-IC-LoRA-HDR",
    adapter_name="hdr_lora",
    weight_name="ltx-2.3-22b-ic-lora-hdr-0.9.safetensors",
)
pipe.set_adapters("hdr_lora", 1.0)

reference_video = load_video("input.mp4")
ref_cond = LTX2HDRReferenceCondition(frames=reference_video, strength=1.0)

with safe_open("ltx-2.3-22b-ic-lora-hdr-scene-emb.safetensors", framework="pt", device="cuda") as f:
    connector_video_embeds = f.get_tensor("video_context")
    connector_audio_embeds = f.get_tensor("audio_context")

hdr_video = pipe(
    reference_conditions=[ref_cond],
    connector_video_embeds=connector_video_embeds,
    connector_audio_embeds=connector_audio_embeds,
    width=768,
    height=512,
    num_frames=121,
    frame_rate=24.0,
    num_inference_steps=8,
    sigmas=DISTILLED_SIGMA_VALUES,
    guidance_scale=1.0,
    output_type="pt",
    return_dict=False,
)[0]

encode_hdr_tensor_to_mp4(hdr_video[0], output_mp4="ltx2_hdr.mp4", frame_rate=24.0)
```

## Notes

- `width` and `height` must be divisible by 32; `num_frames` must have the form `8k + 1` for an integer `k` (for example, 121 = 8 × 15 + 1).
- See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
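
The size constraints in the first note can be checked before launching a generation. The following is a minimal sketch in plain Python; `check_ltx2_shape` and `snap_num_frames` are hypothetical helper names, not `diffusers` functions, and the sketch only encodes the two rules stated above (it does not know about any further limits the pipeline may enforce, such as minimum or maximum sizes).

```python
def check_ltx2_shape(width: int, height: int, num_frames: int) -> None:
    """Raise ValueError if the request violates the two documented constraints."""
    if width <= 0 or height <= 0 or width % 32 or height % 32:
        raise ValueError(f"width and height must be positive multiples of 32, got {width}x{height}")
    if num_frames < 1 or (num_frames - 1) % 8:
        raise ValueError(f"num_frames must be of the form 8k + 1, got {num_frames}")


def snap_num_frames(seconds: float, frame_rate: float) -> int:
    """Return a frame count of the form 8k + 1 close to the requested duration."""
    target = seconds * frame_rate
    k = max(0, round((target - 1) / 8))
    return 8 * k + 1
```

For instance, `snap_num_frames(5, 24.0)` gives the 121 frames used in the examples above, and `check_ltx2_shape(768, 512, 121)` passes silently.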

## License

These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).