LTX-2.3-Diffusers

+---
+library_name: diffusers
+pipeline_tag: text-to-video
+base_model: Lightricks/LTX-2.3
+tags:
+- video-generation
+- text-to-video
+- ltx
+- ltx-2
+license: other
+license_name: ltx-video-2-open-source-license
+license_link: https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE
+---
+# LTX-2.3 (Diffusers)
+Diffusers-format weights for [Lightricks/LTX-2.3](https://huggingface.co/Lightricks/LTX-2.3), a DiT-based foundation model that generates video and synchronized audio jointly.
+A distilled variant (8 steps, CFG=1) is available at [`diffusers/LTX-2.3-Distilled-Diffusers`](https://huggingface.co/diffusers/LTX-2.3-Distilled-Diffusers).
+## Usage
+Requires a recent build of `diffusers` with LTX-2 support:
+```bash
+pip install -U git+https://github.com/huggingface/diffusers
+```
+### Text-to-video + audio
+```python
+import torch
+from diffusers import LTX2Pipeline
+from diffusers.pipelines.ltx2.export_utils import encode_video
+from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
+pipe = LTX2Pipeline.from_pretrained(
+    "diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
+)
+# Move each sub-model to the GPU only while it runs, lowering peak VRAM use.
+pipe.enable_model_cpu_offload()
+prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
+frame_rate = 24.0
+video, audio = pipe(
+    prompt=prompt,
+    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+    width=768,
+    height=512,
+    num_frames=121,
+    frame_rate=frame_rate,
+    num_inference_steps=30,
+    guidance_scale=3.0,
+    output_type="np",
+    return_dict=False,
+)
+encode_video(
+    video[0],
+    fps=frame_rate,
+    audio=audio[0].float().cpu(),
+    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+    output_path="ltx2_t2v.mp4",
+)
+```
+### First-last-frame-to-video (FLF2V)
+```python
+import torch
+from diffusers import LTX2ConditionPipeline
+from diffusers.pipelines.ltx2.export_utils import encode_video
+from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
+from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
+from diffusers.utils import load_image
+pipe = LTX2ConditionPipeline.from_pretrained(
+    "diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
+)
+pipe.enable_model_cpu_offload()
+first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
+last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
+# Pin the first image to frame 0 and the last image to the final frame.
+conditions = [
+    LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
+    LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
+]
+prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
+frame_rate = 24.0
+# Unpack (video, audio) as in the text-to-video example; with return_dict=False the call returns a tuple.
+video, audio = pipe(
+    conditions=conditions,
+    prompt=prompt,
+    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+    width=768,
+    height=512,
+    num_frames=121,
+    frame_rate=frame_rate,
+    num_inference_steps=40,
+    guidance_scale=4.0,
+    output_type="np",
+    return_dict=False,
+)
+# Save the clip the same way as the text-to-video example; the output filename is arbitrary.
+encode_video(
+    video[0],
+    fps=frame_rate,
+    audio=audio[0].float().cpu(),
+    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+    output_path="ltx2_flf2v.mp4",
+)
+```
+### IC-LoRA (camera control)
+```python
+import torch
+from diffusers import LTX2InContextPipeline
+from diffusers.pipelines.ltx2.export_utils import encode_video
+from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
+pipe = LTX2InContextPipeline.from_pretrained(
+    "diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
+)
+pipe.enable_model_cpu_offload()
+pipe.load_lora_weights(
+    "Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-In",
+    adapter_name="ic_lora",
+    weight_name="ltx-2-19b-lora-camera-control-dolly-in.safetensors",
+)
+# Activate the loaded adapter with a weight of 1.0.
+pipe.set_adapters("ic_lora", 1.0)
+prompt = "A flowing river in a forest"
+frame_rate = 24.0
+video, audio = pipe(
+    prompt=prompt,
+    negative_prompt=DEFAULT_NEGATIVE_PROMPT,
+    width=768,
+    height=512,
+    num_frames=121,
+    frame_rate=frame_rate,
+    num_inference_steps=30,
+    guidance_scale=3.0,
+    output_type="np",
+    return_dict=False,
+)
+encode_video(
+    video[0],
+    fps=frame_rate,
+    audio=audio[0].float().cpu(),
+    audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
+    output_path="ltx2_ic_lora.mp4",
+)
+```
+## Notes
+- `width` and `height` must be divisible by 32; `num_frames` must have the form `8k + 1` for a non-negative integer `k` (for example, `121 = 8 * 15 + 1`).
+- See the [Diffusers LTX-2 docs](https://huggingface.co/docs/diffusers/main/en/api/pipelines/ltx2) for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
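+The size constraints above can be checked before calling a pipeline. The following is a minimal sketch, not part of `diffusers`: `snap_resolution` and `snap_num_frames` are hypothetical helper names, and rounding down (rather than to the nearest valid value) is an assumption made for simplicity.
+```python
+def snap_resolution(width: int, height: int, multiple: int = 32) -> tuple[int, int]:
+    """Round width and height down to a multiple of `multiple` (at least one multiple)."""
+    def snap(value: int) -> int:
+        return max(multiple, (value // multiple) * multiple)
+    return snap(width), snap(height)
+def snap_num_frames(num_frames: int) -> int:
+    """Round down to the nearest value of the form 8k + 1 (minimum 1)."""
+    return max(1, ((num_frames - 1) // 8) * 8 + 1)
+# Hypothetical usage: sanitize a requested size before passing it to the pipeline.
+width, height = snap_resolution(770, 500)
+num_frames = snap_num_frames(120)
+```
+With these inputs, `width, height` becomes `768, 480` and `num_frames` becomes `113`; the values used in the examples above (`768`, `512`, `121`) already satisfy both constraints and pass through unchanged.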
+## License
+These weights are released under the [LTX Video 2 Open Source License](https://huggingface.co/Lightricks/LTX-2.3/blob/main/LICENSE).