Tags: Image-to-Video, Diffusers, Safetensors, LTX2Pipeline, text-to-video, video-to-video, image-text-to-video, audio-to-video, text-to-audio, video-to-audio, audio-to-audio, text-to-audio-video, image-to-audio-video, image-text-to-audio-video, ltx-2, ltx-2-3, ltx-video, ltxv, lightricks
LTX-2.3 (Diffusers)
Diffusers-format weights for Lightricks/LTX-2.3 — a DiT-based foundation model that jointly generates synchronized video and audio.
A distilled variant (8 steps, CFG=1) is available at diffusers/LTX-2.3-Distilled-Diffusers.
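Why CFG=1 matters for speed: in textbook classifier-free guidance, each denoising step combines an unconditional and a conditional prediction as uncond + s * (cond - uncond). With s = 1 this reduces to the conditional prediction alone, so the unconditional pass can be skipped. The sketch below assumes that standard formulation and one transformer call per branch. LTX-2's actual guidance (including the multimodal guidance covered in the Diffusers docs) may use more passes, so treat the counts as illustrative rather than as the pipeline's real cost.

```python
def cfg_combine(uncond: list[float], cond: list[float], scale: float) -> list[float]:
    """Textbook classifier-free guidance: uncond + scale * (cond - uncond)."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def denoiser_calls(num_steps: int, guidance_scale: float) -> int:
    """Transformer forward passes, ASSUMING one extra unconditional pass per step when scale != 1."""
    passes_per_step = 1 if guidance_scale == 1.0 else 2
    return num_steps * passes_per_step


# With scale 1.0 the guided output equals the conditional prediction.
print(cfg_combine([0.0, 1.0], [0.5, 2.0], 1.0))  # [0.5, 2.0]
print(cfg_combine([0.0, 1.0], [0.5, 2.0], 3.0))  # [1.5, 4.0]
# 30 steps at guidance_scale=3.0 (text-to-video example below) vs. 8 steps at CFG=1 (distilled).
print(denoiser_calls(30, 3.0), denoiser_calls(8, 1.0))  # 60 8
```

Under these assumptions the distilled variant needs roughly 8 transformer calls per clip instead of 60.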
Usage
Requires a recent build of diffusers with LTX-2 support:
pip install -U git+https://github.com/huggingface/diffusers
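Before downloading the weights, it can help to confirm that the installed build actually includes LTX-2 support. The sketch below only checks whether diffusers exposes the LTX2Pipeline name used in the examples; it does not check a version number, and the helper name has_ltx2_support is hypothetical.

```python
import importlib
import importlib.util


def has_ltx2_support() -> bool:
    """Return True if the installed diffusers exposes LTX2Pipeline."""
    if importlib.util.find_spec("diffusers") is None:
        return False  # diffusers is not installed at all
    try:
        diffusers = importlib.import_module("diffusers")
        # diffusers resolves pipeline classes lazily; a missing name or a failed
        # lazy import is treated as "no LTX-2 support".
        return hasattr(diffusers, "LTX2Pipeline")
    except Exception:
        return False


if __name__ == "__main__":
    if has_ltx2_support():
        print("LTX-2 pipelines available")
    else:
        print("Install from source: pip install -U git+https://github.com/huggingface/diffusers")
```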
Text-to-video + audio
import torch
from diffusers import LTX2Pipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
pipe = LTX2Pipeline.from_pretrained(
"diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
prompt = "A flowing river in a forest at golden hour, gentle wind in the leaves."
frame_rate = 24.0
video, audio = pipe(
prompt=prompt,
negative_prompt=DEFAULT_NEGATIVE_PROMPT,
width=768,
height=512,
num_frames=121,
frame_rate=frame_rate,
num_inference_steps=30,
guidance_scale=3.0,
output_type="np",
return_dict=False,
)
encode_video(
video[0],
fps=frame_rate,
audio=audio[0].float().cpu(),
audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
output_path="ltx2_t2v.mp4",
)
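The clip length follows from num_frames and frame_rate: 121 frames at 24 fps is about 5.04 seconds. To target a different duration while respecting the 8k + 1 frame-count rule listed under Notes, a small helper can round to the nearest valid count. Both helper names are hypothetical and not part of diffusers.

```python
def clip_duration(num_frames: int, frame_rate: float) -> float:
    """Length of the generated clip in seconds."""
    return num_frames / frame_rate


def frames_for_duration(seconds: float, frame_rate: float) -> int:
    """Nearest frame count of the form 8k + 1 (with k >= 1) for a target duration."""
    k = max(1, round((seconds * frame_rate - 1) / 8))
    return 8 * k + 1


print(round(clip_duration(121, 24.0), 2))  # 5.04
print(frames_for_duration(5.0, 24.0))      # 121
print(frames_for_duration(10.0, 24.0))     # 241
```

The result can be passed straight to num_frames, with the same frame_rate given to both the pipeline and encode_video.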
First-last-frame-to-video (FLF2V)
import torch
from diffusers import LTX2ConditionPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.pipeline_ltx2_condition import LTX2VideoCondition
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
from diffusers.utils import load_image
pipe = LTX2ConditionPipeline.from_pretrained(
"diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
first_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_first_frame.png")
last_image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/flf2v_input_last_frame.png")
# Pin the first image to frame index 0 and the last image to index -1 (the final frame).
conditions = [
LTX2VideoCondition(frames=first_image, index=0, strength=1.0),
LTX2VideoCondition(frames=last_image, index=-1, strength=1.0),
]
prompt = "CG animation style, a small blue bird takes off from the ground, flapping its wings."
frame_rate = 24.0
# With return_dict=False the pipeline returns (video, audio), as in the other examples.
video, audio = pipe(
conditions=conditions,
prompt=prompt,
negative_prompt=DEFAULT_NEGATIVE_PROMPT,
width=768,
height=512,
num_frames=121,
frame_rate=frame_rate,
num_inference_steps=40,
guidance_scale=4.0,
output_type="np",
return_dict=False,
)
encode_video(
video[0],
fps=frame_rate,
audio=audio[0].float().cpu(),
audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
output_path="ltx2_flf2v.mp4",
)
IC-LoRA (camera control)
import torch
from diffusers import LTX2InContextPipeline
from diffusers.pipelines.ltx2.export_utils import encode_video
from diffusers.pipelines.ltx2.utils import DEFAULT_NEGATIVE_PROMPT
pipe = LTX2InContextPipeline.from_pretrained(
"diffusers/LTX-2.3-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()
pipe.load_lora_weights(
"Lightricks/LTX-2-19b-LoRA-Camera-Control-Dolly-In",
adapter_name="ic_lora",
weight_name="ltx-2-19b-lora-camera-control-dolly-in.safetensors",
)
pipe.set_adapters("ic_lora", 1.0)
prompt = "A flowing river in a forest"
frame_rate = 24.0
video, audio = pipe(
prompt=prompt,
negative_prompt=DEFAULT_NEGATIVE_PROMPT,
width=768,
height=512,
num_frames=121,
frame_rate=frame_rate,
num_inference_steps=30,
guidance_scale=3.0,
output_type="np",
return_dict=False,
)
encode_video(
video[0],
fps=frame_rate,
audio=audio[0].float().cpu(),
audio_sample_rate=pipe.vocoder.config.output_sampling_rate,
output_path="ltx2_ic_lora.mp4",
)
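The call pipe.set_adapters("ic_lora", 1.0) sets the adapter's weight. In the standard LoRA formulation, an adapted layer uses W + w * (B @ A), where B and A are the low-rank factors and w is the adapter weight, so 0.0 effectively disables the LoRA and values below 1.0 soften it. The NumPy sketch below illustrates that formulation on toy matrices; it ignores the alpha/rank scaling that PEFT also applies, and all shapes and the helper name merged_weight are made up for illustration.

```python
import numpy as np


def merged_weight(w: np.ndarray, lora_b: np.ndarray, lora_a: np.ndarray, adapter_weight: float) -> np.ndarray:
    """Effective weight of a LoRA-adapted linear layer: W + adapter_weight * (B @ A)."""
    return w + adapter_weight * (lora_b @ lora_a)


rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 4, 2  # toy sizes; real transformer layers are far larger
w = rng.standard_normal((d_out, d_in))
lora_b = rng.standard_normal((d_out, rank))
lora_a = rng.standard_normal((rank, d_in))

x = rng.standard_normal(d_in)
off = merged_weight(w, lora_b, lora_a, 0.0) @ x
print(np.allclose(off, w @ x))                 # True: weight 0.0 leaves the base layer unchanged
print(np.linalg.matrix_rank(lora_b @ lora_a))  # 2: the update is low-rank
```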
Notes
- width and height must be divisible by 32; num_frames must equal 8k + 1.
- See the Diffusers LTX-2 docs for multimodal guidance, prompt enhancement, and the upscaling/refinement pipeline.
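A quick pre-flight check can catch invalid sizes before a long generation run. The sketch below validates the constraints above and snaps width and height down to multiples of 32. It also suggests one plausible reason for the rules, under an assumption not stated on this card: that the LTX-2 video VAE, like the original LTX-Video VAE, compresses 32x spatially and 8x temporally with the first frame encoded on its own. Under that assumption, 121 frames at 768x512 map to a 16 x 16 x 24 latent grid. All helper names are hypothetical.

```python
def snap_down(value: int, multiple: int = 32) -> int:
    """Largest multiple of `multiple` that does not exceed `value`."""
    return (value // multiple) * multiple


def check_ltx2_shape(width: int, height: int, num_frames: int) -> None:
    """Raise ValueError if the size constraints from the Notes are violated."""
    if width % 32 or height % 32:
        raise ValueError(
            f"width/height must be divisible by 32, got {width}x{height}; "
            f"try {snap_down(width)}x{snap_down(height)}"
        )
    if num_frames < 1 or (num_frames - 1) % 8:
        raise ValueError(f"num_frames must be 8k + 1, got {num_frames}")


def latent_grid(width: int, height: int, num_frames: int) -> tuple[int, int, int]:
    """(frames, height, width) of the latent, ASSUMING 32x spatial and 8x temporal
    compression with the first frame encoded separately (as in LTX-Video)."""
    check_ltx2_shape(width, height, num_frames)
    return ((num_frames - 1) // 8 + 1, height // 32, width // 32)


print(latent_grid(768, 512, 121))       # (16, 16, 24)
print(snap_down(1000), snap_down(720))  # 992 704
```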
License
These weights are released under the LTX Video 2 Open Source License.
Base model: Lightricks/LTX-2.3