# LTX-2 Pipelines
High-level pipeline implementations for generating audio-video content with Lightricks' LTX-2 model. This package provides ready-to-use pipelines for text-to-video, image-to-video, video-to-video, and keyframe interpolation tasks.
Pipelines are built using building blocks from ltx-core (schedulers, guiders, noisers, patchifiers) and handle the complete inference flow including model loading, encoding, decoding, and file I/O.
## 📋 Overview
LTX-2 Pipelines provides production-ready implementations that abstract away the complexity of the diffusion process, model loading, and memory management. Each pipeline is optimized for specific use cases and offers different trade-offs between speed, quality, and memory usage.
Key Features:
- 🎬 **Multiple Pipeline Types**: Text-to-video, image-to-video, video-to-video, and keyframe interpolation
- ⚡ **Optimized Performance**: Support for FP8 transformers, gradient estimation, and memory optimization
- 🎯 **Production Ready**: Two-stage pipelines for best-quality output
- 🔧 **LoRA Support**: Easy integration with trained LoRA adapters
- 📦 **Self-Contained**: Handles model loading, encoding, decoding, and file I/O
- 🖥️ **CLI Support**: All pipelines can be run as command-line scripts
## 🚀 Quick Start
### 🔧 Installation

```bash
# From the repository root
uv sync --frozen

# Or install as a package
pip install -e packages/ltx-pipelines
```
### Running Pipelines
All pipelines can be run directly from the command line. Each pipeline module is executable:
```bash
# Run a pipeline (example: two-stage text-to-video)
python -m ltx_pipelines.ti2vid_two_stages \
    --checkpoint-path path/to/checkpoint.safetensors \
    --distilled-lora-path path/to/distilled_lora.safetensors \
    --spatial-upsampler-path path/to/upsampler.safetensors \
    --gemma-root path/to/gemma \
    --prompt "A beautiful sunset over the ocean" \
    --output-path output.mp4

# View all available options for any pipeline
python -m ltx_pipelines.ti2vid_two_stages --help
```
Available pipeline modules:

- `ltx_pipelines.ti2vid_two_stages` - Two-stage text-to-video (recommended)
- `ltx_pipelines.ti2vid_one_stage` - Single-stage text-to-video
- `ltx_pipelines.distilled` - Fast distilled pipeline
- `ltx_pipelines.ic_lora` - Video-to-video with IC-LoRA
- `ltx_pipelines.keyframe_interpolation` - Keyframe interpolation
Use `--help` with any pipeline module to see all available options and parameters.
## 🎯 Pipeline Selection Guide

### Quick Decision Tree
```
Do you need to condition on existing images/videos?
├─ YES → Do you have reference videos for video-to-video?
│   ├─ YES → Use ICLoraPipeline
│   └─ NO → Do you have keyframe images to interpolate?
│       ├─ YES → Use KeyframeInterpolationPipeline
│       └─ NO → Use ICLoraPipeline (image conditioning only)
│
└─ NO → Text-to-video only
    ├─ Do you need best quality?
    │   └─ YES → Use TI2VidTwoStagesPipeline (recommended for production)
    │
    └─ Do you need fastest inference?
        └─ YES → Use DistilledPipeline (with 8 predefined sigmas)
```
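The decision tree can be condensed into a small helper. This is an illustrative sketch only; the function and its argument names are hypothetical and not part of `ltx-pipelines`:

```python
def choose_pipeline(condition_on_media=False, has_reference_video=False,
                    has_keyframes=False, prioritize_speed=False):
    """Return the recommended pipeline class name per the decision tree."""
    if condition_on_media:
        if has_reference_video:
            return "ICLoraPipeline"
        if has_keyframes:
            return "KeyframeInterpolationPipeline"
        return "ICLoraPipeline"  # image conditioning only
    if prioritize_speed:
        return "DistilledPipeline"  # fastest, 8 predefined sigmas
    return "TI2VidTwoStagesPipeline"  # best quality, recommended default

print(choose_pipeline(condition_on_media=True, has_keyframes=True))
# KeyframeInterpolationPipeline
```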
> **Note:** `TI2VidOneStagePipeline` is primarily for educational purposes. For best quality, use the two-stage pipelines (`TI2VidTwoStagesPipeline`, `ICLoraPipeline`, `KeyframeInterpolationPipeline`, or `DistilledPipeline`).
### Features Comparison

| Pipeline | Stages | CFG | Upsampling | Conditioning | Best For |
|---|---|---|---|---|---|
| `TI2VidTwoStagesPipeline` | 2 | ✅ | ✅ | Image | Production quality (recommended) |
| `TI2VidOneStagePipeline` | 1 | ✅ | ❌ | Image | Educational, prototyping |
| `DistilledPipeline` | 2 | ❌ | ✅ | Image | Fastest inference (8 sigmas) |
| `ICLoraPipeline` | 2 | ✅ | ✅ | Image + Video | Video-to-video transformations |
| `KeyframeInterpolationPipeline` | 2 | ✅ | ✅ | Keyframes | Animation, interpolation |
## 📦 Available Pipelines

### 1. TI2VidTwoStagesPipeline
Best for: High-quality text-to-video generation with upsampling. Recommended for production use.
Source: `src/ltx_pipelines/ti2vid_two_stages.py`
Two-stage generation: Stage 1 generates low-resolution video with CFG guidance, Stage 2 upsamples to 2x resolution with distilled LoRA refinement. Supports image conditioning. Highest quality output, slower than one-stage but significantly better quality.
Use when: Production-quality video generation, higher resolution needed, quality over speed, text-to-video with image conditioning.
### 2. TI2VidOneStagePipeline
Best for: Educational purposes and quick prototyping.
Source: `src/ltx_pipelines/ti2vid_one_stage.py`
> ⚠️ **Important:** This pipeline is primarily for educational purposes. For production-quality results, use `TI2VidTwoStagesPipeline` or another two-stage pipeline.
Single-stage generation (no upsampling) with CFG guidance and image conditioning support. Faster inference but lower resolution output (typically 512x768).
Use when: Learning how the pipeline works, quick prototyping, testing, or when high resolution is not needed.
### 3. DistilledPipeline
Best for: Fastest inference with good quality using a distilled model with predefined sigma schedule.
Source: `src/ltx_pipelines/distilled.py`
Two-stage generation with 8 predefined sigmas (8 steps in stage 1, 4 steps in stage 2). No CFG guidance required. Fastest inference among all pipelines. Supports image conditioning. Requires spatial upsampler.
Use when: Fastest inference is critical, batch processing many videos, or when you have a distilled model checkpoint.
### 4. ICLoraPipeline
Best for: Video-to-video and image-to-video transformations using IC-LoRA.
Source: `src/ltx_pipelines/ic_lora.py`
Two-stage generation with IC-LoRA support. Can condition on reference videos (video-to-video) or images at specific frames. CFG guidance in stage 1, upsampling in stage 2. Requires IC-LoRA trained model.
Use when: Video-to-video transformations, image-to-video with strong control, or when you have reference videos to guide generation.
### 5. KeyframeInterpolationPipeline
Best for: Generating videos by interpolating between keyframe images.
Source: `src/ltx_pipelines/keyframe_interpolation.py`
Two-stage generation with keyframe interpolation. Uses guiding latents (additive conditioning) instead of replacing latents for smoother transitions. CFG guidance in stage 1, upsampling in stage 2.
Use when: You have keyframe images and want to interpolate between them, creating smooth transitions, or animation/motion interpolation tasks.
## 🎨 Conditioning Types
Pipelines use different conditioning methods from ltx-core for controlling generation. See the ltx-core conditioning documentation for details.
### Image Conditioning
All pipelines support image conditioning, but with different methods:
**Replacing Latents** (`image_conditionings_by_replacing_latent`):
- Used by: `TI2VidOneStagePipeline`, `TI2VidTwoStagesPipeline`, `DistilledPipeline`, `ICLoraPipeline`
- Replaces the latent at a specific frame with the encoded image
- Strong control over specific frames

**Guiding Latents** (`image_conditionings_by_adding_guiding_latent`):
- Used by: `KeyframeInterpolationPipeline`
- Adds the image as a guiding signal rather than replacing
- Better for smooth interpolation between keyframes
### Video Conditioning

**Video Conditioning** (`ICLoraPipeline` only):
- Conditions on entire reference videos
- Useful for video-to-video transformations
- Uses `VideoConditionByKeyframeIndex` from `ltx-core`
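The difference between the two image-conditioning modes can be sketched on toy 1-D "latents". This is a conceptual illustration only; the function names below are hypothetical, not the `ltx-core` API:

```python
def condition_by_replacing(latents, frame_idx, image_latent, strength=1.0):
    """Replace the latent at a frame with the encoded image (hard control)."""
    out = list(latents)
    out[frame_idx] = (1 - strength) * out[frame_idx] + strength * image_latent
    return out

def condition_by_guiding(latents, frame_idx, image_latent, strength=1.0):
    """Add the encoded image as a guiding signal (softer, smoother blends)."""
    out = list(latents)
    out[frame_idx] = out[frame_idx] + strength * image_latent
    return out

latents = [0.5, 0.5, 0.5]
print(condition_by_replacing(latents, 0, 2.0))  # [2.0, 0.5, 0.5]
print(condition_by_guiding(latents, 0, 2.0))    # [2.5, 0.5, 0.5]
```

Replacement pins the frame to the image exactly, while the additive guiding signal preserves part of the evolving latent, which is why the keyframe pipeline favors it for smooth transitions.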
## ⚡ Optimization Tips

### Memory Optimization
**FP8 Transformer (Lower Memory Footprint):**

For a smaller GPU memory footprint, pass the `--enable-fp8` flag and set the `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` environment variable.
CLI:

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python -m ltx_pipelines.ti2vid_one_stage --enable-fp8 --checkpoint-path=...
```
Programmatically:

When authoring custom scripts, pass the `fp8transformer` flag to the pipeline classes, or construct your own by analogy:

```python
pipeline = TI2VidTwoStagesPipeline(
    checkpoint_path=ltx_model_path,
    distilled_lora_path=distilled_lora_path,
    distilled_lora_strength=0.6,
    spatial_upsampler_path=upsampler_path,
    gemma_root=gemma_root_path,
    loras=[],
    fp8transformer=True,
)
pipeline(...)
```
You still need to set `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` when launching:

```bash
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True python my_denoising_pipeline.py
```
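The variable can also be exported from inside the script, but note that PyTorch's CUDA caching allocator reads it when it first initializes, so this must run before anything touches the GPU (a sketch, assuming no prior CUDA use in the process):

```python
import os

# Must run before torch initializes CUDA; the allocator reads this
# variable once, at startup.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
```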
**Memory Cleanup Between Stages:**

By default, pipelines free GPU memory (especially transformer weights) between stages. If you have enough memory, you can skip this cleanup to reduce running time:

```python
# In the built-in pipelines, memory cleanup happens automatically between
# stages. In a custom pipeline you can skip the cleanup call:
# utils.cleanup_memory()  # comment out if you have enough VRAM
```
### Denoising Loop Optimization

**Gradient Estimation Denoising Loop:**
Instead of the standard Euler denoising loop, you can use gradient estimation for fewer steps (~20-30 instead of 40):

```python
from ltx_pipelines.utils.helpers import gradient_estimating_euler_denoising_loop

# Use the gradient-estimating denoising loop in place of the Euler one
def denoising_loop(sigmas, video_state, audio_state, stepper):
    return gradient_estimating_euler_denoising_loop(
        sigmas=sigmas,
        video_state=video_state,
        audio_state=audio_state,
        stepper=stepper,
        denoise_fn=your_denoise_function,
        ge_gamma=2.0,  # gradient estimation coefficient
    )
```
This allows you to use 20-30 steps instead of 40 while maintaining quality. The gradient estimation function is available in `pipeline_utils.py`.
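To convey the idea, here is a toy scalar version of a gradient-estimating Euler loop. It is illustrative only (the real helper lives in `ltx_pipelines.utils.helpers`): after the first step, the derivative is blended with the previous one, `d_bar = gamma * d + (1 - gamma) * d_prev`, which extrapolates the denoising direction and tolerates coarser sigma schedules:

```python
def ge_euler_loop(x, sigmas, denoise_fn, ge_gamma=2.0):
    """Toy gradient-estimating Euler sampler on a scalar state."""
    d_prev = None
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise_fn(x, sigma)) / sigma  # Euler derivative
        # First step falls back to plain Euler; later steps extrapolate.
        d_bar = d if d_prev is None else ge_gamma * d + (1 - ge_gamma) * d_prev
        x = x + d_bar * (sigma_next - sigma)
        d_prev = d
    return x

# With an idealized denoiser that always predicts 0, the sample decays to 0:
x = ge_euler_loop(1.0, [1.0, 0.5, 0.0], lambda x, s: 0.0)
```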
## 🔧 Requirements

- **LTX-2 Model Checkpoint** - Local `.safetensors` file
- **Gemma Text Encoder** - Local Gemma model directory
- **Spatial Upscaler** - Required for two-stage pipelines (not needed for the one-stage pipeline)
- **Distilled LoRA** - Required for two-stage pipelines (not needed for the one-stage and distilled pipelines)
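Since all of these assets are local files, a small preflight check can save a failed run. This is a hypothetical helper (the paths below are placeholders, not real defaults):

```python
from pathlib import Path

def check_assets(paths):
    """Return the subset of required asset paths that are missing."""
    return [p for p in paths if not Path(p).exists()]

missing = check_assets([
    "checkpoints/ltx2.safetensors",   # LTX-2 model checkpoint (placeholder)
    "checkpoints/gemma",              # Gemma text-encoder dir (placeholder)
])
if missing:
    print("Missing assets:", missing)
```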
## 📝 Example: Image-to-Video
```python
from ltx_pipelines.ti2vid_two_stages import TI2VidTwoStagesPipeline

pipeline = TI2VidTwoStagesPipeline(
    checkpoint_path="/path/to/checkpoint.safetensors",
    distilled_lora_path="/path/to/distilled_lora.safetensors",
    spatial_upsampler_path="/path/to/upsampler.safetensors",
    gemma_root="/path/to/gemma",
    loras=[],
)

# Generate video from image
pipeline(
    prompt="A serene landscape with mountains in the background",
    output_path="output.mp4",
    seed=42,
    height=512,
    width=768,
    num_frames=121,
    frame_rate=25.0,
    num_inference_steps=40,
    cfg_guidance_scale=3.0,
    images=[("input_image.jpg", 0, 1.0)],  # image at frame 0, strength 1.0
)
```
## 🔗 Related Projects

- **LTX-Core** - Core model implementation and inference components (schedulers, guiders, noisers, patchifiers)
- **LTX-Trainer** - Training and fine-tuning tools