# Quick Start Guide
Get up and running with LTX-2 training in just a few steps!
## 📋 Prerequisites
Before you begin, ensure you have:
- **LTX-2 Model Checkpoint** - A local `.safetensors` file containing the LTX-2 model weights
- **Gemma Text Encoder** - A local directory containing the Gemma model (required for LTX-2). Download from: HuggingFace Hub
- **Linux with CUDA** - The trainer requires `triton`, which is Linux-only
- **GPU with sufficient VRAM** - 80GB recommended; lower VRAM may work with gradient checkpointing and lower resolutions
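A quick way to check the GPU side of these requirements:

```bash
# List detected GPUs and their total VRAM
nvidia-smi --query-gpu=name,memory.total --format=csv
```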
## ⚡ Installation
First, install `uv` if you haven't already.
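If `uv` is not installed yet, the official standalone installer script from Astral works on Linux:

```bash
# Install uv (official installer script)
curl -LsSf https://astral.sh/uv/install.sh | sh
```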
Then clone the repository:

```bash
git clone https://github.com/Lightricks/LTX-Video
cd LTX-Video
```
The `ltx-trainer` package is part of the LTX-2 monorepo. Install the dependencies from the repository root,
then navigate to the trainer package:
```bash
# From the repository root
uv sync
cd packages/ltx-trainer
```
The trainer depends on the `ltx-core` and `ltx-pipelines` packages, which are automatically installed from the monorepo.
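As a quick smoke test that the environment resolved correctly, you can try importing the trainer package. The module name `ltx_trainer` below is an assumption based on the package name; adjust it if the actual module differs:

```bash
# Smoke test: the "ltx_trainer" module name is an assumption based on the package name
uv run python -c "import ltx_trainer; print('ok')"
```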
## 🚀 Training Workflow
### 1. Prepare Your Dataset
Organize your videos and captions, then preprocess them:
```bash
# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s

# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json

# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
  --resolution-buckets "960x544x49" \
  --model-path /path/to/ltx-2-model.safetensors \
  --text-encoder-path /path/to/gemma-model
```
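For reference, the captioning step writes a JSON manifest pairing each clip with its caption. A sketch of what it might look like is below; the field names (`media_path`, `caption`) are assumptions, so check the actual output of `caption_videos.py` for the real schema:

```bash
# Illustrative manifest only: field names are assumptions, not the guaranteed schema
cat > dataset.json <<'EOF'
[
  {"media_path": "scenes_output_dir/scene_001.mp4", "caption": "A dog runs across a sunny field."},
  {"media_path": "scenes_output_dir/scene_002.mp4", "caption": "Rain streaks down a window at night."}
]
EOF
```

Each `--resolution-buckets` entry encodes width, height, and frame count, so `960x544x49` preprocesses 49-frame clips at 960x544.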
See Dataset Preparation for detailed instructions.
### 2. Configure Training
Create or modify a configuration YAML file. Start with one of the example configs:
- `configs/ltx2_av_lora.yaml` - Audio-video LoRA training
- `configs/ltx2_v2v_ic_lora.yaml` - IC-LoRA video-to-video
Key settings to update:
```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"

data:
  preprocessed_data_root: "/path/to/preprocessed/data"

output_dir: "outputs/my_training_run"
```
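Before launching a run, a quick parse check can catch YAML indentation mistakes early (this assumes PyYAML is importable in the project environment):

```bash
# Parse check only; assumes PyYAML is available in the environment
uv run python -c "import yaml; yaml.safe_load(open('configs/ltx2_av_lora.yaml')); print('valid YAML')"
```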
See Configuration Reference for all available options.
### 3. Start Training
```bash
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```
For multi-GPU training:
```bash
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```
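To control the GPU count explicitly, `accelerate`'s standard options apply (the process count below is only an example):

```bash
# Optional one-time interactive setup for accelerate
uv run accelerate config

# Launch on 4 GPUs; adjust --num_processes to match your machine
uv run accelerate launch --num_processes 4 scripts/train.py configs/ltx2_av_lora.yaml
```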
See Training Guide for distributed training and advanced options.
## 🎯 Training Modes
The trainer supports several training modes:
| Mode | Description | Config Example |
|---|---|---|
| LoRA | Efficient adapter training | `training_strategy.name: "text_to_video"` |
| Audio-Video LoRA | Joint audio-video training | `training_strategy.with_audio: true` |
| IC-LoRA | Video-to-video transformations | `training_strategy.name: "video_to_video"` |
| Full Fine-tuning | Full model training | `model.training_mode: "full"` |
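Combining the keys from the table, an audio-video LoRA run might be selected roughly like this. The snippet is a sketch assembled only from the table above; the shipped example configs are authoritative on nesting and defaults:

```yaml
# Sketch assembled from the keys in the table above;
# see configs/ltx2_av_lora.yaml for the authoritative structure.
training_strategy:
  name: "text_to_video"   # "video_to_video" selects IC-LoRA instead
  with_audio: true        # enables joint audio-video training
```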
See Training Modes for detailed explanations.
## Next Steps
Once you've completed your first training run, you can:
- Use your trained LoRA for inference - The `ltx-pipelines` package provides production-ready inference pipelines for various use cases (T2V, I2V, IC-LoRA, etc.). See the package documentation for details.
- Learn more about Dataset Preparation for advanced preprocessing
- Explore different Training Modes (LoRA, Audio-Video, IC-LoRA)
- Dive deeper into Training Configuration
- Understand the model architecture in LTX-Core API Guide
## Need Help?
If you run into issues at any step, see the Troubleshooting Guide for solutions to common problems.
Join our Discord community for real-time help and discussion!