Quick Start Guide
Get up and running with LTX-2 training in just a few steps!
Prerequisites
Before you begin, ensure you have:
- LTX-2 Model Checkpoint - A local .safetensors file containing the LTX-2 model weights. Download ltx-2-19b-dev.safetensors from: HuggingFace Hub
- Gemma Text Encoder - A local directory containing the Gemma model (required for LTX-2). Download from: HuggingFace Hub
- Linux with CUDA - The trainer requires triton, which is Linux-only
- GPU with sufficient VRAM - 80GB recommended for the standard config. For GPUs with 32GB VRAM (e.g., RTX 5090), use the low VRAM config, which enables INT8 quantization and other memory optimizations
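The file and platform requirements above can be checked before any long-running step. The following is a minimal sketch, not part of ltx-trainer: `check_prerequisites` is a hypothetical helper, and it only checks the operating system and that the two paths exist. It does not check CUDA or available VRAM.

```python
import platform
from pathlib import Path
from typing import List, Optional


def check_prerequisites(
    model_path: str,
    text_encoder_path: str,
    system: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration. `system` can be passed explicitly
    for testing; by default it is read from the running platform.
    """
    problems = []

    # The trainer requires triton, which is Linux-only.
    system = system or platform.system()
    if system != "Linux":
        problems.append(f"Linux is required, found {system}")

    # The checkpoint must be a local .safetensors file.
    checkpoint = Path(model_path)
    if checkpoint.suffix != ".safetensors" or not checkpoint.is_file():
        problems.append(f"checkpoint missing or not a .safetensors file: {checkpoint}")

    # The Gemma text encoder must be a local directory.
    if not Path(text_encoder_path).is_dir():
        problems.append(f"text encoder directory not found: {text_encoder_path}")

    return problems
```

Calling `check_prerequisites("/path/to/ltx-2-model.safetensors", "/path/to/gemma-model")` and printing the returned list gives a quick readout of what is still missing.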
Installation
First, install uv if you haven't already. Then clone the repository and install the dependencies:
git clone https://github.com/Lightricks/LTX-2
The ltx-trainer package is part of the LTX-2 monorepo. Install the dependencies from the repository root,
then navigate to the trainer package:
# From the repository root
uv sync
cd packages/ltx-trainer
The trainer depends on the ltx-core and ltx-pipelines packages, which are automatically installed from the monorepo.
Training Workflow
1. Prepare Your Dataset
Organize your videos and captions, then preprocess them:
# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s
# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json
# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model
See Dataset Preparation for detailed instructions.
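When preprocessing is driven from a script, it can help to validate the resolution bucket and assemble the command as an argument list. The sketch below is illustrative only. Both function names are hypothetical, and reading "960x544x49" as width x height x frame count is an assumption. The flags are the ones shown in the command above.

```python
from typing import List, Tuple


def parse_resolution_bucket(bucket: str) -> Tuple[int, int, int]:
    """Split a bucket such as "960x544x49" into three positive integers.

    Assumption: the three fields are width, height, and number of frames.
    """
    parts = bucket.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected three 'x'-separated values, got: {bucket!r}")
    width, height, frames = (int(p) for p in parts)  # int() raises ValueError on junk
    if min(width, height, frames) <= 0:
        raise ValueError(f"all bucket values must be positive: {bucket!r}")
    return width, height, frames


def build_preprocess_command(
    dataset_json: str,
    bucket: str,
    model_path: str,
    text_encoder_path: str,
) -> List[str]:
    """Build the process_dataset.py invocation as an argument list."""
    parse_resolution_bucket(bucket)  # fail early on a malformed bucket
    return [
        "uv", "run", "python", "scripts/process_dataset.py", dataset_json,
        "--resolution-buckets", bucket,
        "--model-path", model_path,
        "--text-encoder-path", text_encoder_path,
    ]
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell-quoting problems with paths that contain spaces.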
2. Configure Training
Create or modify a configuration YAML file. Start with one of the example configs:
- configs/ltx2_av_lora.yaml - Audio-video LoRA training
- configs/ltx2_av_lora_low_vram.yaml - Audio-video LoRA training (optimized for 32GB VRAM)
- configs/ltx2_v2v_ic_lora.yaml - IC-LoRA video-to-video
Key settings to update:
model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"
data:
  preprocessed_data_root: "/path/to/preprocessed/data"
output_dir: "outputs/my_training_run"
See Configuration Reference for all available options.
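If you generate per-run configs programmatically rather than editing YAML by hand, the key settings can be applied as dotted-path overrides on a config that has already been loaded into a dictionary. This is a minimal sketch using only the standard library. `apply_overrides` is a hypothetical helper, not a trainer API, and treating `output_dir` as a top-level key is an assumption.

```python
import copy
from typing import Any, Dict


def apply_overrides(config: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `config` with dotted-path keys set to new values.

    Hypothetical helper: "model.model_path" addresses config["model"]["model_path"].
    Missing intermediate sections are created as empty dictionaries.
    """
    updated = copy.deepcopy(config)  # leave the caller's config untouched
    for dotted_key, value in overrides.items():
        *parents, leaf = dotted_key.split(".")
        node = updated
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return updated


# Example: the key settings from this section, applied to a loaded config.
base = {"model": {"model_path": "", "text_encoder_path": ""}, "data": {}}
run_config = apply_overrides(
    base,
    {
        "model.model_path": "/path/to/ltx-2-model.safetensors",
        "model.text_encoder_path": "/path/to/gemma-model",
        "data.preprocessed_data_root": "/path/to/preprocessed/data",
        "output_dir": "outputs/my_training_run",  # assumed to be top-level
    },
)
```

Loading and saving the YAML file itself is left to whichever YAML library you already use.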
3. Start Training
uv run python scripts/train.py configs/ltx2_av_lora.yaml
For multi-GPU training:
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
See Training Guide for distributed training and advanced options.
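The two launch commands differ only in the launcher, so a wrapper script can choose between them with a single flag. The sketch below assumes nothing beyond the two commands shown above; `build_train_command` is a hypothetical name.

```python
import shlex
from typing import List


def build_train_command(config_path: str, multi_gpu: bool = False) -> List[str]:
    """Build the train.py invocation for single-GPU or multi-GPU runs."""
    # Single GPU runs through plain python; multi-GPU goes through accelerate.
    launcher = ["accelerate", "launch"] if multi_gpu else ["python"]
    return ["uv", "run", *launcher, "scripts/train.py", config_path]


# Print the command line that would be executed.
print(shlex.join(build_train_command("configs/ltx2_av_lora.yaml", multi_gpu=True)))
```

As with preprocessing, the list form can be handed to `subprocess.run(cmd, check=True)` from the trainer package directory.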
Training Modes
The trainer supports several training modes:
| Mode | Description | Config Example |
|---|---|---|
| LoRA | Efficient adapter training | training_strategy.name: "text_to_video" |
| Audio-Video LoRA | Joint audio-video training | training_strategy.with_audio: true |
| IC-LoRA | Video-to-video transformations | training_strategy.name: "video_to_video" |
| Full Fine-tuning | Full model training | model.training_mode: "full" |
See Training Modes for detailed explanations, or Custom Training Strategies if you need to implement your own training recipe.
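The table can also be read as a lookup from mode to the single config setting that selects it. The sketch below encodes exactly the four rows above. The short mode labels (`"lora"`, `"audio_video_lora"`, and so on) are made up for illustration, and a real config for each mode may need further settings that the table does not list.

```python
from typing import Any, Dict, Tuple

# One (dotted key, value) pair per table row; the mode labels are hypothetical.
MODE_SETTINGS: Dict[str, Tuple[str, Any]] = {
    "lora": ("training_strategy.name", "text_to_video"),
    "audio_video_lora": ("training_strategy.with_audio", True),
    "ic_lora": ("training_strategy.name", "video_to_video"),
    "full": ("model.training_mode", "full"),
}


def mode_fragment(mode: str) -> Dict[str, Dict[str, Any]]:
    """Return the nested config fragment that the table associates with `mode`."""
    if mode not in MODE_SETTINGS:
        raise KeyError(f"unknown mode {mode!r}; expected one of {sorted(MODE_SETTINGS)}")
    dotted_key, value = MODE_SETTINGS[mode]
    section, field = dotted_key.split(".")
    return {section: {field: value}}
```

For example, `mode_fragment("ic_lora")` yields a fragment that sets `training_strategy.name` to `"video_to_video"`, which can then be merged into a base config.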
Next Steps
Once you've completed your first training run, you can:
- Use your trained LoRA for inference - The ltx-pipelines package provides production-ready inference pipelines for various use cases (T2V, I2V, IC-LoRA, etc.). See the package documentation for details.
- Learn more about Dataset Preparation for advanced preprocessing
- Explore different Training Modes (LoRA, Audio-Video, IC-LoRA)
- Dive deeper into Training Configuration
- Understand the model architecture in LTX-Core Documentation
Need Help?
If you run into issues at any step, see the Troubleshooting Guide for solutions to common problems.
Join our Discord community for real-time help and discussion!