Quick Start Guide

Get up and running with LTX-2 training in just a few steps!

📋 Prerequisites

Before you begin, ensure you have:

LTX-2 Model Checkpoint - A local .safetensors file containing the LTX-2 model weights. Download ltx-2-19b-dev.safetensors from: HuggingFace Hub
Gemma Text Encoder - A local directory containing the Gemma model (required for LTX-2). Download from: HuggingFace Hub
Linux with CUDA - The trainer requires triton, which is Linux-only
GPU with sufficient VRAM - 80GB is recommended for the standard config. For GPUs with 32GB VRAM (e.g., RTX 5090), use the low VRAM config, which enables INT8 quantization and other memory optimizations
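
The file-system and platform items in this list can be checked before installing anything. The following is a minimal sketch, not part of the trainer: the function name `check_prerequisites` is hypothetical, the paths are placeholders, and GPU memory is deliberately not checked because that would require extra tooling.

```python
import platform
from pathlib import Path


def check_prerequisites(model_path: str, text_encoder_dir: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []

    # The checkpoint is expected to be a local .safetensors file.
    checkpoint = Path(model_path)
    if checkpoint.suffix != ".safetensors":
        problems.append(f"{checkpoint} is not a .safetensors file")
    elif not checkpoint.is_file():
        problems.append(f"checkpoint not found: {checkpoint}")

    # The Gemma text encoder is expected to be a local directory.
    if not Path(text_encoder_dir).is_dir():
        problems.append(f"Gemma directory not found: {text_encoder_dir}")

    # triton is Linux-only, so any other platform is reported.
    if platform.system() != "Linux":
        problems.append(f"Linux is required (triton), found {platform.system()}")

    return problems


# Placeholder paths; replace them with your own locations.
for problem in check_prerequisites(
    "/path/to/ltx-2-model.safetensors", "/path/to/gemma-model"
):
    print("-", problem)
```

Running it with the placeholder paths unchanged lists both missing paths, which is a quick way to confirm the check itself works.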

⚡ Installation

First, install uv if you haven't already. Then clone the repository and enter it:

git clone https://github.com/Lightricks/LTX-2
cd LTX-2

The ltx-trainer package is part of the LTX-2 monorepo. Install the dependencies from the repository root, then navigate to the trainer package:

# From the repository root
uv sync
cd packages/ltx-trainer

The trainer depends on the ltx-core and ltx-pipelines packages, which are installed automatically from the monorepo.

🏋 Training Workflow

1. Prepare Your Dataset

Organize your videos and captions, then preprocess them:

# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s

# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json

# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model

See Dataset Preparation for detailed instructions.
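
The value passed to --resolution-buckets packs three numbers into one string. The sketch below parses such a string and flags values that may be rejected. It rests on two assumptions that this guide does not state: that the order is WIDTHxHEIGHTxFRAMES, and that the model expects width and height to be multiples of 32 and the frame count to have the form 8k + 1. Check Dataset Preparation for the authoritative rules; the function names here are hypothetical.

```python
def parse_bucket(bucket: str) -> tuple[int, int, int]:
    """Split a string such as "960x544x49" into three integers."""
    parts = bucket.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected WIDTHxHEIGHTxFRAMES, got {bucket!r}")
    # ASSUMPTION: the order is width, height, frames.
    width, height, frames = (int(part) for part in parts)
    return width, height, frames


def bucket_warnings(bucket: str) -> list[str]:
    """Return warnings for values that break the assumed constraints."""
    width, height, frames = parse_bucket(bucket)
    warnings = []
    # ASSUMPTION: spatial dimensions must be multiples of 32.
    if width % 32 or height % 32:
        warnings.append(f"{width}x{height} is not a multiple of 32 in both dimensions")
    # ASSUMPTION: frame counts must have the form 8k + 1.
    if frames % 8 != 1:
        warnings.append(f"{frames} frames is not of the form 8k + 1")
    return warnings


print(parse_bucket("960x544x49"))
print(bucket_warnings("960x544x49"))
```

The bucket from the command above passes both assumed checks: 960 and 544 are multiples of 32, and 49 = 8 * 6 + 1.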

2. Configure Training

Create or modify a configuration YAML file. Start with one of the example configs:

configs/ltx2_av_lora.yaml - Audio-video LoRA training
configs/ltx2_av_lora_low_vram.yaml - Audio-video LoRA training (optimized for 32GB VRAM)
configs/ltx2_v2v_ic_lora.yaml - IC-LoRA video-to-video

Key settings to update:

model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"

data:
  preprocessed_data_root: "/path/to/preprocessed/data"

output_dir: "outputs/my_training_run"

See Configuration Reference for all available options.
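
A common mistake is to launch training with an example path left in place. The sketch below walks a configuration that has already been loaded into a Python dictionary (for instance with a YAML library, which is not shown and is an assumption here) and reports every setting whose value still starts with "/path/to/". The function name `find_placeholders` and the path "/models/gemma" are hypothetical.

```python
def find_placeholders(config: dict, prefix: str = "") -> list[str]:
    """Return dotted keys whose string values still start with "/path/to/"."""
    found = []
    for key, value in config.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections such as "model" and "data".
            found.extend(find_placeholders(value, dotted))
        elif isinstance(value, str) and value.startswith("/path/to/"):
            found.append(dotted)
    return found


# Mirrors the key settings listed above; only text_encoder_path was updated.
config = {
    "model": {
        "model_path": "/path/to/ltx-2-model.safetensors",
        "text_encoder_path": "/models/gemma",
    },
    "data": {"preprocessed_data_root": "/path/to/preprocessed/data"},
    "output_dir": "outputs/my_training_run",
}

print(find_placeholders(config))
# prints ['model.model_path', 'data.preprocessed_data_root']
```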

3. Start Training

uv run python scripts/train.py configs/ltx2_av_lora.yaml

For multi-GPU training:

uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml

See Training Guide for distributed training and advanced options.
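
If you start runs from a script, the two launch commands above differ only in the launcher. The following is a small sketch that builds either command as an argument list; the helper name `build_train_command` is hypothetical, and it adds no flags beyond those shown in this guide.

```python
def build_train_command(config_path: str, multi_gpu: bool = False) -> list[str]:
    """Build the single-GPU or multi-GPU launch command shown above."""
    # Multi-GPU runs go through "accelerate launch"; single-GPU runs use "python".
    launcher = ["accelerate", "launch"] if multi_gpu else ["python"]
    return ["uv", "run", *launcher, "scripts/train.py", config_path]


print(" ".join(build_train_command("configs/ltx2_av_lora.yaml")))
print(" ".join(build_train_command("configs/ltx2_av_lora.yaml", multi_gpu=True)))
```

The returned list can be passed directly to `subprocess.run(..., check=True)` from the packages/ltx-trainer directory.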

🎯 Training Modes

The trainer supports several training modes:

Mode	Description	Config Example
LoRA	Efficient adapter training	`training_strategy.name: "text_to_video"`
Audio-Video LoRA	Joint audio-video training	`training_strategy.with_audio: true`
IC-LoRA	Video-to-video transformations	`training_strategy.name: "video_to_video"`
Full Fine-tuning	Full model training	`model.training_mode: "full"`

See Training Modes for detailed explanations, or Custom Training Strategies if you need to implement your own training recipe.
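
The Config Example column uses dotted keys such as training_strategy.name, which correspond to nested sections in the YAML file. The sketch below shows that correspondence by applying one row of the table to a configuration dictionary. The mode labels used as dictionary keys and the helper `apply_overrides` are hypothetical; only the dotted keys and their values come from the table.

```python
import copy

# One entry per table row; the labels on the left are illustrative names.
MODE_OVERRIDES = {
    "lora": {"training_strategy.name": "text_to_video"},
    "audio_video_lora": {"training_strategy.with_audio": True},
    "ic_lora": {"training_strategy.name": "video_to_video"},
    "full_fine_tuning": {"model.training_mode": "full"},
}


def apply_overrides(config: dict, overrides: dict) -> dict:
    """Return a copy of config with each dotted key set to its value."""
    result = copy.deepcopy(config)
    for dotted, value in overrides.items():
        *parents, leaf = dotted.split(".")
        node = result
        for part in parents:
            # Create intermediate sections when they do not exist yet.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result


base = {"model": {"model_path": "/path/to/ltx-2-model.safetensors"}}
print(apply_overrides(base, MODE_OVERRIDES["ic_lora"]))
```

Because the function works on a deep copy, the base configuration can be reused to derive one configuration per mode.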

Next Steps

Once you've completed your first training run, you can:

Use your trained LoRA for inference - The ltx-pipelines package provides production-ready inference pipelines for various use cases (T2V, I2V, IC-LoRA, etc.). See the package documentation for details.
Learn more about Dataset Preparation for advanced preprocessing
Explore different Training Modes (LoRA, Audio-Video, IC-LoRA)
Dive deeper into Training Configuration
Understand the model architecture in LTX-Core Documentation

Need Help?

If you run into issues at any step, see the Troubleshooting Guide for solutions to common problems.

Join our Discord community for real-time help and discussion!