# Quick Start Guide

Get up and running with LTX-2 training in just a few steps!

## 📋 Prerequisites

Before you begin, ensure you have:

  1. **LTX-2 Model Checkpoint** - A local `.safetensors` file containing the LTX-2 model weights
  2. **Gemma Text Encoder** - A local directory containing the Gemma model (required for LTX-2). Download it from the HuggingFace Hub; a download sketch follows this list.
  3. **Linux with CUDA** - The trainer requires `triton`, which is Linux-only
  4. **GPU with sufficient VRAM** - 80GB recommended; lower VRAM may work with gradient checkpointing and lower resolutions
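
For item 2, one convenient way to fetch the encoder is `huggingface-cli download`; the repo id below is a placeholder, since this guide doesn't pin the exact Gemma variant LTX-2 expects:

```bash
# Placeholder repo id -- substitute the Gemma variant your checkpoint requires.
huggingface-cli download google/gemma-2b --local-dir /path/to/gemma-model
```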

## ⚡ Installation

First, install `uv` if you haven't already, then clone the repository:

```bash
git clone https://github.com/Lightricks/LTX-Video
```

The `ltx-trainer` package is part of the LTX-2 monorepo. Install the dependencies from the repository root, then navigate to the trainer package:

```bash
# From the repository root
uv sync
cd packages/ltx-trainer
```

The trainer depends on the `ltx-core` and `ltx-pipelines` packages, which are installed automatically from the monorepo.
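
As a quick sanity check before training (a minimal sketch; it assumes PyTorch lands in the synced environment, which the CUDA requirement implies), confirm the GPU is visible:

```bash
# Should print True on a correctly configured CUDA machine.
uv run python -c "import torch; print(torch.cuda.is_available())"
```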

πŸ‹ Training Workflow

### 1. Prepare Your Dataset

Organize your videos and captions, then preprocess them:

```bash
# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s

# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json

# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model
```
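
If you have a whole folder of raw clips, a plain shell loop over `split_scenes.py` works; this is a sketch that assumes the script takes a single input file per invocation, as shown above, and the `raw_videos/` directory name is hypothetical:

```bash
# Split every .mp4 under raw_videos/ into scenes in scenes_output_dir/.
for f in raw_videos/*.mp4; do
    uv run python scripts/split_scenes.py "$f" scenes_output_dir/ --filter-shorter-than 5s
done
```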

See Dataset Preparation for detailed instructions.

### 2. Configure Training

Create or modify a configuration YAML file. Start with one of the example configs in the `configs/` directory (for example, `configs/ltx2_av_lora.yaml`).

Key settings to update:

```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"

data:
  preprocessed_data_root: "/path/to/preprocessed/data"

output_dir: "outputs/my_training_run"
```
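
A simple way to bootstrap your own config (the file name `configs/my_run.yaml` is hypothetical) is to copy the LoRA example used in step 3 and then edit the paths above:

```bash
# Start from the example LoRA config, then update the paths shown above.
cp configs/ltx2_av_lora.yaml configs/my_run.yaml
```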

See Configuration Reference for all available options.

### 3. Start Training

```bash
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```

For multi-GPU training:

```bash
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```
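
`accelerate launch` takes its standard flags here; for example, `--num_processes` (a stock Accelerate option, not specific to this trainer) pins the number of GPUs used:

```bash
# Run on 2 GPUs (one process per GPU); adjust to your hardware.
uv run accelerate launch --num_processes 2 scripts/train.py configs/ltx2_av_lora.yaml
```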

See Training Guide for distributed training and advanced options.

## 🎯 Training Modes

The trainer supports several training modes:

| Mode | Description | Config Example |
|------|-------------|----------------|
| LoRA | Efficient adapter training | `training_strategy.name: "text_to_video"` |
| Audio-Video LoRA | Joint audio-video training | `training_strategy.with_audio: true` |
| IC-LoRA | Video-to-video transformations | `training_strategy.name: "video_to_video"` |
| Full Fine-tuning | Full model training | `model.training_mode: "full"` |
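
As an illustration, assuming the keys in the table map directly onto top-level sections of the training YAML (an assumption; the Configuration Reference is authoritative), enabling joint audio-video LoRA in the hypothetical config from step 2 would look like:

```bash
# Hypothetical: assumes configs/my_run.yaml has no training_strategy section yet.
cat >> configs/my_run.yaml <<'EOF'
training_strategy:
  name: "text_to_video"
  with_audio: true
EOF
```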

See Training Modes for detailed explanations.

## Next Steps

Once you've completed your first training run, dig into the Training Guide for distributed and advanced options, Training Modes for the other strategies, and the Configuration Reference for the full set of settings.

## Need Help?

If you run into issues at any step, see the Troubleshooting Guide for solutions to common problems.

Join our Discord community for real-time help and discussion!