File size: 5,016 Bytes
a3c20e1
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
# Quick Start Guide

Get up and running with LTX-2 training in just a few steps!

## 📋 Prerequisites

Before you begin, ensure you have:

1. **LTX-2 Model Checkpoint** - A local `.safetensors` file containing the LTX-2 model weights.
   Download `ltx-2-19b-dev.safetensors` from: [HuggingFace Hub](https://huggingface.co/Lightricks/LTX-2)
2. **Gemma Text Encoder** - A local directory containing the Gemma model (required for LTX-2).
   Download from: [HuggingFace Hub](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized/)
3. **Linux with CUDA** - The trainer requires `triton` which is Linux-only
4. **GPU with sufficient VRAM** - 80GB recommended for the standard config. For GPUs with 32GB VRAM (e.g., RTX 5090),
   use the [low VRAM config](../configs/ltx2_av_lora_low_vram.yaml) which enables INT8 quantization and other
   memory optimizations

## ⚡ Installation

First, install [uv](https://docs.astral.sh/uv/getting-started/installation/) if you haven't already.
Then clone the repository and install the dependencies:

```bash
git clone https://github.com/Lightricks/LTX-2
```

The `ltx-trainer` package is part of the `LTX-2` monorepo. Install the dependencies from the repository root,
then navigate to the trainer package:

```bash
# From the repository root
uv sync
cd packages/ltx-trainer
```

> [!NOTE]
> The trainer depends on [`ltx-core`](../../ltx-core/) and [`ltx-pipelines`](../../ltx-pipelines/)
> packages which are automatically installed from the monorepo.

## 🏋 Training Workflow

### 1. Prepare Your Dataset

Organize your videos and captions, then preprocess them:

```bash
# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s

# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json

# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model
```

See [Dataset Preparation](dataset-preparation.md) for detailed instructions.

### 2. Configure Training

Create or modify a configuration YAML file. Start with one of the example configs:

- [`configs/ltx2_av_lora.yaml`](../configs/ltx2_av_lora.yaml) - Audio-video LoRA training
- [`configs/ltx2_av_lora_low_vram.yaml`](../configs/ltx2_av_lora_low_vram.yaml) - Audio-video LoRA training (optimized for 32GB VRAM)
- [`configs/ltx2_v2v_ic_lora.yaml`](../configs/ltx2_v2v_ic_lora.yaml) - IC-LoRA video-to-video

Key settings to update:

```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"

data:
  preprocessed_data_root: "/path/to/preprocessed/data"

output_dir: "outputs/my_training_run"
```

See [Configuration Reference](configuration-reference.md) for all available options.

### 3. Start Training

```bash
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```

For multi-GPU training:

```bash
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```

See [Training Guide](training-guide.md) for distributed training and advanced options.

## 🎯 Training Modes

The trainer supports several training modes:

| Mode                 | Description                    | Config Example                             |
|----------------------|--------------------------------|--------------------------------------------|
| **LoRA**             | Efficient adapter training     | `training_strategy.name: "text_to_video"`  |
| **Audio-Video LoRA** | Joint audio-video training     | `training_strategy.with_audio: true`       |
| **IC-LoRA**          | Video-to-video transformations | `training_strategy.name: "video_to_video"` |
| **Full Fine-tuning** | Full model training            | `model.training_mode: "full"`              |

See [Training Modes](training-modes.md) for detailed explanations,
or [Custom Training Strategies](custom-training-strategies.md) if you need to implement your own training recipe.

## Next Steps

Once you've completed your first training run, you can:

- **Use your trained LoRA for inference** - The [`ltx-pipelines`](../../ltx-pipelines/) package provides
  production-ready inference
  pipelines for various use cases (T2V, I2V, IC-LoRA, etc.). See the package documentation for details.
- Learn more about [Dataset Preparation](dataset-preparation.md) for advanced preprocessing
- Explore different [Training Modes](training-modes.md) (LoRA, Audio-Video, IC-LoRA)
- Dive deeper into [Training Configuration](configuration-reference.md)
- Understand the model architecture in [LTX-Core Documentation](../../ltx-core/README.md)

## Need Help?

If you run into issues at any step, see the [Troubleshooting Guide](troubleshooting.md) for solutions to common
problems.

Join our [Discord community](https://discord.gg/ltxplatform) for real-time help and discussion!