# Quick Start Guide

Get up and running with LTX-2 training in just a few steps!

## 📋 Prerequisites

Before you begin, ensure you have:

1. **LTX-2 Model Checkpoint** - A local `.safetensors` file containing the LTX-2 model weights
2. **Gemma Text Encoder** - A local directory containing the Gemma model (required for LTX-2).
   Download from: [HuggingFace Hub](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized/)
3. **Linux with CUDA** - The trainer requires `triton` which is Linux-only
4. **GPU with sufficient VRAM** - 80GB recommended. Lower VRAM may work with gradient checkpointing and lower
   resolutions.

## ⚡ Installation

First, install [uv](https://docs.astral.sh/uv/getting-started/installation/) if you haven't already.
Then clone the repository and install the dependencies:

```bash
git clone https://github.com/Lightricks/LTX-Video
cd LTX-Video
```

The `ltx-trainer` package is part of the `LTX-2` monorepo. Install the dependencies from the repository root,
then navigate to the trainer package:

```bash
# From the repository root
uv sync
cd packages/ltx-trainer
```

> [!NOTE]
> The trainer depends on [`ltx-core`](../../ltx-core/) and [`ltx-pipelines`](../../ltx-pipelines/) packages which are automatically installed from the monorepo.

## 🏋 Training Workflow

### 1. Prepare Your Dataset

Organize your videos and captions, then preprocess them:

```bash
# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s

# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json

# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model
```
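The `--resolution-buckets` argument packs width, height, and frame count into a single `WxHxF` string. As an illustration of the format (this is not the trainer's actual parser), it can be unpacked like so:

```python
def parse_resolution_bucket(bucket: str) -> tuple[int, int, int]:
    """Parse a 'WIDTHxHEIGHTxFRAMES' bucket string, e.g. '960x544x49'."""
    width, height, frames = (int(part) for part in bucket.split("x"))
    return width, height, frames

# "960x544x49" means 960px wide, 544px tall, 49 frames per clip
print(parse_resolution_bucket("960x544x49"))  # (960, 544, 49)
```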

See [Dataset Preparation](dataset-preparation.md) for detailed instructions.
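For orientation, the preprocessing step consumes a JSON manifest that pairs each clip with its caption. The exact schema is covered in Dataset Preparation; a minimal sketch (field names and values here are illustrative, not guaranteed):

```json
[
  {
    "media_path": "scenes_output_dir/scene_001.mp4",
    "caption": "A drone shot slowly rising over a foggy pine forest at dawn."
  }
]
```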

### 2. Configure Training

Create or modify a configuration YAML file. Start with one of the example configs:

- [`configs/ltx2_av_lora.yaml`](../configs/ltx2_av_lora.yaml) - Audio-video LoRA training
- [`configs/ltx2_v2v_ic_lora.yaml`](../configs/ltx2_v2v_ic_lora.yaml) - IC-LoRA video-to-video

Key settings to update:

```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"

data:
  preprocessed_data_root: "/path/to/preprocessed/data"

output_dir: "outputs/my_training_run"
```

See [Configuration Reference](configuration-reference.md) for all available options.

### 3. Start Training

```bash
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```

For multi-GPU training:

```bash
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```
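`accelerate launch` picks up its defaults from a config file created with `accelerate config`. A minimal multi-GPU setup might look like the following (values are illustrative; set `num_processes` to your GPU count):

```yaml
# Illustrative accelerate config (typically written by `accelerate config`)
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_processes: 2        # number of GPUs to train on
mixed_precision: bf16
```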

See [Training Guide](training-guide.md) for distributed training and advanced options.

## 🎯 Training Modes

The trainer supports several training modes:

| Mode                 | Description                    | Config Example                             |
|----------------------|--------------------------------|--------------------------------------------|
| **LoRA**             | Efficient adapter training     | `training_strategy.name: "text_to_video"`  |
| **Audio-Video LoRA** | Joint audio-video training     | `training_strategy.with_audio: true`       |
| **IC-LoRA**          | Video-to-video transformations | `training_strategy.name: "video_to_video"` |
| **Full Fine-tuning** | Full model training            | `model.training_mode: "full"`              |
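The mode is selected in the training YAML. For example, a hypothetical audio-video LoRA run would combine the settings from the table above:

```yaml
# Illustrative fragment; see the example configs for the full set of keys
training_strategy:
  name: "text_to_video"
  with_audio: true   # enable joint audio-video training
```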

See [Training Modes](training-modes.md) for detailed explanations.

## Next Steps

Once you've completed your first training run, you can:

- **Use your trained LoRA for inference** - The [`ltx-pipelines`](../../ltx-pipelines/) package provides production-ready inference
  pipelines for various use cases (T2V, I2V, IC-LoRA, etc.). See the package documentation for details.
- Learn more about [Dataset Preparation](dataset-preparation.md) for advanced preprocessing
- Explore different [Training Modes](training-modes.md) (LoRA, Audio-Video, IC-LoRA)
- Dive deeper into [Training Configuration](configuration-reference.md)
- Understand the model architecture in [LTX-Core API Guide](ltx-core-api-guide.md)

## Need Help?

If you run into issues at any step, see the [Troubleshooting Guide](troubleshooting.md) for solutions to common
problems.

Join our [Discord community](https://discord.gg/2mafsHjJ) for real-time help and discussion!