# Quick Start Guide
Get up and running with LTX-2 training in just a few steps!
## 📋 Prerequisites
Before you begin, ensure you have:
1. **LTX-2 Model Checkpoint** - A local `.safetensors` file containing the LTX-2 model weights.
   Download `ltx-2-19b-dev.safetensors` from: [HuggingFace Hub](https://huggingface.co/Lightricks/LTX-2)
2. **Gemma Text Encoder** - A local directory containing the Gemma model (required for LTX-2).
   Download from: [HuggingFace Hub](https://huggingface.co/google/gemma-3-12b-it-qat-q4_0-unquantized/)
3. **Linux with CUDA** - The trainer requires `triton`, which is Linux-only.
4. **GPU with sufficient VRAM** - 80GB recommended for the standard config. For GPUs with 32GB VRAM (e.g., RTX 5090),
   use the [low VRAM config](../configs/ltx2_av_lora_low_vram.yaml), which enables INT8 quantization and other
   memory optimizations.
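The first three prerequisites can be checked with a short script before installing anything. The sketch below is a hypothetical helper (`check_prerequisites` is not part of the trainer) that uses only the Python standard library. It does not check VRAM or CUDA, and the paths in the usage section are placeholders.

```python
import sys
from pathlib import Path


def check_prerequisites(model_path, text_encoder_path, platform_name=sys.platform):
    """Return a list of problems; an empty list means every check passed.

    Hypothetical helper for illustration, not part of ltx-trainer.
    """
    problems = []

    # The trainer requires triton, which is Linux-only.
    if not platform_name.startswith("linux"):
        problems.append(f"unsupported platform: {platform_name} (Linux required)")

    # The checkpoint must be a local .safetensors file.
    checkpoint = Path(model_path)
    if checkpoint.suffix != ".safetensors":
        problems.append(f"checkpoint is not a .safetensors file: {checkpoint}")
    elif not checkpoint.is_file():
        problems.append(f"checkpoint not found: {checkpoint}")

    # The Gemma text encoder must be a local directory.
    if not Path(text_encoder_path).is_dir():
        problems.append(f"text encoder directory not found: {text_encoder_path}")

    return problems


if __name__ == "__main__":
    # Placeholder paths; replace with your own locations.
    for problem in check_prerequisites(
        "/path/to/ltx-2-19b-dev.safetensors", "/path/to/gemma-model"
    ):
        print("WARNING:", problem)
```

Returning a list rather than raising on the first failure lets you see every missing prerequisite in one pass.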
## ⚡ Installation
First, install [uv](https://docs.astral.sh/uv/getting-started/installation/) if you haven't already.
Then clone the repository and enter it:
```bash
git clone https://github.com/Lightricks/LTX-2
cd LTX-2
```
The `ltx-trainer` package is part of the `LTX-2` monorepo. Install the dependencies from the repository root,
then navigate to the trainer package:
```bash
# From the repository root
uv sync
cd packages/ltx-trainer
```
> [!NOTE]
> The trainer depends on the [`ltx-core`](../../ltx-core/) and [`ltx-pipelines`](../../ltx-pipelines/)
> packages, which are automatically installed from the monorepo.
## 🏋 Training Workflow
### 1. Prepare Your Dataset
Organize your videos and captions, then preprocess them:
```bash
# Split long videos into scenes (optional)
uv run python scripts/split_scenes.py input.mp4 scenes_output_dir/ --filter-shorter-than 5s
# Generate captions for videos (optional)
uv run python scripts/caption_videos.py scenes_output_dir/ --output dataset.json
# Preprocess the dataset (compute latents and embeddings)
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x49" \
    --model-path /path/to/ltx-2-model.safetensors \
    --text-encoder-path /path/to/gemma-model
```
See [Dataset Preparation](dataset-preparation.md) for detailed instructions.
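To make the `--resolution-buckets` value easier to read, here is a minimal sketch that splits a bucket string into named fields. It assumes the three numbers are ordered width x height x frames; this guide does not state that ordering, so confirm it in the Dataset Preparation guide. `ResolutionBucket` and `parse_resolution_bucket` are hypothetical names, not part of the trainer.

```python
from typing import NamedTuple


class ResolutionBucket(NamedTuple):
    width: int
    height: int
    frames: int


def parse_resolution_bucket(spec: str) -> ResolutionBucket:
    """Parse a bucket string such as "960x544x49".

    Assumes width x height x frames ordering (an assumption, see the lead-in).
    """
    parts = spec.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected three 'x'-separated integers, got {spec!r}")
    try:
        width, height, frames = (int(part) for part in parts)
    except ValueError:
        raise ValueError(f"non-integer value in bucket {spec!r}") from None
    if min(width, height, frames) <= 0:
        raise ValueError(f"bucket values must be positive, got {spec!r}")
    return ResolutionBucket(width, height, frames)


bucket = parse_resolution_bucket("960x544x49")
print(bucket.width, bucket.height, bucket.frames)  # prints: 960 544 49
```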
### 2. Configure Training
Create or modify a configuration YAML file. Start with one of the example configs:
- [`configs/ltx2_av_lora.yaml`](../configs/ltx2_av_lora.yaml) - Audio-video LoRA training
- [`configs/ltx2_av_lora_low_vram.yaml`](../configs/ltx2_av_lora_low_vram.yaml) - Audio-video LoRA training (optimized for 32GB VRAM)
- [`configs/ltx2_v2v_ic_lora.yaml`](../configs/ltx2_v2v_ic_lora.yaml) - IC-LoRA video-to-video
Key settings to update:
```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"
  text_encoder_path: "/path/to/gemma-model"
data:
  preprocessed_data_root: "/path/to/preprocessed/data"
output_dir: "outputs/my_training_run"
```
See [Configuration Reference](configuration-reference.md) for all available options.
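A common mistake is to start training with the `/path/to/...` placeholders still in place. The sketch below walks a config that has already been loaded into a Python dict and reports any values that still look like placeholders. `find_placeholder_paths` is a hypothetical helper, and the nesting of `example_config` is an assumption about how the key settings above are laid out.

```python
def find_placeholder_paths(config, placeholder_prefix="/path/to/", _prefix=""):
    """Return dotted keys whose string values still start with the placeholder prefix.

    Hypothetical helper for illustration, not part of ltx-trainer.
    """
    found = []
    for key, value in config.items():
        dotted = f"{_prefix}{key}"
        if isinstance(value, dict):
            # Recurse into nested sections such as "model" or "data".
            found.extend(find_placeholder_paths(value, placeholder_prefix, f"{dotted}."))
        elif isinstance(value, str) and value.startswith(placeholder_prefix):
            found.append(dotted)
    return found


# Assumed layout, with one path already edited and two still placeholders.
example_config = {
    "model": {
        "model_path": "/path/to/ltx-2-model.safetensors",
        "text_encoder_path": "/models/gemma",
    },
    "data": {"preprocessed_data_root": "/path/to/preprocessed/data"},
    "output_dir": "outputs/my_training_run",
}

print(find_placeholder_paths(example_config))
# prints: ['model.model_path', 'data.preprocessed_data_root']
```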
### 3. Start Training
```bash
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```
For multi-GPU training:
```bash
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```
See [Training Guide](training-guide.md) for distributed training and advanced options.
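If you launch training from a Python script (for example, to queue several runs), the two commands above can be built from one function. This is a minimal sketch; `build_train_command` is a hypothetical helper, and it only assembles the same arguments shown in the shell examples.

```python
import shlex


def build_train_command(config_path, multi_gpu=False):
    """Build the argument list for a single-GPU or multi-GPU training run."""
    # Multi-GPU runs go through `accelerate launch`; single-GPU runs use `python`.
    launcher = ["accelerate", "launch"] if multi_gpu else ["python"]
    return ["uv", "run", *launcher, "scripts/train.py", config_path]


print(shlex.join(build_train_command("configs/ltx2_av_lora.yaml")))
# prints: uv run python scripts/train.py configs/ltx2_av_lora.yaml
print(shlex.join(build_train_command("configs/ltx2_av_lora.yaml", multi_gpu=True)))
# prints: uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```

The returned list can be passed to `subprocess.run(command, check=True)` with the working directory set to `packages/ltx-trainer`.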
## 🎯 Training Modes
The trainer supports several training modes:
| Mode | Description | Config Example |
|----------------------|--------------------------------|--------------------------------------------|
| **LoRA** | Efficient adapter training | `training_strategy.name: "text_to_video"` |
| **Audio-Video LoRA** | Joint audio-video training | `training_strategy.with_audio: true` |
| **IC-LoRA** | Video-to-video transformations | `training_strategy.name: "video_to_video"` |
| **Full Fine-tuning** | Full model training | `model.training_mode: "full"` |
See [Training Modes](training-modes.md) for detailed explanations,
or [Custom Training Strategies](custom-training-strategies.md) if you need to implement your own training recipe.
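The "Config Example" column uses dotted keys such as `training_strategy.name`. Assuming these denote nested keys in the YAML file (so `training_strategy.name` is the `name` key under `training_strategy`), the following sketch shows how such a key maps onto a nested dict. `set_dotted` is a hypothetical helper, not part of the trainer.

```python
def set_dotted(config, dotted_key, value):
    """Set a nested value from a dotted key, creating intermediate dicts as needed."""
    *parents, leaf = dotted_key.split(".")
    node = config
    for name in parents:
        # Descend into the section, creating it if it does not exist yet.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


config = {"model": {"model_path": "/path/to/ltx-2-model.safetensors"}}
set_dotted(config, "training_strategy.name", "text_to_video")
set_dotted(config, "training_strategy.with_audio", True)

print(config["training_strategy"])
# prints: {'name': 'text_to_video', 'with_audio': True}
```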
## Next Steps
Once you've completed your first training run, you can:
- **Use your trained LoRA for inference** - The [`ltx-pipelines`](../../ltx-pipelines/) package provides
  production-ready inference pipelines for various use cases (T2V, I2V, IC-LoRA, etc.).
  See the package documentation for details.
- Learn more about [Dataset Preparation](dataset-preparation.md) for advanced preprocessing
- Explore different [Training Modes](training-modes.md) (LoRA, Audio-Video, IC-LoRA)
- Dive deeper into [Training Configuration](configuration-reference.md)
- Understand the model architecture in [LTX-Core Documentation](../../ltx-core/README.md)
## Need Help?
If you run into issues at any step, see the [Troubleshooting Guide](troubleshooting.md) for solutions to common
problems.
Join our [Discord community](https://discord.gg/ltxplatform) for real-time help and discussion!