# Troubleshooting Guide

This guide covers common issues and solutions when training with the LTX-2 trainer.

## 🔧 VRAM and Memory Issues

Memory management is crucial for successful training with LTX-2.

### Memory Optimization Techniques

#### 1. Enable Gradient Checkpointing

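Gradient checkpointing trades training speed for memory savings: instead of storing intermediate activations for the backward pass, it recomputes them when they are needed. It is **highly recommended** for most training runs: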

```yaml
optimization:
  enable_gradient_checkpointing: true
```

#### 2. Enable 8-bit Text Encoder

Load the Gemma text encoder in 8-bit precision to save GPU memory:

```yaml
acceleration:
  load_text_encoder_in_8bit: true
```

#### 3. Reduce Batch Size

Lower the batch size if you encounter out-of-memory errors:

```yaml
optimization:
  batch_size: 1  # Start with 1 and increase gradually
```

Use gradient accumulation to maintain a larger effective batch size:

```yaml
optimization:
  batch_size: 1
  gradient_accumulation_steps: 4  # Effective batch size = 1 x 4 = 4 (on a single GPU)
```
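With gradient accumulation, the optimizer steps once every `gradient_accumulation_steps` micro-batches, so the effective batch size is the product of the per-GPU batch size, the accumulation steps, and, under multi-GPU data-parallel training (for example via Accelerate DDP), the number of GPUs. A minimal sketch of that arithmetic; the helper name is hypothetical and not part of the trainer:

```python
def effective_batch_size(batch_size: int, gradient_accumulation_steps: int, num_gpus: int = 1) -> int:
    """Samples contributing to each optimizer step (hypothetical helper, not an ltx-trainer API)."""
    if min(batch_size, gradient_accumulation_steps, num_gpus) < 1:
        raise ValueError("all arguments must be >= 1")
    # Each GPU processes `batch_size` samples per micro-batch, and gradients from
    # `gradient_accumulation_steps` micro-batches are summed before one optimizer step.
    return batch_size * gradient_accumulation_steps * num_gpus


print(effective_batch_size(1, 4))              # single GPU -> 4
print(effective_batch_size(1, 4, num_gpus=2))  # two GPUs with data parallelism -> 8
```

When moving from one GPU to several, you may want to lower `gradient_accumulation_steps` to keep the effective batch size unchanged.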

#### 4. Use Lower Resolution

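Reduce spatial or temporal dimensions to save memory. Resolution buckets are applied during preprocessing, so re-run `process_dataset.py` with the smaller bucket: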

```bash
# Smaller spatial resolution
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "512x512x49" \
    --model-path /path/to/model.safetensors \
    --text-encoder-path /path/to/gemma

# Fewer frames
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x25" \
    --model-path /path/to/model.safetensors \
    --text-encoder-path /path/to/gemma
```
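As a rough guide, the amount of data the model processes per sample grows with width × height × frames, and activation memory grows with it. The sketch below assumes the bucket string is `WIDTHxHEIGHTxFRAMES` (consistent with the examples above) and compares buckets by that product against an arbitrary reference bucket. Real memory usage also depends on the VAE's compression factors, the attention implementation, and fixed costs such as model weights, so treat the ratio as an approximation only.

```python
def parse_bucket(bucket: str) -> tuple[int, int, int]:
    """Parse a "WIDTHxHEIGHTxFRAMES" string such as "960x544x25" (assumed format)."""
    parts = bucket.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected WIDTHxHEIGHTxFRAMES, got {bucket!r}")
    width, height, frames = (int(p) for p in parts)
    return width, height, frames


def relative_volume(bucket: str, reference: str) -> float:
    """Pixel volume (W*H*F) of `bucket` relative to `reference`: a rough proxy for activation size."""
    w, h, f = parse_bucket(bucket)
    rw, rh, rf = parse_bucket(reference)
    return (w * h * f) / (rw * rh * rf)


REFERENCE = "960x544x49"  # arbitrary reference bucket, chosen for illustration
for candidate in ("512x512x49", "960x544x25"):
    print(candidate, f"{relative_volume(candidate, REFERENCE):.2f}x of {REFERENCE}")
```

Both example buckets from the commands above come out at roughly half the pixel volume of the reference, one by shrinking the frame, the other by shortening the clip.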

#### 5. Enable Model Quantization

Use quantization to reduce memory usage:

```yaml
acceleration:
  quantization: "int8-quanto"  # Options: int8-quanto, int4-quanto, fp8-quanto
```
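To see why quantization helps, note that weight memory is roughly the parameter count times the bytes stored per parameter: 2 bytes in bf16, about 1 byte for 8-bit formats, and about half a byte for 4-bit formats. The back-of-the-envelope sketch below ignores quantization scale overhead, activations, and optimizer states, and the 10-billion parameter count is a placeholder, not the actual size of LTX-2:

```python
# Approximate bytes per weight for each storage format (ignores quantization scale overhead).
BYTES_PER_PARAM = {
    "bf16": 2.0,          # unquantized baseline
    "int8-quanto": 1.0,
    "fp8-quanto": 1.0,
    "int4-quanto": 0.5,
}


def weight_memory_gib(num_params: float, fmt: str) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[fmt] / 1024**3


NUM_PARAMS = 10e9  # hypothetical parameter count, for illustration only
for fmt in BYTES_PER_PARAM:
    print(f"{fmt:>12}: {weight_memory_gib(NUM_PARAMS, fmt):6.1f} GiB")
```

Halving the bytes per weight roughly halves weight memory, which is why the 8-bit options are a common first step and the 4-bit option is a further trade of precision for memory.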

#### 6. Use 8-bit Optimizer

The 8-bit AdamW optimizer stores its optimizer states in 8-bit precision, so it uses less memory than standard AdamW:

```yaml
optimization:
  optimizer_type: "adamw8bit"
```
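Standard AdamW keeps two moment estimates per trainable parameter, typically in fp32 (4 bytes each); 8-bit variants store them in about 1 byte each, plus small per-block statistics that this sketch ignores. Only trainable parameters carry optimizer state, so with LoRA the savings depend on how many adapter parameters you train. LoRA adds `rank * (in_features + out_features)` parameters to each targeted linear layer (matrix A is `rank x in`, matrix B is `out x rank`). The layer size, layer count, and rank below are hypothetical:

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one linear layer: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)


def adamw_state_mib(trainable_params: int, bytes_per_state: float) -> float:
    """Memory for AdamW's two moment buffers (first and second moments), in MiB."""
    return 2 * trainable_params * bytes_per_state / 1024**2


# Hypothetical setup: 200 targeted 4096x4096 linear layers, LoRA rank 32.
trainable = 200 * lora_params(4096, 4096, rank=32)
print(f"trainable LoRA params: {trainable:,}")
print(f"AdamW (fp32 states):   {adamw_state_mib(trainable, 4):.0f} MiB")
print(f"AdamW (8-bit states):  {adamw_state_mib(trainable, 1):.0f} MiB")
```

With these placeholder numbers the optimizer states shrink from 400 MiB to 100 MiB. The saving grows with the number of trainable parameters, so it matters more at higher LoRA ranks or with more target modules.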

---

## ⚠️ Common Usage Issues

### Issue: "No module named 'ltx_trainer'" Error

**Solution:**
Ensure you've installed the dependencies and are using `uv run` to execute scripts:

```bash
# From the repository root
uv sync
cd packages/ltx-trainer
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```

> [!TIP]
> Always use `uv run` to execute Python scripts. This automatically uses the correct virtual environment
> without requiring manual activation.

### Issue: "Gemma model path is not a directory" Error

**Solution:**
The `text_encoder_path` must point to a directory containing the Gemma model, not a file:

```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"  # File path
  text_encoder_path: "/path/to/gemma-model/"      # Directory path
```

### Issue: "Model path does not exist" Error

**Solution:**
LTX-2 requires local model paths. URLs are not supported:

```yaml
# ✅ Correct - local path
model:
  model_path: "/path/to/ltx-2-model.safetensors"

# ❌ Wrong - URL not supported
model:
  model_path: "https://huggingface.co/..."
```
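Both path errors above can be caught before launching a run. The sketch below is a hypothetical pre-flight check, not part of the trainer: given the `model_path` and `text_encoder_path` values from your config, it verifies that neither is a URL, that `model_path` is an existing file, and that `text_encoder_path` is an existing directory.

```python
from pathlib import Path
from urllib.parse import urlparse


def check_model_paths(model_path: str, text_encoder_path: str) -> list[str]:
    """Return a list of problems found; an empty list means the paths look usable."""
    problems = []
    checks = (
        ("model_path", model_path, Path.is_file, "an existing file"),
        ("text_encoder_path", text_encoder_path, Path.is_dir, "an existing directory"),
    )
    for name, value, exists, expected in checks:
        if urlparse(value).scheme in ("http", "https"):
            problems.append(f"{name} must be a local path, not a URL: {value}")
        elif not exists(Path(value)):
            problems.append(f"{name} must be {expected}: {value}")
    return problems


if __name__ == "__main__":
    # Replace these placeholders with the values from your training config.
    for problem in check_model_paths("/path/to/ltx-2-model.safetensors", "/path/to/gemma-model/"):
        print("problem:", problem)
```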

### Issue: "Frames must satisfy frames % 8 == 1" Error

**Solution:**
LTX-2 requires the number of frames to satisfy `frames % 8 == 1`:

- ✅ Valid: 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 105, 113, 121
- ❌ Invalid: 24, 32, 48, 64, 100
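If your clips have arbitrary lengths, you can compute the nearest valid frame count before choosing a resolution bucket. A minimal sketch with hypothetical helpers (not trainer APIs) that validates a count and rounds down to the largest valid value:

```python
def is_valid_frame_count(frames: int) -> bool:
    """LTX-2 frame counts must satisfy frames % 8 == 1 (1, 9, 17, ...)."""
    return frames >= 1 and frames % 8 == 1


def largest_valid_frame_count(max_frames: int) -> int:
    """Largest count <= max_frames that satisfies frames % 8 == 1."""
    if max_frames < 1:
        raise ValueError("need at least one frame")
    return ((max_frames - 1) // 8) * 8 + 1


for n in (24, 32, 48, 64, 100):
    print(n, "->", largest_valid_frame_count(n))
```

The invalid counts from the list above round down to 17, 25, 41, 57, and 97.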

### Issue: Slow Training Speed

**Optimizations:**

1. **Disable gradient checkpointing** (if you have enough VRAM):

   ```yaml
   optimization:
     enable_gradient_checkpointing: false
   ```

2. **Use `torch.compile`** via Accelerate:

   ```bash
   uv run accelerate launch --config_file configs/accelerate/ddp_compile.yaml \
     scripts/train.py configs/ltx2_av_lora.yaml
   ```

### Issue: Poor Quality Validation Outputs

**Solutions:**

1. **Use Image-to-Video Validation:**
   For more reliable validation, use image-to-video (first-frame conditioning) rather than pure text-to-video:

   ```yaml
   validation:
     prompts:
       - "a professional portrait video of a person"
     images:
       - "/path/to/first_frame.png"  # One image per prompt
   ```

2. **Increase inference steps:**

   ```yaml
   validation:
     inference_steps: 50  # Default is 30
   ```

3. **Adjust guidance settings:**

   ```yaml
   validation:
     guidance_scale: 3.0  # CFG scale (recommended: 3.0)
     stg_scale: 1.0       # STG scale for temporal coherence (recommended: 1.0)
     stg_blocks: [29]     # Transformer block to perturb
   ```

4. **Check caption quality:**
   Review and manually edit captions for accuracy if using auto-generated captions.
   LTX-2 prefers long, detailed captions that describe both visual content and audio (e.g., ambient sounds, speech, music).

5. **Check target modules:**
   Ensure your `target_modules` configuration matches your training goals. For audio-video training,
   use patterns that match both branches (e.g., `"to_k"` instead of `"attn1.to_k"`).
   See [Understanding Target Modules](configuration-reference.md#understanding-target-modules) for details.

6. **Adjust LoRA rank:**
   Try higher values for more capacity:

   ```yaml
   lora:
     rank: 64  # Or 128 for more capacity
   ```

7. **Increase training steps:**

   ```yaml
   optimization:
     steps: 3000
   ```
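To see why a shorter `target_modules` pattern covers both branches, consider suffix matching of module names. The sketch below assumes PEFT-style matching (the rule Hugging Face PEFT applies when `target_modules` is a list of strings: a module matches if its full name equals a pattern or ends with `.` plus the pattern). The module names are hypothetical stand-ins for a video branch and an audio branch, not LTX-2's actual layer names:

```python
MODULE_NAMES = [  # hypothetical module names, for illustration only
    "transformer_blocks.0.attn1.to_k",
    "transformer_blocks.0.attn1.to_q",
    "transformer_blocks.0.audio_attn1.to_k",
    "transformer_blocks.0.audio_attn1.to_q",
]


def matches(module_name: str, patterns: list[str]) -> bool:
    """PEFT-style list matching: exact name, or name ends with '.' + pattern."""
    return any(module_name == p or module_name.endswith("." + p) for p in patterns)


def targeted(patterns: list[str]) -> list[str]:
    """Module names that a LoRA config with these patterns would adapt."""
    return [name for name in MODULE_NAMES if matches(name, patterns)]


print(targeted(["attn1.to_k"]))  # video branch only
print(targeted(["to_k"]))        # both the video and audio branches
```

Under this rule, `"attn1.to_k"` requires a `.` directly before `attn1`, so it misses `audio_attn1.to_k`, while `"to_k"` matches every layer named `to_k`.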

---

## 🔍 Debugging Tools

### Monitor GPU Memory Usage

Track memory usage during training:

```bash
# Watch GPU memory in real-time
watch -n 1 nvidia-smi

# Log memory usage to file
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop=5 > memory_log.csv
```
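To find the peak usage in `memory_log.csv` after a run, a short script is enough. This sketch assumes the default `--format=csv` output shown above: a header row such as `memory.used [MiB], memory.total [MiB]` followed by rows such as `10240 MiB, 24576 MiB`. Header rows are skipped wherever they appear, and on multi-GPU machines the peak is taken over all GPUs. The sample values are made up:

```python
def peak_memory_mib(csv_text: str) -> tuple[int, int]:
    """Return (peak used, total at peak) in MiB from nvidia-smi memory.used,memory.total CSV output."""
    peak = (0, 0)
    for line in csv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("memory."):
            continue  # blank line or header row
        used_field, total_field = (field.strip() for field in line.split(","))
        used = int(used_field.split()[0])    # "10240 MiB" -> 10240
        total = int(total_field.split()[0])  # "24576 MiB" -> 24576
        if used > peak[0]:
            peak = (used, total)
    return peak


SAMPLE_LOG = """memory.used [MiB], memory.total [MiB]
10240 MiB, 24576 MiB
18432 MiB, 24576 MiB
"""

used, total = peak_memory_mib(SAMPLE_LOG)
print(f"peak: {used} MiB of {total} MiB ({used / total:.0%})")
```

To analyze a real run, pass the contents of `memory_log.csv` instead of `SAMPLE_LOG`.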

### Verify Preprocessed Data

Decode latents to visualize the preprocessed videos:

```bash
uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
    --model-path /path/to/model.safetensors
```

To also decode audio latents, add the `--with-audio` flag:

```bash
uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
    --model-path /path/to/model.safetensors \
    --with-audio
```

Compare the decoded videos and audio with the originals to confirm that preprocessing preserved their quality.

---

## 💡 Best Practices

### Before Training

- [ ] Test preprocessing with a small subset first
- [ ] Verify all video files are accessible
- [ ] Check available GPU memory
- [ ] Review configuration against hardware capabilities
- [ ] Ensure model and text encoder paths are correct

### During Training

- [ ] Monitor GPU memory usage
- [ ] Check loss convergence regularly
- [ ] Review validation samples periodically
- [ ] Save checkpoints frequently

### After Training

- [ ] Test trained model with diverse prompts
- [ ] Document training parameters and results
- [ ] Archive training data and configs

---

## 🆘 Getting Help

If you're still experiencing issues:

1. **Check logs:** Review console output for error details
2. **Search issues:** Look through GitHub issues for similar problems
3. **Provide details:** When reporting issues, include:
   - Hardware specifications (GPU model, VRAM)
   - Configuration file used
   - Complete error message
   - Steps to reproduce the issue

---

## 🤝 Join the Community

Have questions, want to share your results, or need real-time help?
Join our [community Discord server](https://discord.gg/2mafsHjJ) to connect with other users and the development team!

- Get troubleshooting help
- Share your training results and workflows
- Stay up to date with announcements and updates

We look forward to seeing you there!