# Troubleshooting Guide
This guide covers common issues and solutions when training with the LTX-2 trainer.
## 🔧 VRAM and Memory Issues
Memory management is crucial for successful training with LTX-2.
### Memory Optimization Techniques
#### 1. Enable Gradient Checkpointing
Gradient checkpointing trades training speed for memory savings. **Highly recommended** for most training runs:
```yaml
optimization:
  enable_gradient_checkpointing: true
```
#### 2. Enable 8-bit Text Encoder
Load the Gemma text encoder in 8-bit precision to save GPU memory:
```yaml
acceleration:
  load_text_encoder_in_8bit: true
```
#### 3. Reduce Batch Size
Lower the batch size if you encounter out-of-memory errors:
```yaml
optimization:
  batch_size: 1  # Start with 1 and increase gradually
```
Use gradient accumulation to maintain a larger effective batch size:
```yaml
optimization:
  batch_size: 1
  gradient_accumulation_steps: 4  # Effective batch size = 4
```
#### 4. Use Lower Resolution
Reduce spatial or temporal dimensions to save memory:
```bash
# Smaller spatial resolution
uv run python scripts/process_dataset.py dataset.json \
  --resolution-buckets "512x512x49" \
  --model-path /path/to/model.safetensors \
  --text-encoder-path /path/to/gemma
# Fewer frames
uv run python scripts/process_dataset.py dataset.json \
  --resolution-buckets "960x544x25" \
  --model-path /path/to/model.safetensors \
  --text-encoder-path /path/to/gemma
```
#### 5. Enable Model Quantization
Use quantization to reduce memory usage:
```yaml
acceleration:
  quantization: "int8-quanto"  # Options: int8-quanto, int4-quanto, fp8-quanto
```
#### 6. Use 8-bit Optimizer
The 8-bit AdamW optimizer uses less memory:
```yaml
optimization:
  optimizer_type: "adamw8bit"
```
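These techniques compose. As a starting point, here is a sketch that combines them all, using only the keys shown in the examples above (the values are illustrative starting points, not tuned recommendations):
```yaml
acceleration:
  load_text_encoder_in_8bit: true
  quantization: "int8-quanto"
optimization:
  enable_gradient_checkpointing: true
  batch_size: 1
  gradient_accumulation_steps: 4  # Effective batch size = 4
  optimizer_type: "adamw8bit"
```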
---
## ⚠️ Common Usage Issues
### Issue: "No module named 'ltx_trainer'" Error
**Solution:**
Ensure you've installed the dependencies and are using `uv run` to execute scripts:
```bash
# From the repository root
uv sync
cd packages/ltx-trainer
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```
> [!TIP]
> Always use `uv run` to execute Python scripts. This automatically uses the correct virtual environment
> without requiring manual activation.
### Issue: "Gemma model path is not a directory" Error
**Solution:**
The `text_encoder_path` must point to a directory containing the Gemma model, not a file:
```yaml
model:
  model_path: "/path/to/ltx-2-model.safetensors"  # File path
  text_encoder_path: "/path/to/gemma-model/"      # Directory path
```
### Issue: "Model path does not exist" Error
**Solution:**
LTX-2 requires local model paths. URLs are not supported:
```yaml
# ✅ Correct - local path
model:
  model_path: "/path/to/ltx-2-model.safetensors"
# ❌ Wrong - URL not supported
model:
  model_path: "https://huggingface.co/..."
```
### Issue: "Frames must satisfy frames % 8 == 1" Error
**Solution:**
LTX-2 requires the number of frames to satisfy `frames % 8 == 1`, i.e., any value of the form `8k + 1`:
- ✅ Valid: 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 121
- ❌ Invalid: 24, 32, 48, 64, 100
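If your clips have an arbitrary frame count, you can snap it down to the nearest valid value before preprocessing. A minimal shell sketch (the arithmetic is the point; adapt it to your own pipeline):
```bash
# Round an arbitrary frame count down to the nearest valid value (8*k + 1)
frames=100
valid=$(( (frames - 1) / 8 * 8 + 1 ))
echo "$valid"  # prints 97
```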
### Issue: Slow Training Speed
**Optimizations:**
1. **Disable gradient checkpointing** (if you have enough VRAM):
```yaml
optimization:
  enable_gradient_checkpointing: false
```
2. **Use torch.compile** via Accelerate:
```bash
uv run accelerate launch --config_file configs/accelerate/ddp_compile.yaml \
  scripts/train.py configs/ltx2_av_lora.yaml
```
### Issue: Poor Quality Validation Outputs
**Solutions:**
1. **Use Image-to-Video Validation:**
For more reliable validation, use image-to-video (first-frame conditioning) rather than pure text-to-video:
```yaml
validation:
  prompts:
    - "a professional portrait video of a person"
  images:
    - "/path/to/first_frame.png"  # One image per prompt
```
2. **Increase inference steps:**
```yaml
validation:
  inference_steps: 50  # Default is 30
```
3. **Adjust guidance settings:**
```yaml
validation:
  guidance_scale: 3.0  # CFG scale (recommended: 3.0)
  stg_scale: 1.0       # STG scale for temporal coherence (recommended: 1.0)
  stg_blocks: [29]     # Transformer block to perturb
```
4. **Check caption quality:**
If you're using auto-generated captions, review them and correct inaccuracies by hand.
LTX-2 prefers long, detailed captions that describe both visual content and audio (e.g., ambient sounds, speech, music).
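For instance, a hypothetical caption in that style might read:
```
A handheld shot of a street musician fingerpicking an acoustic guitar at dusk,
warm streetlight reflecting off the strings. Audio: a soft guitar melody,
distant traffic hum, and the occasional footsteps of passersby.
```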
5. **Check target modules:**
Ensure your `target_modules` configuration matches your training goals. For audio-video training,
use patterns that match both branches (e.g., `"to_k"` instead of `"attn1.to_k"`).
See [Understanding Target Modules](configuration-reference.md#understanding-target-modules) for details.
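For example, a hedged sketch of such a pattern set (only `to_k` vs. `attn1.to_k` is discussed above; the companion projection names are assumptions that follow the same convention, so check them against your model):
```yaml
lora:
  # Unprefixed patterns match attention projections in both the video and
  # audio branches; prefixed ones (e.g., "attn1.to_k") restrict the match
  target_modules:
    - "to_q"
    - "to_k"
    - "to_v"
```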
6. **Adjust LoRA rank:**
Try higher values for more capacity:
```yaml
lora:
  rank: 64  # Or 128 for more capacity
```
7. **Increase training steps:**
```yaml
optimization:
  steps: 3000
```
---
## 🔍 Debugging Tools
### Monitor GPU Memory Usage
Track memory usage during training:
```bash
# Watch GPU memory in real-time
watch -n 1 nvidia-smi
# Log memory usage to file
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop=5 > memory_log.csv
```
### Verify Preprocessed Data
Decode latents to visualize the preprocessed videos:
```bash
uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
  --model-path /path/to/model.safetensors
```
To also decode audio latents, add the `--with-audio` flag:
```bash
uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
  --model-path /path/to/model.safetensors \
  --with-audio
```
Compare decoded videos and audio with originals to ensure quality.
---
## 💡 Best Practices
### Before Training
- [ ] Test preprocessing with a small subset first (see the sketch after this list)
- [ ] Verify all video files are accessible
- [ ] Check available GPU memory
- [ ] Review configuration against hardware capabilities
- [ ] Ensure model and text encoder paths are correct
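For the subset test, here is a hedged sketch, assuming `dataset.json` is a flat JSON array of sample entries (adapt the slicing if your dataset uses a different layout):
```bash
# Hypothetical smoke test: preprocess only the first 5 entries
uv run python -c "import json; json.dump(json.load(open('dataset.json'))[:5], open('dataset_subset.json', 'w'), indent=2)"
uv run python scripts/process_dataset.py dataset_subset.json \
  --resolution-buckets "512x512x49" \
  --model-path /path/to/model.safetensors \
  --text-encoder-path /path/to/gemma
```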
### During Training
- [ ] Monitor GPU memory usage
- [ ] Check loss convergence regularly
- [ ] Review validation samples periodically
- [ ] Save checkpoints frequently
### After Training
- [ ] Test trained model with diverse prompts
- [ ] Document training parameters and results
- [ ] Archive training data and configs
## 🆘 Getting Help
If you're still experiencing issues:
1. **Check logs:** Review console output for error details
2. **Search issues:** Look through GitHub issues for similar problems
3. **Provide details:** When reporting issues, include:
- Hardware specifications (GPU model, VRAM)
- Configuration file used
- Complete error message
- Steps to reproduce the issue
---
## 🤝 Join the Community
Have questions, want to share your results, or need real-time help?
Join our [community Discord server](https://discord.gg/2mafsHjJ) to connect with other users and the development team!
- Get troubleshooting help
- Share your training results and workflows
- Stay up to date with announcements and updates
We look forward to seeing you there!