Troubleshooting Guide

This guide covers common issues and solutions when training with the LTX-2 trainer.

🔧 VRAM and Memory Issues

Memory management is crucial for successful training with LTX-2.

Memory Optimization Techniques

1. Enable Gradient Checkpointing

Gradient checkpointing trades training speed for memory savings and is highly recommended for most training runs:

optimization:
  enable_gradient_checkpointing: true
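
If it helps to see what this flag controls: gradient checkpointing recomputes intermediate activations during the backward pass instead of caching them all. Below is a minimal, generic PyTorch sketch of the technique (an illustration only, not the trainer's implementation):

import torch
from torch.utils.checkpoint import checkpoint

# Illustration only: a checkpointed segment recomputes its activations during
# backward, trading extra compute for lower peak memory.
block = torch.nn.Sequential(
    torch.nn.Linear(1024, 1024),
    torch.nn.GELU(),
    torch.nn.Linear(1024, 1024),
)

x = torch.randn(8, 1024, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # activations rebuilt in backward
y.sum().backward()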

2. Enable 8-bit Text Encoder

Load the Gemma text encoder in 8-bit precision to save GPU memory:

acceleration:
  load_text_encoder_in_8bit: true

3. Reduce Batch Size

Lower the batch size if you encounter out-of-memory errors:

optimization:
  batch_size: 1  # Start with 1 and increase gradually

Use gradient accumulation to maintain a larger effective batch size:

optimization:
  batch_size: 1
  gradient_accumulation_steps: 4  # Effective batch size = 4
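
Under the hood, gradient accumulation sums gradients over several micro-batches and applies one optimizer step per group. A generic PyTorch sketch of the idea (illustration only, not the trainer's code):

import torch

# Illustration only: a toy model where 4 micro-batches form one effective batch.
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
data = [(torch.randn(1, 16), torch.randn(1, 1)) for _ in range(8)]

accumulation_steps = 4
optimizer.zero_grad()
for step, (x, target) in enumerate(data):
    loss = torch.nn.functional.mse_loss(model(x), target) / accumulation_steps
    loss.backward()                      # gradients accumulate in .grad
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                 # one update per 4 micro-batches
        optimizer.zero_grad()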

4. Use Lower Resolution

Reduce spatial or temporal dimensions to save memory:

# Smaller spatial resolution
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "512x512x49" \
    --model-path /path/to/model.safetensors \
    --text-encoder-path /path/to/gemma

# Fewer frames
uv run python scripts/process_dataset.py dataset.json \
    --resolution-buckets "960x544x25" \
    --model-path /path/to/model.safetensors \
    --text-encoder-path /path/to/gemma

5. Enable Model Quantization

Use quantization to reduce memory usage:

acceleration:
  quantization: "int8-quanto"  # Options: int8-quanto, int4-quanto, fp8-quanto
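
These options quantize model weights to lower precision at load time. Assuming they correspond to the optimum-quanto library, int8 weight quantization of an ordinary PyTorch module looks roughly like this (illustration only, not the trainer's loading code):

import torch
from optimum.quanto import quantize, freeze, qint8

# Illustration only: quantize a module's weights to int8 with optimum-quanto.
model = torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.Linear(1024, 1024))
quantize(model, weights=qint8)  # swap weights for int8 quantized versions
freeze(model)                   # materialize the quantized weights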

6. Use 8-bit Optimizer

The 8-bit AdamW optimizer uses less memory:

optimization:
  optimizer_type: "adamw8bit"
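
Most of the savings come from storing the optimizer state (the AdamW moment estimates) in 8-bit. For reference, the standalone equivalent with bitsandbytes, assuming that is the backing implementation, looks roughly like this (the trainer configures it for you):

import torch
import bitsandbytes as bnb

# Illustration only: 8-bit AdamW keeps optimizer state in 8-bit precision.
model = torch.nn.Linear(1024, 1024)
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=2e-4)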

⚠️ Common Usage Issues

Issue: "No module named 'ltx_trainer'" Error

Solution: Ensure you've installed the dependencies and are using uv run to execute scripts:

# From the repository root
uv sync
cd packages/ltx-trainer
uv run python scripts/train.py configs/ltx2_av_lora.yaml

Always use uv run to execute Python scripts. This automatically uses the correct virtual environment without requiring manual activation.

Issue: "Gemma model path is not a directory" Error

Solution: The text_encoder_path must point to a directory containing the Gemma model, not a file:

model:
  model_path: "/path/to/ltx-2-model.safetensors"  # File path
  text_encoder_path: "/path/to/gemma-model/"      # Directory path

Issue: "Model path does not exist" Error

Solution: LTX-2 requires local model paths. URLs are not supported:

# ✅ Correct - local path
model:
  model_path: "/path/to/ltx-2-model.safetensors"

# ❌ Wrong - URL not supported
model:
  model_path: "https://huggingface.co/..."

Issue: "Frames must satisfy frames % 8 == 1" Error

Solution: LTX-2 requires the number of frames to satisfy frames % 8 == 1:

  • ✅ Valid: 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 121
  • ❌ Invalid: 24, 32, 48, 64, 100
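
If your clips have an arbitrary length, you can snap them down to the nearest valid count with a small helper like the one below (written for this guide, not part of the trainer):

def nearest_valid_frame_count(frames: int) -> int:
    """Round down to the largest count satisfying frames % 8 == 1."""
    if frames < 1:
        raise ValueError("need at least 1 frame")
    return ((frames - 1) // 8) * 8 + 1

print(nearest_valid_frame_count(50))   # 49
print(nearest_valid_frame_count(100))  # 97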

Issue: Slow Training Speed

Optimizations:

  1. Disable gradient checkpointing (if you have enough VRAM):

    optimization:
      enable_gradient_checkpointing: false
    
  2. Use torch.compile via Accelerate:

    uv run accelerate launch --config_file configs/accelerate/ddp_compile.yaml \
      scripts/train.py configs/ltx2_av_lora.yaml
    

Issue: Poor Quality Validation Outputs

Solutions:

  1. Use Image-to-Video Validation: For more reliable validation, use image-to-video (first-frame conditioning) rather than pure text-to-video:

    validation:
      prompts:
        - "a professional portrait video of a person"
      images:
        - "/path/to/first_frame.png"  # One image per prompt
    
  2. Increase inference steps:

    validation:
      inference_steps: 50  # Default is 30
    
  3. Adjust guidance settings:

    validation:
      guidance_scale: 3.0  # CFG scale (recommended: 3.0)
      stg_scale: 1.0       # STG scale for temporal coherence (recommended: 1.0)
      stg_blocks: [29]     # Transformer block to perturb
    
  4. Check caption quality: If you are using auto-generated captions, review them and edit for accuracy. LTX-2 prefers long, detailed captions that describe both the visual content and the audio (e.g., ambient sounds, speech, music).

  5. Check target modules: Ensure your target_modules configuration matches your training goals. For audio-video training, use patterns that match both branches (e.g., "to_k" instead of "attn1.to_k"). See Understanding Target Modules for details.

  6. Adjust LoRA rank: Try higher values for more capacity:

    lora:
      rank: 64  # Or 128 for more capacity
    
  7. Increase training steps:

    optimization:
      steps: 3000
    

πŸ” Debugging Tools

Monitor GPU Memory Usage

Track memory usage during training:

# Watch GPU memory in real-time
watch -n 1 nvidia-smi

# Log memory usage to file
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop=5 > memory_log.csv
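
You can also check memory from inside Python using PyTorch's CUDA APIs, for example to print current and peak usage (a generic snippet, independent of the trainer):

import torch

# Generic snippet: report current and peak GPU memory on device 0.
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    total = torch.cuda.get_device_properties(device).total_memory
    allocated = torch.cuda.memory_allocated(device)
    peak = torch.cuda.max_memory_allocated(device)
    print(f"allocated {allocated / 1e9:.2f} GB / {total / 1e9:.2f} GB (peak {peak / 1e9:.2f} GB)")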

Verify Preprocessed Data

Decode latents to visualize the preprocessed videos:

uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
    --model-path /path/to/model.safetensors

To also decode audio latents, add the --with-audio flag:

uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
    --model-path /path/to/model.safetensors \
    --with-audio

Compare decoded videos and audio with originals to ensure quality.


💡 Best Practices

Before Training

  • Test preprocessing with a small subset first
  • Verify all video files are accessible
  • Check available GPU memory
  • Review configuration against hardware capabilities
  • Ensure model and text encoder paths are correct

During Training

  • Monitor GPU memory usage
  • Check loss convergence regularly
  • Review validation samples periodically
  • Save checkpoints frequently

After Training

  • Test trained model with diverse prompts
  • Document training parameters and results
  • Archive training data and configs

🆘 Getting Help

If you're still experiencing issues:

  1. Check logs: Review console output for error details
  2. Search issues: Look through GitHub issues for similar problems
  3. Provide details: When reporting issues, include:
    • Hardware specifications (GPU model, VRAM)
    • Configuration file used
    • Complete error message
    • Steps to reproduce the issue

🤝 Join the Community

Have questions, want to share your results, or need real-time help? Join our community Discord server to connect with other users and the development team!

  • Get troubleshooting help
  • Share your training results and workflows
  • Stay up to date with announcements and updates

We look forward to seeing you there!