# Troubleshooting Guide

This guide covers common issues and solutions when training with the LTX-2 trainer.

## 🔧 VRAM and Memory Issues

Memory management is crucial for successful training with LTX-2.

### Memory Optimization Techniques

#### 1. Enable Gradient Checkpointing

Gradient checkpointing trades training speed for memory savings. **Highly recommended** for most training runs:

```yaml
optimization:
  enable_gradient_checkpointing: true
```
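For context, gradient checkpointing recomputes intermediate activations during the backward pass instead of storing them. A minimal PyTorch sketch of the general technique (illustrative only, not the trainer's internal implementation):

```python
import torch
from torch.utils.checkpoint import checkpoint

# Stand-in for a stack of transformer blocks
blocks = torch.nn.ModuleList([torch.nn.Linear(512, 512) for _ in range(4)])

x = torch.randn(2, 512, requires_grad=True)
for block in blocks:
    # Activations inside `block` are not kept; they are recomputed
    # during backward, trading extra compute for lower memory.
    x = checkpoint(block, x, use_reentrant=False)
x.sum().backward()
```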
#### 2. Enable 8-bit Text Encoder

Load the Gemma text encoder in 8-bit precision to save GPU memory:

```yaml
acceleration:
  load_text_encoder_in_8bit: true
```
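For reference, loading a Hugging Face model in 8-bit precision generally looks like this with `transformers` and bitsandbytes (a sketch of the general technique, not necessarily how the trainer loads it; the path is a placeholder):

```python
from transformers import AutoModel, BitsAndBytesConfig

# Weights are stored as int8 via bitsandbytes, roughly halving memory vs. fp16
text_encoder = AutoModel.from_pretrained(
    "/path/to/gemma-model/",  # placeholder directory path
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
```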
#### 3. Reduce Batch Size

Lower the batch size if you encounter out-of-memory errors:

```yaml
optimization:
  batch_size: 1  # Start with 1 and increase gradually
```

Use gradient accumulation to maintain a larger effective batch size:

```yaml
optimization:
  batch_size: 1
  gradient_accumulation_steps: 4  # Effective batch size = 4
```
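Conceptually, gradient accumulation runs several micro-batches before each optimizer step, so gradients average over a larger effective batch. A minimal PyTorch sketch (illustrative; `model`, `optimizer`, `loss_fn`, and `dataloader` are assumed to exist):

```python
accum_steps = 4  # effective batch size = batch_size * accum_steps

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    loss = loss_fn(model(inputs), targets) / accum_steps  # scale so gradients average
    loss.backward()  # gradients accumulate in .grad across micro-batches
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one update per accum_steps micro-batches
        optimizer.zero_grad()
```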
#### 4. Use Lower Resolution

Reduce spatial or temporal dimensions to save memory:

```bash
# Smaller spatial resolution
uv run python scripts/process_dataset.py dataset.json \
  --resolution-buckets "512x512x49" \
  --model-path /path/to/model.safetensors \
  --text-encoder-path /path/to/gemma

# Fewer frames
uv run python scripts/process_dataset.py dataset.json \
  --resolution-buckets "960x544x25" \
  --model-path /path/to/model.safetensors \
  --text-encoder-path /path/to/gemma
```
#### 5. Enable Model Quantization

Use quantization to reduce memory usage:

```yaml
acceleration:
  quantization: "int8-quanto"  # Options: int8-quanto, int4-quanto, fp8-quanto
```
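The `*-quanto` options refer to the Optimum Quanto library. For context, standalone int8 weight quantization with quanto looks roughly like this (a sketch of the library's API, not necessarily how the trainer wires it up; `transformer` is a stand-in for an already-loaded module):

```python
from optimum.quanto import freeze, qint8, quantize

quantize(transformer, weights=qint8)  # replace weights with int8 quantized tensors
freeze(transformer)                   # materialize the quantized weights
```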
#### 6. Use 8-bit Optimizer

The 8-bit AdamW optimizer uses less memory:

```yaml
optimization:
  optimizer_type: "adamw8bit"
```
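`adamw8bit` corresponds to the bitsandbytes 8-bit AdamW, which keeps optimizer state (momentum and variance) in 8-bit rather than fp32. Standalone usage looks like this (illustrative sketch; the learning rate is a placeholder):

```python
import bitsandbytes as bnb

# Optimizer states are stored in 8-bit, cutting optimizer memory roughly 4x vs. fp32
optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=1e-4)
```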
---

## ⚠️ Common Usage Issues

### Issue: "No module named 'ltx_trainer'" Error

**Solution:**

Ensure you've installed the dependencies and are using `uv run` to execute scripts:

```bash
# From the repository root
uv sync
cd packages/ltx-trainer
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```

> [!TIP]
> Always use `uv run` to execute Python scripts. This automatically uses the correct virtual environment
> without requiring manual activation.
| ### Issue: "Gemma model path is not a directory" Error | |
| **Solution:** | |
| The `text_encoder_path` must point to a directory containing the Gemma model, not a file: | |
| ```yaml | |
| model: | |
| model_path: "/path/to/ltx-2-model.safetensors" # File path | |
| text_encoder_path: "/path/to/gemma-model/" # Directory path | |
| ``` | |
| ### Issue: "Model path does not exist" Error | |
| **Solution:** | |
| LTX-2 requires local model paths. URLs are not supported: | |
| ```yaml | |
| # ✅ Correct - local path | |
| model: | |
| model_path: "/path/to/ltx-2-model.safetensors" | |
| # ❌ Wrong - URL not supported | |
| model: | |
| model_path: "https://huggingface.co/..." | |
| ``` | |
| ### Issue: "Frames must satisfy frames % 8 == 1" Error | |
| **Solution:** | |
| LTX-2 requires the number of frames to satisfy `frames % 8 == 1`: | |
| - ✅ Valid: 1, 9, 17, 25, 33, 41, 49, 57, 65, 73, 81, 89, 97, 121 | |
| - ❌ Invalid: 24, 32, 48, 64, 100 | |
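If your source clips have arbitrary lengths, you can round a frame count down to the nearest valid value. A small helper (a convenience sketch, not part of the trainer's CLI):

```python
def nearest_valid_frame_count(n: int) -> int:
    """Round down to the nearest frame count satisfying n % 8 == 1."""
    return max(1, ((n - 1) // 8) * 8 + 1)

assert nearest_valid_frame_count(100) == 97
assert nearest_valid_frame_count(49) == 49   # already valid
assert nearest_valid_frame_count(24) == 17
```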
### Issue: Slow Training Speed

**Optimizations:**

1. **Disable gradient checkpointing** (if you have enough VRAM):

   ```yaml
   optimization:
     enable_gradient_checkpointing: false
   ```

2. **Use torch.compile** via Accelerate:

   ```bash
   uv run accelerate launch --config_file configs/accelerate/ddp_compile.yaml \
     scripts/train.py configs/ltx2_av_lora.yaml
   ```
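For context, `torch.compile` JIT-compiles the model's forward pass into optimized kernels: the first few steps are slower while compilation runs, then steady-state throughput improves. Outside of Accelerate, plain PyTorch usage is a one-liner (illustrative; `model` is assumed to exist):

```python
import torch

model = torch.compile(model)  # subsequent forward/backward passes use compiled kernels
```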
### Issue: Poor Quality Validation Outputs

**Solutions:**

1. **Use image-to-video validation:**

   For more reliable validation, use image-to-video (first-frame conditioning) rather than pure text-to-video:

   ```yaml
   validation:
     prompts:
       - "a professional portrait video of a person"
     images:
       - "/path/to/first_frame.png"  # One image per prompt
   ```

2. **Increase inference steps:**

   ```yaml
   validation:
     inference_steps: 50  # Default is 30
   ```

3. **Adjust guidance settings:**

   ```yaml
   validation:
     guidance_scale: 3.0  # CFG scale (recommended: 3.0)
     stg_scale: 1.0       # STG scale for temporal coherence (recommended: 1.0)
     stg_blocks: [29]     # Transformer block to perturb
   ```
4. **Check caption quality:**

   Review and manually edit captions for accuracy if using auto-generated captions.
   LTX-2 prefers long, detailed captions that describe both visual content and audio (e.g., ambient sounds, speech, music).

5. **Check target modules:**

   Ensure your `target_modules` configuration matches your training goals. For audio-video training,
   use patterns that match both branches (e.g., `"to_k"` instead of `"attn1.to_k"`), as in the sketch below.
   See [Understanding Target Modules](configuration-reference.md#understanding-target-modules) for details.
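   For instance, a configuration like the following uses bare projection names so the pattern matches attention layers in both the video and audio branches (illustrative example; verify the exact module names against your model):

   ```yaml
   lora:
     # Illustrative - bare names match more modules than prefixed ones like "attn1.to_k"
     target_modules: ["to_k", "to_q", "to_v"]
   ```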
6. **Adjust LoRA rank:**

   Try higher values for more capacity:

   ```yaml
   lora:
     rank: 64  # Or 128 for more capacity
   ```

7. **Increase training steps:**

   ```yaml
   optimization:
     steps: 3000
   ```
---

## 🔍 Debugging Tools

### Monitor GPU Memory Usage

Track memory usage during training:

```bash
# Watch GPU memory in real-time
watch -n 1 nvidia-smi

# Log memory usage to file
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop=5 > memory_log.csv
```
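You can also measure peak usage from inside a Python process with PyTorch's built-in counters, which is handy for comparing memory settings step by step (a sketch to drop into your own debugging code):

```python
import torch

torch.cuda.reset_peak_memory_stats()
# ... run one training step here ...
peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak allocated GPU memory: {peak_gib:.2f} GiB")
```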
### Verify Preprocessed Data

Decode latents to visualize the preprocessed videos:

```bash
uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
  --model-path /path/to/model.safetensors
```

To also decode audio latents, add the `--with-audio` flag:

```bash
uv run python scripts/decode_latents.py dataset/.precomputed/latents debug_output \
  --model-path /path/to/model.safetensors \
  --with-audio
```

Compare decoded videos and audio with originals to ensure quality.
---

## 💡 Best Practices

### Before Training

- [ ] Test preprocessing with a small subset first (see the sketch after this list)
- [ ] Verify all video files are accessible
- [ ] Check available GPU memory
- [ ] Review configuration against hardware capabilities
- [ ] Ensure model and text encoder paths are correct
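One way to build a small test subset, assuming your `dataset.json` is a JSON array of samples (adjust if your dataset file uses a different layout):

```python
import json

# Take the first 10 entries for a quick preprocessing smoke test
with open("dataset.json") as f:
    samples = json.load(f)

with open("dataset_subset.json", "w") as f:
    json.dump(samples[:10], f, indent=2)
```

Then run `scripts/process_dataset.py` against `dataset_subset.json` before committing to the full run.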
### During Training

- [ ] Monitor GPU memory usage
- [ ] Check loss convergence regularly
- [ ] Review validation samples periodically
- [ ] Save checkpoints frequently

### After Training

- [ ] Test trained model with diverse prompts
- [ ] Document training parameters and results
- [ ] Archive training data and configs

## 🆘 Getting Help

If you're still experiencing issues:

1. **Check logs:** Review console output for error details
2. **Search issues:** Look through GitHub issues for similar problems
3. **Provide details:** When reporting issues, include:
   - Hardware specifications (GPU model, VRAM)
   - Configuration file used
   - Complete error message
   - Steps to reproduce the issue

---

## 🤝 Join the Community

Have questions, want to share your results, or need real-time help?
Join our [community Discord server](https://discord.gg/2mafsHjJ) to connect with other users and the development team!

- Get troubleshooting help
- Share your training results and workflows
- Stay up to date with announcements and updates

We look forward to seeing you there!