Training Guide
This guide covers how to run training jobs, from basic single-GPU training to advanced distributed setups and automatic model uploads.
⚡ Basic Training (Single GPU)
After preprocessing your dataset and preparing a configuration file, you can start training using the trainer script:
uv run python scripts/train.py configs/ltx2_av_lora.yaml
The trainer will:
- Load your configuration and validate all parameters
- Initialize models and apply optimizations
- Run the training loop with progress tracking
- Generate validation videos (if configured)
- Save the trained weights in your output directory
Output Files
For LoRA training:
- `lora_weights.safetensors` - Main LoRA weights file
- `training_config.yaml` - Copy of training configuration
- `validation_samples/` - Generated validation videos (if enabled)
For full model fine-tuning:
- `model_weights.safetensors` - Full model weights
- `training_config.yaml` - Copy of training configuration
- `validation_samples/` - Generated validation videos (if enabled)
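If you want to sanity-check a finished run, the weights are plain safetensors files you can inspect directly. A minimal sketch using the `safetensors` library; the `outputs/` path is an assumption, so substitute the output directory from your config:

```python
from safetensors import safe_open

# Assumed path: substitute the output directory from your training config
weights_path = "outputs/lora_weights.safetensors"

with safe_open(weights_path, framework="pt") as f:
    keys = list(f.keys())
    print(f"{len(keys)} tensors saved")
    # Print a few tensor names and shapes as a quick sanity check
    for key in keys[:5]:
        print(key, tuple(f.get_tensor(key).shape))
```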
🖥️ Distributed / Multi-GPU Training
We use Hugging Face 🤗 Accelerate for multi-GPU DDP and FSDP training.
Configure Accelerate
Run the interactive wizard once to set up your environment (DDP / FSDP, GPU count, etc.):
uv run accelerate config
This stores your preferences in ~/.cache/huggingface/accelerate/default_config.yaml.
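For reference, a typical DDP answer set produces a file along these lines (values are illustrative; the exact keys depend on your answers and Accelerate version):

```yaml
compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 2
machine_rank: 0
mixed_precision: bf16
main_training_function: main
```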
Use the Provided Accelerate Configs (Recommended)
We include ready-to-use Accelerate config files in configs/accelerate/:
- `ddp.yaml` – Standard DDP
- `ddp_compile.yaml` – DDP with `torch.compile` (Inductor)
- `fsdp.yaml` – Standard FSDP (auto-wraps `BasicAVTransformerBlock`)
- `fsdp_compile.yaml` – FSDP with `torch.compile` (Inductor)
Launch with a specific config using --config_file:
# DDP (2 GPUs shown as example)
CUDA_VISIBLE_DEVICES=0,1 \
uv run accelerate launch --config_file configs/accelerate/ddp.yaml \
scripts/train.py configs/ltx2_av_lora.yaml
# DDP + torch.compile
CUDA_VISIBLE_DEVICES=0,1 \
uv run accelerate launch --config_file configs/accelerate/ddp_compile.yaml \
scripts/train.py configs/ltx2_av_lora.yaml
# FSDP (4 GPUs shown as example)
CUDA_VISIBLE_DEVICES=0,1,2,3 \
uv run accelerate launch --config_file configs/accelerate/fsdp.yaml \
scripts/train.py configs/ltx2_av_lora.yaml
# FSDP + torch.compile
CUDA_VISIBLE_DEVICES=0,1,2,3 \
uv run accelerate launch --config_file configs/accelerate/fsdp_compile.yaml \
scripts/train.py configs/ltx2_av_lora.yaml
Notes:
- The number of processes is taken from the Accelerate config (`num_processes`). Override it with `--num_processes X` or restrict GPUs with `CUDA_VISIBLE_DEVICES`.
- The compile variants enable `torch.compile` with the Inductor backend via Accelerate's `dynamo_config`.
- FSDP configs auto-wrap the transformer blocks (`fsdp_transformer_layer_cls_to_wrap: BasicAVTransformerBlock`); see the sketch below.
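As a rough sketch of the FSDP + compile variant referenced above (key names vary between Accelerate versions; treat the files in configs/accelerate/ as the source of truth):

```yaml
distributed_type: FSDP
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: BasicAVTransformerBlock
  fsdp_sharding_strategy: FULL_SHARD
dynamo_config:
  dynamo_backend: INDUCTOR
```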
Launch with Your Default Accelerate Config
If you prefer to use your default Accelerate profile:
# Use settings from your default accelerate config
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
# Override number of processes on the fly (e.g., 2 GPUs)
uv run accelerate launch --num_processes 2 scripts/train.py configs/ltx2_av_lora.yaml
# Select specific GPUs
CUDA_VISIBLE_DEVICES=0,1 uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
You can disable the in-terminal progress bars by passing the `--disable_progress_bars` flag to the trainer CLI if desired.
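For example, assuming the flag is passed to the trainer script after the config path:

```bash
uv run accelerate launch --config_file configs/accelerate/ddp.yaml \
  scripts/train.py configs/ltx2_av_lora.yaml --disable_progress_bars
```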
Benefits of Distributed Training
- Faster training: Distribute workload across multiple GPUs
- Larger effective batch sizes: Combine gradients from multiple GPUs
- Memory efficiency: Each GPU handles a portion of the batch
Distributed training requires that all GPUs have sufficient memory for the model and batch size. The effective batch size becomes `batch_size × num_processes`.
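For example, `batch_size: 2` on 4 GPUs gives an effective batch size of 8.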
🤗 Pushing Models to Hugging Face Hub
You can automatically push your trained models to the Hugging Face Hub by adding the following to your configuration:
hub:
  push_to_hub: true
  hub_model_id: "your-username/your-model-name"
Prerequisites
Before pushing, make sure you:
- Have a Hugging Face account - Sign up at huggingface.co
- Are logged in via `huggingface-cli login` or have set the `HUGGING_FACE_HUB_TOKEN` environment variable
- Have write access to the specified repository (it will be created if it doesn't exist)
Login Options
Option 1: Interactive login
uv run huggingface-cli login
Option 2: Environment variable
export HUGGING_FACE_HUB_TOKEN="your_token_here"
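To confirm that a token is visible before starting a long run, a quick check with the `huggingface_hub` Python client:

```python
from huggingface_hub import whoami

# Raises an error if no valid token is found; otherwise prints your account name
print(whoami()["name"])
```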
What Gets Uploaded
The trainer will automatically:
- Create a model card with training details and sample outputs
- Upload model weights
- Push sample videos as GIFs in the model card
- Include training configuration and prompts
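Once the push completes, the weights can be pulled back down on any machine. A minimal sketch, assuming the `hub_model_id` from the config above and the `lora_weights.safetensors` filename produced by LoRA training:

```python
from huggingface_hub import hf_hub_download

# repo_id matches hub_model_id from the training config;
# the filename assumes the default LoRA output name
local_path = hf_hub_download(
    repo_id="your-username/your-model-name",
    filename="lora_weights.safetensors",
)
print(local_path)
```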
📊 Weights & Biases Logging
Enable experiment tracking with W&B by adding to your configuration:
wandb:
  enabled: true
  project: "ltx-2-trainer"
  entity: null  # Your W&B username or team
  tags: [ "ltx2", "lora" ]
  log_validation_videos: true
This will log:
- Training loss and learning rate
- Validation videos
- Model configuration
- Training progress
🚀 Next Steps
After training completes:
- Run inference with your trained LoRA - The `ltx-pipelines` package provides production-ready inference pipelines that support loading custom LoRAs. Available pipelines include text-to-video, image-to-video, IC-LoRA video-to-video, and more. See the `ltx-pipelines` package for usage details.
- Test your model with validation prompts
- Iterate and improve based on validation results
- Share your results by pushing to Hugging Face Hub
💡 Tips for Successful Training
- Start small: Begin with a small dataset and a few hundred steps to verify everything works
- Monitor validation: Keep an eye on validation samples to catch overfitting
- Adjust learning rate: Lower learning rates often produce better results
- Use gradient checkpointing: Essential for training with limited GPU memory
- Save checkpoints: Regular checkpoints help recover from interruptions
Need Help?
If you encounter issues during training, see the Troubleshooting Guide.
Join our Discord community for real-time help!