# Training Guide

This guide covers how to run training jobs, from basic single-GPU training to advanced distributed setups and automatic model uploads.

## ⚡ Basic Training (Single GPU)

After preprocessing your dataset and preparing a configuration file, you can start training using the trainer script:

```bash
uv run python scripts/train.py configs/ltx2_av_lora.yaml
```

The trainer will:

1. **Load your configuration** and validate all parameters
2. **Initialize models** and apply optimizations
3. **Run the training loop** with progress tracking
4. **Generate validation videos** (if configured)
5. **Save the trained weights** in your output directory

### Output Files

**For LoRA training:**

- `lora_weights.safetensors` - Main LoRA weights file
- `training_config.yaml` - Copy of training configuration
- `validation_samples/` - Generated validation videos (if enabled)

**For full model fine-tuning:**

- `model_weights.safetensors` - Full model weights
- `training_config.yaml` - Copy of training configuration
- `validation_samples/` - Generated validation videos (if enabled)

## 🖥️ Distributed / Multi-GPU Training

We use Hugging Face 🤗 [Accelerate](https://huggingface.co/docs/accelerate/index) for multi-GPU training with DDP and FSDP.

### Configure Accelerate

Run the interactive wizard once to set up your environment (DDP / FSDP, GPU count, etc.):

```bash
uv run accelerate config
```

This stores your preferences in `~/.cache/huggingface/accelerate/default_config.yaml`.

### Use the Provided Accelerate Configs (Recommended)

We include ready-to-use Accelerate config files in `configs/accelerate/`:

- [ddp.yaml](../configs/accelerate/ddp.yaml) — Standard DDP
- [ddp_compile.yaml](../configs/accelerate/ddp_compile.yaml) — DDP with `torch.compile` (Inductor)
- [fsdp.yaml](../configs/accelerate/fsdp.yaml) — Standard FSDP (auto-wraps `BasicAVTransformerBlock`)
- [fsdp_compile.yaml](../configs/accelerate/fsdp_compile.yaml) — FSDP with `torch.compile` (Inductor)

Launch with a specific config using `--config_file`:

```bash
# DDP (2 GPUs shown as example)
CUDA_VISIBLE_DEVICES=0,1 \
  uv run accelerate launch --config_file configs/accelerate/ddp.yaml \
  scripts/train.py configs/ltx2_av_lora.yaml

# DDP + torch.compile
CUDA_VISIBLE_DEVICES=0,1 \
  uv run accelerate launch --config_file configs/accelerate/ddp_compile.yaml \
  scripts/train.py configs/ltx2_av_lora.yaml

# FSDP (4 GPUs shown as example)
CUDA_VISIBLE_DEVICES=0,1,2,3 \
  uv run accelerate launch --config_file configs/accelerate/fsdp.yaml \
  scripts/train.py configs/ltx2_av_lora.yaml

# FSDP + torch.compile
CUDA_VISIBLE_DEVICES=0,1,2,3 \
  uv run accelerate launch --config_file configs/accelerate/fsdp_compile.yaml \
  scripts/train.py configs/ltx2_av_lora.yaml
```

**Notes:**

- The number of processes is taken from the Accelerate config (`num_processes`). Override with `--num_processes X` or restrict GPUs with `CUDA_VISIBLE_DEVICES`.
- The compile variants enable `torch.compile` with the Inductor backend via Accelerate's `dynamo_config`.
- FSDP configs auto-wrap the transformer blocks (`fsdp_transformer_layer_cls_to_wrap: BasicAVTransformerBlock`); see the sketch below for the general shape of such a config.
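For orientation, here is a rough sketch of the kind of settings an FSDP config file like `fsdp.yaml` typically contains. The field names follow Accelerate's YAML config format, but the specific values (mixed precision, process count, sharding strategy) are illustrative assumptions rather than a copy of the shipped file; refer to `configs/accelerate/fsdp.yaml` itself for the real settings.

```yaml
# Illustrative sketch only; see configs/accelerate/fsdp.yaml for the actual file.
compute_environment: LOCAL_MACHINE
distributed_type: FSDP
mixed_precision: bf16        # assumed; match your hardware and training config
num_processes: 4             # used unless overridden by --num_processes
fsdp_config:
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_transformer_layer_cls_to_wrap: BasicAVTransformerBlock
  fsdp_sharding_strategy: FULL_SHARD   # assumed; exact value depends on the Accelerate version
```

The `*_compile.yaml` variants would additionally enable Inductor through Accelerate's dynamo settings, typically something like `dynamo_config: {dynamo_backend: INDUCTOR}`.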
### Launch with Your Default Accelerate Config

If you prefer to use your default Accelerate profile:

```bash
# Use settings from your default accelerate config
uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml

# Override number of processes on the fly (e.g., 2 GPUs)
uv run accelerate launch --num_processes 2 scripts/train.py configs/ltx2_av_lora.yaml

# Select specific GPUs
CUDA_VISIBLE_DEVICES=0,1 uv run accelerate launch scripts/train.py configs/ltx2_av_lora.yaml
```

> [!TIP]
> You can disable the in-terminal progress bars with the `--disable_progress_bars` flag in the trainer CLI if desired.

### Benefits of Distributed Training

- **Faster training**: Distribute the workload across multiple GPUs
- **Larger effective batch sizes**: Combine gradients from multiple GPUs
- **Memory efficiency**: Each GPU handles a portion of the batch

> [!NOTE]
> Distributed training requires that all GPUs have sufficient memory for the model and batch size. The effective batch
> size becomes `batch_size × num_processes`.

## 🤗 Pushing Models to Hugging Face Hub

You can automatically push your trained models to the Hugging Face Hub by adding the following to your configuration:

```yaml
hub:
  push_to_hub: true
  hub_model_id: "your-username/your-model-name"
```

### Prerequisites

Before pushing, make sure you:

1. **Have a Hugging Face account** - Sign up at [huggingface.co](https://huggingface.co)
2. **Are logged in** via `huggingface-cli login` or have set the `HUGGING_FACE_HUB_TOKEN` environment variable
3. **Have write access** to the specified repository (it will be created if it doesn't exist)

### Login Options

**Option 1: Interactive login**

```bash
uv run huggingface-cli login
```

**Option 2: Environment variable**

```bash
export HUGGING_FACE_HUB_TOKEN="your_token_here"
```

### What Gets Uploaded

The trainer will automatically:

- **Create a model card** with training details and sample outputs
- **Upload model weights**
- **Push sample videos as GIFs** in the model card
- **Include training configuration and prompts**

## 📊 Weights & Biases Logging

Enable experiment tracking with W&B by adding to your configuration:

```yaml
wandb:
  enabled: true
  project: "ltx-2-trainer"
  entity: null  # Your W&B username or team
  tags: [ "ltx2", "lora" ]
  log_validation_videos: true
```

This will log:

- Training loss and learning rate
- Validation videos
- Model configuration
- Training progress

## 🚀 Next Steps

After training completes:

- **Run inference with your trained LoRA** - The [`ltx-pipelines`](../../ltx-pipelines/) package provides production-ready inference pipelines that support loading custom LoRAs, including text-to-video, image-to-video, IC-LoRA video-to-video, and more. See that package for usage details.
- **Test your model** with validation prompts
- **Iterate and improve** based on validation results
- **Share your results** by pushing to Hugging Face Hub

## 💡 Tips for Successful Training

- **Start small**: Begin with a small dataset and a few hundred steps to verify everything works
- **Monitor validation**: Keep an eye on validation samples to catch overfitting
- **Adjust learning rate**: Lower learning rates often produce better results
- **Use gradient checkpointing**: Essential for training with limited GPU memory
- **Save checkpoints**: Regular checkpoints help recover from interruptions

## Need Help?

If you encounter issues during training, see the [Troubleshooting Guide](troubleshooting.md).
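If Hub uploads or W&B logging misbehave, a quick credentials check often narrows things down before you ask for help. The commands below come from the Hugging Face and W&B CLIs (not this repo) and assume both are installed in the project environment:

```bash
# Confirm you are authenticated with the Hugging Face Hub (needed for push_to_hub)
uv run huggingface-cli whoami

# Confirm W&B credentials (prompts for an API key if you are not logged in yet)
uv run wandb login
```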
Join our [Discord community](https://discord.gg/2mafsHjJ) for real-time help!