---
license: mit
tags:
  - lora
  - training
  - runpod
  - ai-toolkit
---

# AI Trainer - RunPod Serverless

A single-endpoint, multi-model LoRA training service built on ai-toolkit. GPU memory is cleaned up automatically whenever the service switches between models.

## Supported Models

| Model Key | Description | Base Model |
|-----------|-------------|------------|
| `wan21_1b` | Wan2.1 1.3B Video | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` |
| `wan21_14b` | Wan2.1 14B Video | `Wan-AI/Wan2.1-T2V-14B-Diffusers` |
| `wan22_14b` | Wan2.2 14B Video | `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` |
| `qwen_image` | Qwen Image Gen | `Qwen/Qwen-Image` |
| `qwen_image_edit` | Qwen Image Edit | `Qwen/Qwen-Image-Edit` |
| `flux_dev` | FLUX.1 Dev | `black-forest-labs/FLUX.1-dev` |
| `flux_schnell` | FLUX.1 Schnell | `black-forest-labs/FLUX.1-schnell` |

## API Usage

### List Models

```json
{"input": {"action": "list_models"}}
```

### Check Status

```json
{"input": {"action": "status"}}
```

### Manual Cleanup

```json
{"input": {"action": "cleanup"}}
```

### Train LoRA

```json
{
  "input": {
    "action": "train",
    "model": "flux_dev",
    "params": {
      "dataset_path": "/workspace/dataset",
      "output_path": "/workspace/output",
      "steps": 1000,
      "batch_size": 1,
      "learning_rate": 1e-4,
      "lora_rank": 16
    }
  }
}
```
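A request like the one above can be sent to the deployed endpoint with a plain HTTP POST. The sketch below assumes a standard RunPod serverless deployment reachable at `/runsync`; the endpoint ID is a placeholder, and `RUNPOD_API_KEY` must be set in your environment.

```python
import json
import os
import urllib.request

ENDPOINT_ID = "your-endpoint-id"  # placeholder: replace with your endpoint's ID
API_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"


def build_train_payload(model: str, **params) -> dict:
    """Assemble the handler's 'train' request body."""
    return {"input": {"action": "train", "model": model, "params": params}}


def submit(payload: dict) -> dict:
    """POST the payload to the endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


# Example (requires RUNPOD_API_KEY and a real endpoint ID):
# result = submit(build_train_payload("flux_dev", steps=1000, lora_rank=16))
```

Long training runs are better submitted via RunPod's asynchronous `/run` route and polled with the `status` action, since `/runsync` holds the connection open.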

## Training Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `dataset_path` | Path to training images | `/workspace/dataset` |
| `output_path` | Output directory | `/workspace/output` |
| `steps` | Training steps | `2000` |
| `batch_size` | Batch size | `1` |
| `learning_rate` | Learning rate | `1e-4` |
| `lora_rank` | LoRA rank | `16`-`32` |
| `save_every` | Checkpoint save interval (steps) | `250` |
| `sample_every` | Sample generation interval (steps) | `250` |
| `trigger_word` | Trigger word for training | `None` |

## RunPod Deployment

### Environment Variables

- `HF_TOKEN`: Hugging Face token for gated models (required for FLUX and Qwen)

### Model Caching

Models are cached at `/runpod-volume/huggingface-cache/hub/` for faster subsequent loads.

For optimal cold starts, set the RunPod **Model** field to one of:

- `black-forest-labs/FLUX.1-dev` (for FLUX training)
- `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` (for Wan 2.2)
- `Qwen/Qwen-Image` (for Qwen Image)
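Pinning the cache to the network volume is what lets downloads survive cold starts. A minimal sketch of how a handler might do this, assuming the Hugging Face libraries honor the standard `HF_HOME`/`HF_HUB_CACHE` environment variables (the actual handler code is not shown in this README):

```python
import os

# Assumed layout: the network volume is mounted at /runpod-volume, and the
# cache path matches the one documented above.
CACHE_DIR = "/runpod-volume/huggingface-cache"

# Must run before any transformers/diffusers import reads these variables.
os.environ.setdefault("HF_HOME", CACHE_DIR)
os.environ.setdefault("HF_HUB_CACHE", os.path.join(CACHE_DIR, "hub"))
```

`setdefault` keeps any values already injected by the RunPod template, so an operator can still override the cache location per endpoint.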

## Auto-Cleanup

The handler automatically cleans up GPU memory when switching between models:

- **Full cleanup** when changing model types
- **Light cleanup** for the same model
- **Manual cleanup** via the `cleanup` action
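The full-versus-light distinction can be sketched roughly as follows. This is a hypothetical illustration assuming a PyTorch-based worker, not the handler's actual code; `cleanup_gpu` and its `full` flag are names invented for the example.

```python
import gc


def cleanup_gpu(full: bool = True) -> int:
    """Free Python-level references, then release cached GPU memory.

    Returns the number of objects reclaimed by gc (a rough signal only).
    """
    collected = gc.collect()
    try:
        import torch  # only present on the GPU worker

        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # return cached allocator blocks to the driver
            if full:
                torch.cuda.ipc_collect()  # also reclaim inter-process CUDA handles
    except ImportError:
        pass  # torch not installed; nothing GPU-side to free
    return collected
```

A light cleanup (`full=False`) would keep more allocator state warm for reuse by the same model, while a full cleanup releases as much as possible before a different model loads.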