---
license: mit
tags:
  - lora
  - training
  - runpod
  - ai-toolkit
---

# AI Trainer - RunPod Serverless

A single-endpoint, multi-model LoRA training service built on ai-toolkit. GPU memory is cleaned up automatically whenever the service switches between models.

## Supported Models

| Model Key | Description | Base Model |
|-----------|-------------|------------|
| `wan21_1b` | Wan2.1 1.3B Video | `Wan-AI/Wan2.1-T2V-1.3B-Diffusers` |
| `wan21_14b` | Wan2.1 14B Video | `Wan-AI/Wan2.1-T2V-14B-Diffusers` |
| `wan22_14b` | Wan2.2 14B Video | `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` |
| `qwen_image` | Qwen Image Gen | `Qwen/Qwen-Image` |
| `qwen_image_edit` | Qwen Image Edit | `Qwen/Qwen-Image-Edit` |
| `flux_dev` | FLUX.1 Dev | `black-forest-labs/FLUX.1-dev` |
| `flux_schnell` | FLUX.1 Schnell | `black-forest-labs/FLUX.1-schnell` |

## API Usage

### List Models

```json
{"input": {"action": "list_models"}}
```

### Check Status

```json
{"input": {"action": "status"}}
```

### Manual Cleanup

```json
{"input": {"action": "cleanup"}}
```

### Train LoRA

```json
{
  "input": {
    "action": "train",
    "model": "flux_dev",
    "params": {
      "dataset_path": "/workspace/dataset",
      "output_path": "/workspace/output",
      "steps": 1000,
      "batch_size": 1,
      "learning_rate": 1e-4,
      "lora_rank": 16
    }
  }
}
```
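A request like the one above can be sent to the deployed endpoint with a plain HTTP POST. The sketch below assumes a standard RunPod serverless deployment reachable at `/runsync`; the endpoint ID is a placeholder, and `RUNPOD_API_KEY` must be set in your environment.

```python
import json
import os
import urllib.request

ENDPOINT_ID = "your-endpoint-id"  # placeholder: replace with your endpoint's ID
API_URL = f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync"


def build_train_payload(model: str, **params) -> dict:
    """Assemble the handler's 'train' request body."""
    return {"input": {"action": "train", "model": model, "params": params}}


def submit(payload: dict) -> dict:
    """POST the payload to the endpoint and return the parsed JSON response."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}",
        },
    )
    with urllib.request.urlopen(req, timeout=600) as resp:
        return json.load(resp)


# Example (requires RUNPOD_API_KEY and a real endpoint ID):
# result = submit(build_train_payload("flux_dev", steps=1000, lora_rank=16))
```

Long training runs are better submitted via RunPod's asynchronous `/run` route and polled with the `status` action, since `/runsync` holds the connection open.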

## Training Parameters

| Parameter | Description | Default |
|-----------|-------------|---------|
| `dataset_path` | Path to training images | `/workspace/dataset` |
| `output_path` | Output directory | `/workspace/output` |
| `steps` | Training steps | `2000` |
| `batch_size` | Batch size | `1` |
| `learning_rate` | Learning rate | `1e-4` |
| `lora_rank` | LoRA rank | `16`-`32` |
| `save_every` | Checkpoint save interval (steps) | `250` |
| `sample_every` | Sample generation interval (steps) | `250` |
| `trigger_word` | Trigger word for training | `None` |

## RunPod Deployment

### Environment Variables

- `HF_TOKEN`: Hugging Face token for gated models (required for FLUX and Qwen)

### Model Caching

Models are cached at `/runpod-volume/huggingface-cache/hub/` for faster subsequent loads.

For optimal cold starts, set the RunPod **Model** field to one of:

- `black-forest-labs/FLUX.1-dev` (for FLUX training)
- `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` (for Wan 2.2)
- `Qwen/Qwen-Image` (for Qwen Image)
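Pinning the cache to the network volume is what lets downloads survive cold starts. A minimal sketch of how a handler might do this, assuming the Hugging Face libraries honor the standard `HF_HOME`/`HF_HUB_CACHE` environment variables (the actual handler code is not shown in this README):

```python
import os

# Assumed layout: the network volume is mounted at /runpod-volume, and the
# cache path matches the one documented above.
CACHE_DIR = "/runpod-volume/huggingface-cache"

# Must run before any transformers/diffusers import reads these variables.
os.environ.setdefault("HF_HOME", CACHE_DIR)
os.environ.setdefault("HF_HUB_CACHE", os.path.join(CACHE_DIR, "hub"))
```

`setdefault` keeps any values already injected by the RunPod template, so an operator can still override the cache location per endpoint.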

## Auto-Cleanup

The handler automatically cleans up GPU memory when switching between models:

- **Full cleanup** when changing model types
- **Light cleanup** for the same model
- **Manual cleanup** via the `cleanup` action
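The full-versus-light distinction can be sketched roughly as follows. This is a hypothetical illustration assuming a PyTorch-based worker, not the handler's actual code; `cleanup_gpu` and its `full` flag are names invented for the example.

```python
import gc


def cleanup_gpu(full: bool = True) -> int:
    """Free Python-level references, then release cached GPU memory.

    Returns the number of objects reclaimed by gc (a rough signal only).
    """
    collected = gc.collect()
    try:
        import torch  # only present on the GPU worker

        if torch.cuda.is_available():
            torch.cuda.empty_cache()      # return cached allocator blocks to the driver
            if full:
                torch.cuda.ipc_collect()  # also reclaim inter-process CUDA handles
    except ImportError:
        pass  # torch not installed; nothing GPU-side to free
    return collected
```

A light cleanup (`full=False`) would keep more allocator state warm for reuse by the same model, while a full cleanup releases as much as possible before a different model loads.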