---
license: mit
tags:
- lora
- training
- runpod
- ai-toolkit
---

# AI Trainer - RunPod Serverless

Single-endpoint multi-model LoRA training with all models cached in this repo.

## RunPod Deployment

**Set Model field to:** `Aloukik21/trainer`

This caches all models (~290GB in total, per the table below) for fast cold starts.

## Cached Models

| Model Key | Subfolder | Size |
|-----------|-----------|------|
| flux_dev | flux-dev/ | ~54GB |
| flux_schnell | flux-schnell/ | ~54GB |
| wan21_14b | wan21-14b/ | ~75GB |
| wan22_14b | wan22-14b/ | ~53GB |
| qwen_image | qwen-image/ | ~54GB |
| accuracy_recovery_adapters | accuracy_recovery_adapters/ | ~3GB |

## API Usage

### List Models

```json
{"input": {"action": "list_models"}}
```

### Train LoRA

```json
{
  "input": {
    "action": "train",
    "model": "flux_dev",
    "params": {
      "dataset_path": "/workspace/dataset",
      "output_path": "/workspace/output",
      "steps": 1000
    }
  }
}
```
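For reference, a minimal Python sketch of building this request body and submitting it to a deployed endpoint. The endpoint URL and API key here are placeholders (assumptions, not part of this repo); RunPod serverless endpoints accept synchronous requests at a `/runsync` route:

```python
import json

# Placeholder values -- substitute your own RunPod endpoint ID and API key.
RUNSYNC_URL = "https://api.runpod.ai/v2/<endpoint-id>/runsync"

def build_train_payload(model, dataset_path, output_path, steps=1000):
    """Assemble the request body for the handler's "train" action."""
    return {
        "input": {
            "action": "train",
            "model": model,
            "params": {
                "dataset_path": dataset_path,
                "output_path": output_path,
                "steps": steps,
            },
        }
    }

payload = build_train_payload("flux_dev", "/workspace/dataset", "/workspace/output")
print(json.dumps(payload, indent=2))

# To actually submit, POST the payload with any HTTP client, e.g.:
# requests.post(RUNSYNC_URL, json=payload,
#               headers={"Authorization": "Bearer <RUNPOD_API_KEY>"})
```

Long-running training jobs are better submitted to the asynchronous `/run` route and polled via `/status`, since `/runsync` is intended for short requests.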

### Cleanup (between different models)

```json
{"input": {"action": "cleanup"}}
```

## Environment Variables

- `HF_TOKEN`: Hugging Face token (required for some gated models)

## Auto-Cleanup

The handler automatically frees GPU memory when switching between different model types.
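The cleanup logic itself lives in the handler code, but the general pattern can be sketched as follows (illustrative only; assumes PyTorch is the framework holding GPU memory):

```python
import gc

def free_gpu_memory():
    """Drop Python references, then release cached CUDA memory (sketch)."""
    gc.collect()  # reclaim unreferenced model objects first
    try:
        import torch
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached allocator blocks to the driver
            torch.cuda.ipc_collect()  # clean up inter-process CUDA handles
    except ImportError:
        pass  # torch not installed; nothing GPU-side to free
```

A handler would typically call something like this before loading a different model type, so the new weights fit in VRAM.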