Aloukik21 committed on
Commit 54c3d8d · verified · 1 Parent(s): b31917b

Update README for cached models

Files changed (1)
  1. README.md +23 -54
README.md CHANGED
@@ -9,21 +9,24 @@ tags:
 
 # AI Trainer - RunPod Serverless
 
-Single-endpoint multi-model LoRA training service using [ai-toolkit](https://github.com/ostris/ai-toolkit).
-
-Automatically cleans up GPU memory when switching between different models.
-
-## Supported Models
-
-| Model Key | Description | Base Model |
-|-----------|-------------|------------|
-| wan21_1b | Wan2.1 1.3B Video | Wan-AI/Wan2.1-T2V-1.3B-Diffusers |
-| wan21_14b | Wan2.1 14B Video | Wan-AI/Wan2.1-T2V-14B-Diffusers |
-| wan22_14b | Wan2.2 14B Video | ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16 |
-| qwen_image | Qwen Image Gen | Qwen/Qwen-Image |
-| qwen_image_edit | Qwen Image Edit | Qwen/Qwen-Image-Edit |
-| flux_dev | FLUX.1 Dev | black-forest-labs/FLUX.1-dev |
-| flux_schnell | FLUX.1 Schnell | black-forest-labs/FLUX.1-schnell |
 
 ## API Usage
 
@@ -32,16 +35,6 @@ Automatically cleans up GPU memory when switching between different models.
 {"input": {"action": "list_models"}}
 ```
 
-### Check Status
-```json
-{"input": {"action": "status"}}
-```
-
-### Manual Cleanup
-```json
-{"input": {"action": "cleanup"}}
-```
-
 ### Train LoRA
 ```json
 {
@@ -51,45 +44,21 @@ Automatically cleans up GPU memory when switching between different models.
     "params": {
       "dataset_path": "/workspace/dataset",
       "output_path": "/workspace/output",
-      "steps": 1000,
-      "batch_size": 1,
-      "learning_rate": 1e-4,
-      "lora_rank": 16
     }
   }
 }
 ```
 
-## Training Parameters
-
-| Parameter | Description | Default |
-|-----------|-------------|---------|
-| dataset_path | Path to training images | /workspace/dataset |
-| output_path | Output directory | /workspace/output |
-| steps | Training steps | 2000 |
-| batch_size | Batch size | 1 |
-| learning_rate | Learning rate | 1e-4 |
-| lora_rank | LoRA rank | 16-32 |
-| save_every | Save checkpoint interval | 250 |
-| sample_every | Sample generation interval | 250 |
-| trigger_word | Trigger word for training | None |
-
-## RunPod Deployment
-
-### Environment Variables
-- `HF_TOKEN`: HuggingFace token for gated models (required for FLUX, Qwen)
-
-### Model Caching
-Models are cached at `/runpod-volume/huggingface-cache/hub/` for faster subsequent loads.
-
-For optimal cold starts, set the RunPod **Model** field to one of:
-- `black-forest-labs/FLUX.1-dev` (for FLUX training)
-- `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` (for Wan 2.2)
-- `Qwen/Qwen-Image` (for Qwen Image)
 
 ## Auto-Cleanup
 
-The handler automatically cleans up GPU memory when switching between models:
-- Full cleanup when changing model types
-- Light cleanup for same model
-- Manual cleanup via `cleanup` action
 
 
 # AI Trainer - RunPod Serverless
 
+Single-endpoint multi-model LoRA training with all models cached in this repo.
 
+## RunPod Deployment
+
+**Set Model field to:** `Aloukik21/trainer`
+
+This will cache all models (~240GB) for fast cold starts.
 
+## Cached Models
 
+| Model Key | Subfolder | Size |
+|-----------|-----------|------|
+| flux_dev | flux-dev/ | ~54GB |
+| flux_schnell | flux-schnell/ | ~54GB |
+| wan21_14b | wan21-14b/ | ~75GB |
+| wan22_14b | wan22-14b/ | ~53GB |
+| qwen_image | qwen-image/ | ~54GB |
+| accuracy_recovery_adapters | accuracy_recovery_adapters/ | ~3GB |
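The table above maps model keys to cached subfolders; a minimal sketch of that lookup in handler-side code. The `MODEL_SUBFOLDERS` dict and `resolve_cache_path` helper are illustrative names (not part of the service), and the `/runpod-volume` default root is an assumption:

```python
from pathlib import Path

# Illustrative mapping mirroring the Cached Models table above.
MODEL_SUBFOLDERS = {
    "flux_dev": "flux-dev",
    "flux_schnell": "flux-schnell",
    "wan21_14b": "wan21-14b",
    "wan22_14b": "wan22-14b",
    "qwen_image": "qwen-image",
    "accuracy_recovery_adapters": "accuracy_recovery_adapters",
}

def resolve_cache_path(model_key: str, cache_root: str = "/runpod-volume") -> Path:
    """Map a model key from the table to its cached subfolder under cache_root."""
    if model_key not in MODEL_SUBFOLDERS:
        raise ValueError(f"unknown model key: {model_key!r}")
    return Path(cache_root) / MODEL_SUBFOLDERS[model_key]
```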
 
30
 
31
  ## API Usage
32
 
 
35
  {"input": {"action": "list_models"}}
36
  ```
37
 
 
 
 
 
 
 
 
 
 
 
 ### Train LoRA
 ```json
 {
     "params": {
       "dataset_path": "/workspace/dataset",
       "output_path": "/workspace/output",
+      "steps": 1000
     }
   }
 }
 ```
 
+### Cleanup (between different models)
+```json
+{"input": {"action": "cleanup"}}
+```
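All three actions share the same `{"input": {...}}` envelope. A sketch of a hypothetical client-side helper that assembles it (the helper name and the `model` field usage are illustrative, not a documented API):

```python
def build_request(action, model=None, params=None):
    """Assemble the {"input": {...}} envelope used in the examples above."""
    body = {"action": action}
    if model is not None:
        body["model"] = model    # model key, e.g. "flux_dev" (illustrative)
    if params is not None:
        body["params"] = params
    return {"input": body}

# Usage mirroring the documented actions:
list_req = build_request("list_models")
train_req = build_request("train", model="flux_dev",
                          params={"dataset_path": "/workspace/dataset",
                                  "output_path": "/workspace/output",
                                  "steps": 1000})
cleanup_req = build_request("cleanup")
```

The resulting dicts can be serialized with `json.dumps` and posted to the RunPod endpoint.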
 
+## Environment Variables
+
+- `HF_TOKEN`: HuggingFace token (required for some gated models)
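A handler might check this variable before attempting a gated download. A minimal sketch; which model keys are gated is an assumption here (check each model's licence on the Hub), and `hf_token_for` is an illustrative name:

```python
import os

# Illustrative set of gated model keys; gating varies per model.
GATED_MODELS = {"flux_dev", "flux_schnell"}

def hf_token_for(model_key: str):
    """Fetch HF_TOKEN from the environment, failing loudly for gated models."""
    token = os.environ.get("HF_TOKEN")
    if model_key in GATED_MODELS and not token:
        raise RuntimeError("set HF_TOKEN to download gated models")
    return token
```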
 
 ## Auto-Cleanup
 
+The handler automatically cleans up GPU memory when switching between different model types.
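The switch-aware policy can be sketched as follows. This is not the actual handler: the cleanup callables are injected so the logic is testable without a GPU, whereas a real implementation would call `gc.collect()` and `torch.cuda.empty_cache()` inside them:

```python
class ModelSwitcher:
    """Sketch of cleanup-on-model-switch (illustrative, not the real handler)."""

    def __init__(self, full_cleanup, light_cleanup):
        self.current = None
        self.full_cleanup = full_cleanup
        self.light_cleanup = light_cleanup

    def switch_to(self, model_key):
        if self.current is not None and self.current != model_key:
            self.full_cleanup()   # different model type: release everything
        elif self.current == model_key:
            self.light_cleanup()  # same model: cheap cache trim only
        self.current = model_key
```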