Aloukik21 committed on
Commit 54c3d8d · verified · 1 Parent(s): b31917b

Update README for cached models

Files changed (1)
  1. README.md +23 -54
README.md CHANGED
@@ -9,21 +9,24 @@ tags:
 
 # AI Trainer - RunPod Serverless
 
-Single-endpoint multi-model LoRA training service using [ai-toolkit](https://github.com/ostris/ai-toolkit).
-
-Automatically cleans up GPU memory when switching between different models.
-
-## Supported Models
-
-| Model Key | Description | Base Model |
-|-----------|-------------|------------|
-| wan21_1b | Wan2.1 1.3B Video | Wan-AI/Wan2.1-T2V-1.3B-Diffusers |
-| wan21_14b | Wan2.1 14B Video | Wan-AI/Wan2.1-T2V-14B-Diffusers |
-| wan22_14b | Wan2.2 14B Video | ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16 |
-| qwen_image | Qwen Image Gen | Qwen/Qwen-Image |
-| qwen_image_edit | Qwen Image Edit | Qwen/Qwen-Image-Edit |
-| flux_dev | FLUX.1 Dev | black-forest-labs/FLUX.1-dev |
-| flux_schnell | FLUX.1 Schnell | black-forest-labs/FLUX.1-schnell |
 
 ## API Usage
 
@@ -32,16 +35,6 @@ Automatically cleans up GPU memory when switching between different models.
 {"input": {"action": "list_models"}}
 ```
 
-### Check Status
-```json
-{"input": {"action": "status"}}
-```
-
-### Manual Cleanup
-```json
-{"input": {"action": "cleanup"}}
-```
-
 ### Train LoRA
 ```json
 {
@@ -51,45 +44,21 @@ Automatically cleans up GPU memory when switching between different models.
     "params": {
       "dataset_path": "/workspace/dataset",
       "output_path": "/workspace/output",
-      "steps": 1000,
-      "batch_size": 1,
-      "learning_rate": 1e-4,
-      "lora_rank": 16
     }
   }
 }
 ```
 
-## Training Parameters
-
-| Parameter | Description | Default |
-|-----------|-------------|---------|
-| dataset_path | Path to training images | /workspace/dataset |
-| output_path | Output directory | /workspace/output |
-| steps | Training steps | 2000 |
-| batch_size | Batch size | 1 |
-| learning_rate | Learning rate | 1e-4 |
-| lora_rank | LoRA rank | 16-32 |
-| save_every | Save checkpoint interval | 250 |
-| sample_every | Sample generation interval | 250 |
-| trigger_word | Trigger word for training | None |
-
-## RunPod Deployment
-
-### Environment Variables
-- `HF_TOKEN`: HuggingFace token for gated models (required for FLUX, Qwen)
-
-### Model Caching
-Models are cached at `/runpod-volume/huggingface-cache/hub/` for faster subsequent loads.
-
-For optimal cold starts, set the RunPod **Model** field to one of:
-- `black-forest-labs/FLUX.1-dev` (for FLUX training)
-- `ai-toolkit/Wan2.2-T2V-A14B-Diffusers-bf16` (for Wan 2.2)
-- `Qwen/Qwen-Image` (for Qwen Image)
 
 ## Auto-Cleanup
 
-The handler automatically cleans up GPU memory when switching between models:
-- Full cleanup when changing model types
-- Light cleanup for same model
-- Manual cleanup via `cleanup` action
 
 
 # AI Trainer - RunPod Serverless
 
+Single-endpoint multi-model LoRA training with all models cached in this repo.
 
+## RunPod Deployment
+
+**Set Model field to:** `Aloukik21/trainer`
+
+This will cache all models (~240GB) for fast cold starts.
 
+## Cached Models
 
+| Model Key | Subfolder | Size |
+|-----------|-----------|------|
+| flux_dev | flux-dev/ | ~54GB |
+| flux_schnell | flux-schnell/ | ~54GB |
+| wan21_14b | wan21-14b/ | ~75GB |
+| wan22_14b | wan22-14b/ | ~53GB |
+| qwen_image | qwen-image/ | ~54GB |
+| accuracy_recovery_adapters | accuracy_recovery_adapters/ | ~3GB |
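The table above maps model keys to cached subfolders; a minimal sketch of that lookup in handler-side code. The `MODEL_SUBFOLDERS` dict and `resolve_cache_path` helper are illustrative names (not part of the service), and the `/runpod-volume` default root is an assumption:

```python
from pathlib import Path

# Illustrative mapping mirroring the Cached Models table above.
MODEL_SUBFOLDERS = {
    "flux_dev": "flux-dev",
    "flux_schnell": "flux-schnell",
    "wan21_14b": "wan21-14b",
    "wan22_14b": "wan22-14b",
    "qwen_image": "qwen-image",
    "accuracy_recovery_adapters": "accuracy_recovery_adapters",
}

def resolve_cache_path(model_key: str, cache_root: str = "/runpod-volume") -> Path:
    """Map a model key from the table to its cached subfolder under cache_root."""
    if model_key not in MODEL_SUBFOLDERS:
        raise ValueError(f"unknown model key: {model_key!r}")
    return Path(cache_root) / MODEL_SUBFOLDERS[model_key]
```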
 
30
 
31
  ## API Usage
32
 
 
35
  {"input": {"action": "list_models"}}
36
  ```
37
 
 
 
 
 
 
 
 
 
 
 
 ### Train LoRA
 ```json
 {
     "params": {
       "dataset_path": "/workspace/dataset",
       "output_path": "/workspace/output",
+      "steps": 1000
     }
   }
 }
 ```
 
+### Cleanup (between different models)
+```json
+{"input": {"action": "cleanup"}}
+```
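All three actions share the same `{"input": {...}}` envelope. A sketch of a hypothetical client-side helper that assembles it (the helper name and the `model` field usage are illustrative, not a documented API):

```python
def build_request(action, model=None, params=None):
    """Assemble the {"input": {...}} envelope used in the examples above."""
    body = {"action": action}
    if model is not None:
        body["model"] = model    # model key, e.g. "flux_dev" (illustrative)
    if params is not None:
        body["params"] = params
    return {"input": body}

# Usage mirroring the documented actions:
list_req = build_request("list_models")
train_req = build_request("train", model="flux_dev",
                          params={"dataset_path": "/workspace/dataset",
                                  "output_path": "/workspace/output",
                                  "steps": 1000})
cleanup_req = build_request("cleanup")
```

The resulting dicts can be serialized with `json.dumps` and posted to the RunPod endpoint.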
 
+## Environment Variables
+
+- `HF_TOKEN`: HuggingFace token (required for some gated models)
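A handler might check this variable before attempting a gated download. A minimal sketch; which model keys are gated is an assumption here (check each model's licence on the Hub), and `hf_token_for` is an illustrative name:

```python
import os

# Illustrative set of gated model keys; gating varies per model.
GATED_MODELS = {"flux_dev", "flux_schnell"}

def hf_token_for(model_key: str):
    """Fetch HF_TOKEN from the environment, failing loudly for gated models."""
    token = os.environ.get("HF_TOKEN")
    if model_key in GATED_MODELS and not token:
        raise RuntimeError("set HF_TOKEN to download gated models")
    return token
```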
 
 ## Auto-Cleanup
 
+The handler automatically cleans up GPU memory when switching between different model types.
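The switch-aware policy can be sketched as follows. This is not the actual handler: the cleanup callables are injected so the logic is testable without a GPU, whereas a real implementation would call `gc.collect()` and `torch.cuda.empty_cache()` inside them:

```python
class ModelSwitcher:
    """Sketch of cleanup-on-model-switch (illustrative, not the real handler)."""

    def __init__(self, full_cleanup, light_cleanup):
        self.current = None
        self.full_cleanup = full_cleanup
        self.light_cleanup = light_cleanup

    def switch_to(self, model_key):
        if self.current is not None and self.current != model_key:
            self.full_cleanup()   # different model type: release everything
        elif self.current == model_key:
            self.light_cleanup()  # same model: cheap cache trim only
        self.current = model_key
```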