Hajime MATSUMOTO committed
Commit b491772 · 1 Parent(s): ce66137
Use single GPU with larger batch size for L40S 48GB
- Dockerfile +2 -3
- train.py +4 -4
- train_multi_gpu.py +5 -5
Dockerfile
CHANGED
@@ -21,6 +21,5 @@ ENV HF_TOKEN=""
 ENV TRANSFORMERS_CACHE=/app/cache
 ENV HF_HOME=/app/cache
 
-#
-
-CMD ["accelerate", "launch", "--num_processes", "4", "train_multi_gpu.py"]
+# Single-GPU training (L40S 48GB)
+CMD ["python", "train.py"]
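The `CMD` change above replaces the four-process `accelerate launch` entry point with a plain `python train.py`. As a rough sketch of the same choice made programmatically, the helper below picks the command from a GPU count. `build_launch_command` is a hypothetical name that does not appear in this commit. The script names and the `--num_processes` flag are the ones shown in the diff.

```python
from typing import List


def build_launch_command(num_gpus: int) -> List[str]:
    """Return the container command for a GPU count (hypothetical helper)."""
    if num_gpus < 1:
        raise ValueError("at least one GPU is required")
    if num_gpus == 1:
        # Single GPU: run the training script directly, as in the new CMD.
        return ["python", "train.py"]
    # Several GPUs: one process per GPU via accelerate, as in the old CMD.
    return [
        "accelerate", "launch",
        "--num_processes", str(num_gpus),
        "train_multi_gpu.py",
    ]
```

Calling `build_launch_command(1)` gives the new single-GPU command, and `build_launch_command(4)` reproduces the command this commit removes.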
train.py
CHANGED
@@ -234,10 +234,10 @@ training_args = TrainingArguments(
     num_train_epochs=2,
     max_steps=-1,  # -1 = epoch-based
 
-    # Batch size (
-    per_device_train_batch_size=
-    per_device_eval_batch_size=
-    gradient_accumulation_steps=
+    # Batch size (the L40S's 48 GB allows a larger one)
+    per_device_train_batch_size=8,
+    per_device_eval_batch_size=8,
+    gradient_accumulation_steps=4,  # effective batch size: 8*4=32
 
     # Learning rate
     learning_rate=1e-4,
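The comment on `gradient_accumulation_steps` gives the effective batch size as 8*4=32. A minimal sketch of that arithmetic follows, assuming the usual definition: per-device batch size times accumulation steps times number of processes. The function name `effective_batch_size` is illustrative and not part of `train.py`.

```python
def effective_batch_size(per_device: int, grad_accum_steps: int,
                         num_processes: int = 1) -> int:
    """Samples contributing to one optimizer step (illustrative helper)."""
    return per_device * grad_accum_steps * num_processes


# Single-GPU values from train.py in this commit.
single_gpu = effective_batch_size(per_device=8, grad_accum_steps=4)

# Four-process values from train_multi_gpu.py in this commit.
multi_gpu = effective_batch_size(per_device=8, grad_accum_steps=2,
                                 num_processes=4)

print(single_gpu, multi_gpu)  # prints: 32 64
```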
train_multi_gpu.py
CHANGED
@@ -1,7 +1,7 @@
 #!/usr/bin/env python3
 """
 Qwen2.5-7B + glaive-function-calling-v2 QLoRA training script
-Multi-GPU version (
+Multi-GPU version (4x L40S, etc.)
 
 Usage:
 accelerate launch --num_processes 4 train_multi_gpu.py
@@ -195,10 +195,10 @@ training_args = TrainingArguments(
 
     num_train_epochs=2,
 
-    # Multi-GPU:
-    per_device_train_batch_size=
-    per_device_eval_batch_size=
-    gradient_accumulation_steps=
+    # Multi-GPU: the L40S has 48 GB of VRAM, so raise the batch size
+    per_device_train_batch_size=8,  # 8 per GPU (L40S 48GB)
+    per_device_eval_batch_size=8,
+    gradient_accumulation_steps=2,  # effective batch: 8*2*4=64
 
     learning_rate=1e-4,
     weight_decay=0.01,
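With these values the four-process script reaches an effective batch of 64, while the single-GPU `train.py` reaches 32 at the same `learning_rate=1e-4`. The commit does not say whether the two are meant to match. If one did want a single-GPU run to keep a given effective batch, the accumulation steps could be derived as sketched below. `accumulation_steps_for` is a hypothetical helper, not code from this repository.

```python
def accumulation_steps_for(target_batch: int, per_device: int,
                           num_processes: int = 1) -> int:
    """Accumulation steps needed to reach target_batch (hypothetical helper)."""
    samples_per_step = per_device * num_processes
    if samples_per_step <= 0:
        raise ValueError("per_device and num_processes must be positive")
    if target_batch % samples_per_step != 0:
        # Refuse to round silently; the caller should pick a compatible target.
        raise ValueError(
            f"{target_batch} is not a multiple of {samples_per_step}"
        )
    return target_batch // samples_per_step


# Matching the multi-GPU effective batch of 64 on one GPU with batch size 8.
steps = accumulation_steps_for(target_batch=64, per_device=8)
print(steps)  # prints: 8
```

Raising an error on a non-divisible target is a design choice in this sketch. Rounding would change the effective batch without the caller noticing.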