v7

Files changed (8)

improve_gainlora/IDEA_Overall.md +46 -10
improve_gainlora/_patch_genscripts.py +62 -0
improve_gainlora/gen_script_long_order3_t5_specroute.sh +36 -1
improve_gainlora/gen_script_long_order4_t5_specroute.sh +36 -1
improve_gainlora/gen_script_superni_order1_t5_specroute.sh +36 -1
improve_gainlora/gen_script_superni_order2_t5_specroute.sh +36 -1
improve_gainlora/src/cl_trainer_specroute.py +5 -1
improve_gainlora/src/run_t5.py +3 -2

improve_gainlora/IDEA_Overall.md CHANGED

@@ -147,13 +147,37 @@ $$T_{\max} \;\leq\; \frac{d}{r\,(1 - \varepsilon)}$$
 For T5-Small ($d = 512$, $r = 8$, $\varepsilon = 0.02$): $T_{\max} \leq 65 \gg 15$ tasks. This connects continual-learning capacity to Grassmannian packing theory.
-### 3.4 Drift Invariance
 **Proposition 1** *(Drift-Free Routing).* The routing function $h \mapsto \alpha_t(h)$ is completely stable across all tasks.
 **Proof.** The routing input is computed from the frozen embedding table, *before* any transformer block. LoRA exists only in the deeper attention layers, so $h$ is independent of every LoRA parameter. Combined with the frozen $\mathcal{S}_t$, $\alpha_t(h)$ is invariant to all accumulated changes. $\square$
-### 3.5 The Key Problem: Null-Space Collapse
 Theorem 1 assumes $h \in \mathrm{span}(V_{t^*})$. In practice this requires two conditions:
@@ -270,7 +294,7 @@ or equivalently, the rows of $A_t$ are the $r$ eigenvectors corresponding to ei
 ```
 # Step 1: Collect the activation covariance (small forward pass, before training)
-C_t = ∑ h(x)h(x)^T / N_batch    # input covariance of task t (N_batch ~200 batches)
 # Step 2: Project the covariance onto the null-space
 Q = I - P_old                   # null-space projector (from the stored GPM bases)
@@ -278,13 +302,22 @@ C_tilde = Q @ C_t @ Q           # projected covariance
 # Step 3: Eigenvector decomposition
 eigvals, eigvecs = eigh(C_tilde) # symmetric → eigh is faster than SVD
 top_r_idx = argsort(eigvals, descending=True)[:r]
-# Step 4: Set A_t
 A_t = eigvecs[:, top_r_idx].T   # shape (r, d): the most task-relevant directions in the null-space
 A_t = A_t / norm(A_t, dim=1, keepdim=True) * sqrt(3)  # normalize as in the original InfLoRA
 ```
 #### Information-Theoretic Interpretation
 By the Data Processing Inequality, for any matrix $A_t$:
@@ -380,8 +413,8 @@ this $A_t$ is guaranteed to capture **maximal task-relevant variance** in the null-
    - Compute the projected covariance: $\tilde{C}_t = Q C_t Q$ ($Q = I - P_{\text{old}}$)
    - Eigenvectors of $\tilde{C}_t$ → initialize $A_t$ (replacing random Kaiming init)
 3. InfLoRA: normalize $A_t$ (already in the null-space thanks to the eigenvector decomposition).
-4. Train `lora_B` with spectral affinity routing + adaptive bias $\beta(n)$ + C4.
-5. After training: compute $\mathcal{S}_t$ (for both inference routing and storage) + update the GPM bases.
 6. Save all artifacts for the next task.
 ---
@@ -399,8 +432,8 @@ this $A_t$ is guaranteed to capture **maximal task-relevant variance** in the null-
 | GPM + InfLoRA null-space | `get_reg_matrix()` | `cl_trainer_specroute.py` |
 | Dynamic ESA threshold | `(1−ε₀)·t/T + ε₀` | `cl_trainer_specroute.py` |
 | C4: Preconditioner | `precompute_preconditioners()` → eigendecomposition | `cl_trainer_specroute.py` |
-| C4: Spectral entropy reg | `_compute_spectral_entropy_loss()` → QR trick | `cl_trainer_specroute.py` |
 | **C5: Data-informed init** | **`pre_task_data_collection()` → `eigh(Q@C@Q)` → set `lora_A.data`** | **`cl_trainer_specroute.py`** |
 ---
@@ -414,9 +447,12 @@ this $A_t$ is guaranteed to capture **maximal task-relevant variance** in the null-
 | LoRA | $r = 8$, target=Q+V, InfLoRA (only B trained, A frozen) |
 | Routing | $\tau = 1.0$, $\alpha_{\mathrm{target}} = 0.8$, adaptive $\beta(n)$ (train); symmetric SVD (inference) |
 | ESA | $\varepsilon_0 = 0.995$ (dynamic) |
-| C4 | $\lambda_{\text{entropy}} = 0.01$, preconditioning on, $\epsilon = 10^{-6}$, warmup = 10% |
-| **C5** | **N_batch_warmup = 200, uses `torch.linalg.eigh` on the projected covariance** |
-| Precision | fp32 + gradient checkpointing |
 | Comparison | Batch size, LR, and scheduler exactly match ROOT (GainLoRA) |
 ---

 For T5-Small ($d = 512$, $r = 8$, $\varepsilon = 0.02$): $T_{\max} \leq 65 \gg 15$ tasks. This connects continual-learning capacity to Grassmannian packing theory.
+### 3.4 Orthogonality Guarantee from the InfLoRA Architecture
+> **This section closes the key theoretical gap.** Reviewers often worry: "GPM gradient projection only projects gradients; it does not guarantee that the $\Delta W_t$ have orthogonal subspaces." This observation is *correct* about GPM gradient projection but *misidentifies the mechanism*: orthogonality comes from a different step, the InfLoRA A-projection, which is a much harder constraint.
+**Proposition 2** *(InfLoRA ensures the condition of Theorem 1).* With $P_{\text{old}} = \mathcal{B}\mathcal{B}^T$ the GPM projection matrix (built from tasks $1,\ldots,t-1$), the InfLoRA step projects **every row of $A_t$ onto the null-space of $P_{\text{old}}$**:
+$$A_t \leftarrow A_t(I - P_{\text{old}}) \quad\Rightarrow\quad \text{rowspace}(A_t) \subseteq \text{null}(P_{\text{old}})$$
+Then:
+$$\text{span}(V_t) \;=\; \text{rowspace}(\Delta W_t) \;\subseteq\; \text{rowspace}(A_t) \;\subseteq\; \text{null}(P_{\text{old}})$$
+**(Step-by-step proof.)**
+- $\text{rowspace}(B_t A_t) \subseteq \text{rowspace}(A_t)$: holds for every $B_t$ (left multiplication cannot enlarge the row space).
+- $\text{rowspace}(A_t) \subseteq \text{null}(P_{\text{old}})$: by the InfLoRA projection step above.
+- The GPM bases $\mathcal{B}$ approximately span $\text{rowspace}(A_s)$ for tasks $s < t$ (because GPM accumulates the principal input directions, and the activations of task $s$ are excited mainly along the directions of $A_s$).
+- Therefore $\text{span}(V_t) \subseteq \text{null}(P_{\text{old}})$, which is approximately orthogonal to $\text{span}(V_s)$ for all $s < t$. $\square$
+**Approximation quality:** With GPM threshold $\varepsilon_0 = 0.995$ (capturing ≥ 99.5% of the variance), $\delta_{t,s} \leq 1 - 0.995 = 0.005 \ll \kappa_{\min}(t^*)$ in practice.
+**Correcting the reviewer:** The reviewer says "GPM does not guarantee orthogonality of $\Delta W_t$", which is *correct* for GPM gradient projection. But the mechanism that guarantees orthogonality is the **InfLoRA A-projection** (the initialization step), not gradient projection. The condition of Theorem 1 therefore needs no separate assumption: it follows from the existing InfLoRA architecture, up to the approximation error $\delta_{t,s}$ above.
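The first two steps of the proof can be checked numerically. The following is a minimal NumPy sketch, not the repository's code: the sizes `d`, `r`, `k` and all random matrices are hypothetical, and `B` is assumed to hold orthonormal GPM bases so that `P_old` is an orthogonal projector.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, k = 32, 4, 10  # hypothetical sizes: hidden dim, LoRA rank, number of GPM bases

# Orthonormal GPM bases B (d x k), so P_old = B B^T is an orthogonal projector.
B, _ = np.linalg.qr(rng.standard_normal((d, k)))
P_old = B @ B.T

# InfLoRA step: project every row of A_t onto null(P_old).
A_t = rng.standard_normal((r, d)) @ (np.eye(d) - P_old)

# For any B_t, the rows of Delta W_t = B_t A_t are combinations of the rows of A_t.
B_t = rng.standard_normal((d, r))
delta_W = B_t @ A_t

# Both residuals should vanish up to floating-point round-off.
residual_A = np.abs(A_t @ P_old).max()
residual_W = np.abs(delta_W @ P_old).max()
print(f"max |A_t P_old| = {residual_A:.2e}, max |dW P_old| = {residual_W:.2e}")
```

This only exercises the exact inclusions; the third step (GPM bases approximately spanning the earlier row spaces) depends on real activations and is not covered by the sketch.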
+---
+### 3.5 Drift Invariance
 **Proposition 1** *(Drift-Free Routing).* The routing function $h \mapsto \alpha_t(h)$ is completely stable across all tasks.
 **Proof.** The routing input is computed from the frozen embedding table, *before* any transformer block. LoRA exists only in the deeper attention layers, so $h$ is independent of every LoRA parameter. Combined with the frozen $\mathcal{S}_t$, $\alpha_t(h)$ is invariant to all accumulated changes. $\square$
+### 3.6 The Key Problem: Null-Space Collapse
 Theorem 1 assumes $h \in \mathrm{span}(V_{t^*})$. In practice this requires two conditions:
 ```
 # Step 1: Collect the activation covariance (small forward pass, before training)
+C_t = ∑ h(x)h(x)^T / N_batch    # input covariance of task t (N_batch ~100 batches)
 # Step 2: Project the covariance onto the null-space
 Q = I - P_old                   # null-space projector (from the stored GPM bases)
 # Step 3: Eigenvector decomposition
 eigvals, eigvecs = eigh(C_tilde) # symmetric → eigh is faster than SVD
+# Step 4: Fallback if the signal is too weak (degenerate null-space)
+if eigvals[-1] < 1e-6:          # eigh sorts ascending, so eigvals[-1] is the largest eigenvalue
+    # Null-space is saturated, or the task has no clear activation signal
+    # Revert to Kaiming random init + InfLoRA projection, as in the original
+    continue
 top_r_idx = argsort(eigvals, descending=True)[:r]
+# Step 5: Set A_t
 A_t = eigvecs[:, top_r_idx].T   # shape (r, d): the most task-relevant directions in the null-space
 A_t = A_t / norm(A_t, dim=1, keepdim=True) * sqrt(3)  # normalize as in the original InfLoRA
 ```
+**Fallback condition:** If `max_eigenvalue(C_tilde) < 1e-6`, the null-space is too narrow or the activations carry too little signal. In that case C5 yields to Kaiming init + the standard InfLoRA projection: no worse than V6, just no improvement. This condition only arises when the null-space is nearly saturated, i.e. ESA has consumed almost all of the capacity.
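The data-informed init and its fallback can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the code in `cl_trainer_specroute.py`: the function name `c5_init`, the sample sizes, and the random activations are hypothetical, and the covariance is averaged over rows rather than over batches.

```python
import numpy as np

def c5_init(H, P_old, r, tol=1e-6):
    """Return an (r, d) init for A_t, or None when the fallback applies.

    H: (N, d) activations of the new task; P_old: (d, d) projector onto old-task directions.
    """
    d = H.shape[1]
    C_t = H.T @ H / H.shape[0]                  # input covariance estimate
    Q = np.eye(d) - P_old                       # null-space projector
    C_tilde = Q @ C_t @ Q                       # projected covariance
    C_tilde = (C_tilde + C_tilde.T) / 2         # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(C_tilde)  # eigenvalues in ascending order
    if eigvals[-1] < tol:                       # largest eigenvalue too small
        return None                             # caller keeps Kaiming + InfLoRA projection
    top = np.argsort(eigvals)[::-1][:r]         # indices of the r largest eigenvalues
    A_t = eigvecs[:, top].T                     # shape (r, d)
    return A_t / np.linalg.norm(A_t, axis=1, keepdims=True) * np.sqrt(3)

# Hypothetical data: 200 activations of dimension 16, 4 stored GPM directions.
rng = np.random.default_rng(0)
H = rng.standard_normal((200, 16))
B, _ = np.linalg.qr(rng.standard_normal((16, 4)))
P_old = B @ B.T

A_init = c5_init(H, P_old, r=3)              # normal path
A_fallback = c5_init(H @ P_old, P_old, r=3)  # activations entirely in old directions
```

In the second call every activation lies in the span of the old bases, so the projected covariance is numerically zero and the function signals the fallback by returning `None`.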
 #### Information-Theoretic Interpretation
 By the Data Processing Inequality, for any matrix $A_t$:
    - Compute the projected covariance: $\tilde{C}_t = Q C_t Q$ ($Q = I - P_{\text{old}}$)
    - Eigenvectors of $\tilde{C}_t$ → initialize $A_t$ (replacing random Kaiming init)
 3. InfLoRA: normalize $A_t$ (already in the null-space thanks to the eigenvector decomposition).
+4. Train `lora_B` with spectral affinity routing + adaptive bias $\beta(n)$ + gradient preconditioning (C4.1).
+5. After training: compute $\mathcal{S}_t$ (for both inference routing and storage) + update the GPM bases (200 batches, enough for a stable SVD).
 6. Save all artifacts for the next task.
 ---
 | GPM + InfLoRA null-space | `get_reg_matrix()` | `cl_trainer_specroute.py` |
 | Dynamic ESA threshold | `(1−ε₀)·t/T + ε₀` | `cl_trainer_specroute.py` |
 | C4: Preconditioner | `precompute_preconditioners()` → eigendecomposition | `cl_trainer_specroute.py` |
 | **C5: Data-informed init** | **`pre_task_data_collection()` → `eigh(Q@C@Q)` → set `lora_A.data`** | **`cl_trainer_specroute.py`** |
+| C5: Fallback | max eigval < 1e-6 → skip C5, keep Kaiming + InfLoRA projection | `cl_trainer_specroute.py` |
 ---
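The dynamic ESA threshold `(1−ε₀)·t/T + ε₀` from the table is a linear ramp from roughly ε₀ up to 1. A small sketch, assuming `t` is the 1-indexed task number and `T` the total number of tasks (the function name and the choice `T = 15` are illustrative; the source only gives the formula and ε₀ = 0.995):

```python
def esa_threshold(t: int, T: int, eps0: float = 0.995) -> float:
    """Linear ramp (1 - eps0) * t / T + eps0, reaching 1.0 at t == T."""
    if T <= 0:
        raise ValueError("T must be positive")
    return (1.0 - eps0) * t / T + eps0

# Threshold for each task in a hypothetical 15-task sequence.
schedule = [esa_threshold(t, 15) for t in range(1, 16)]
for t, thr in enumerate(schedule, start=1):
    print(f"task {t:2d}: threshold = {thr:.5f}")
```

Under these assumptions the threshold tightens with every task, so later tasks retain a larger share of the activation variance in the stored bases.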
 | LoRA | $r = 8$, target=Q+V, InfLoRA (only B trained, A frozen) |
 | Routing | $\tau = 1.0$, $\alpha_{\mathrm{target}} = 0.8$, adaptive $\beta(n)$ (train); symmetric SVD (inference) |
 | ESA | $\varepsilon_0 = 0.995$ (dynamic) |
+| C4 | Gradient preconditioning on (`--use_preconditioning True`), $\epsilon = 10^{-6}$; entropy regularization removed in V7 |
+| **C5** | **N_batch = 100, `torch.linalg.eigh` on the projected covariance, fallback if max_eigval < 1e-6** |
+| GPM repr. | 200 batches (reduced from 1000; SVD is stable after 200) |
+| Precision | fp32 + gradient_checkpointing (T5 + P100: fp16 risks NaN overflow with large softmax values) |
+| P100 BSZ | BSZ=8, GA=4 (effective 32); T4: BSZ=2, GA=8 |
+| Runtime (P100 16GB) | SuperNI T5-Small ≈ 2-3h; Long benchmark ≈ 3-4h, comfortably within the 12h Kaggle limit |
 | Comparison | Batch size, LR, and scheduler exactly match ROOT (GainLoRA) |
 ---

improve_gainlora/_patch_genscripts.py ADDED

	@@ -0,0 +1,62 @@

+"""Patch T5 specroute gen_scripts to add P100 GPU detection and BSZ."""
+import re, os
+BASE = '/Users/nnminh322/Desktop/personal/Continual/improve_gainlora'
+T5_SCRIPTS = [
+    os.path.join(BASE, 'gen_script_superni_order1_t5_specroute.sh'),
+    os.path.join(BASE, 'gen_script_superni_order2_t5_specroute.sh'),
+    os.path.join(BASE, 'gen_script_long_order3_t5_specroute.sh'),
+    os.path.join(BASE, 'gen_script_long_order4_t5_specroute.sh'),
+]
+GPU_OLD = (
+    'else\n'
+    '    GPU_MODE="a100"\n'
+    '    GPU_IDS="${1:-0}"\n'
+    '    FP16_FLAG=""\n'
+    '    echo "[GPU] Strategy: A100 (single GPU, fp32)"\n'
+    'fi'
+)
+GPU_NEW = (
+    'elif [ "$GPU_MEM" -gt 16000 ]; then\n'
+    '    GPU_MODE="p100"\n'
+    '    GPU_IDS="${1:-0}"\n'
+    '    FP16_FLAG="--gradient_checkpointing"\n'
+    '    echo "[GPU] Strategy: P100 16GB (fp32 + gradient_checkpointing)"\n'
+    'else\n'
+    '    GPU_MODE="a100"\n'
+    '    GPU_IDS="${1:-0}"\n'
+    '    FP16_FLAG=""\n'
+    '    echo "[GPU] Strategy: A100 (single GPU, fp32)"\n'
+    'fi'
+)
+BSZ_PAT = re.compile(
+    r'(elif \[ "\$GPU_MODE" = "t4_1gpu" \]; then\n    BSZ=\d+; GA=\d+; EVAL_BSZ=\d+\n)'
+    r'(else\n    BSZ=\d+; GA=\d+; EVAL_BSZ=\d+\n)'
+)
+def add_p100(m):
+    return (
+        m.group(1)
+        + 'elif [ "$GPU_MODE" = "p100" ]; then\n    BSZ=8; GA=4; EVAL_BSZ=4\n'
+        + m.group(2)
+    )
+for name in T5_SCRIPTS:
+    if not os.path.exists(name):
+        print(f'SKIP (not found): {name}')
+        continue
+    with open(name) as f:
+        c = f.read()
+    n_detect = c.count(GPU_OLD)
+    c = c.replace(GPU_OLD, GPU_NEW, 1)
+    n_bsz = len(BSZ_PAT.findall(c))
+    c = BSZ_PAT.sub(add_p100, c)
+    with open(name, 'w') as f:
+        f.write(c)
+    print(f'{name}: gpu_detect={n_detect} bsz_blocks={n_bsz}')
+print('Done.')
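The two substitutions in the patch script behave differently when applied more than once. The sketch below reuses the script's pattern and replacement strings on a small in-memory sample instead of the real files; the `SAMPLE` text is a hypothetical excerpt assembled from the fragments visible in this diff.

```python
import re

BSZ_PAT = re.compile(
    r'(elif \[ "\$GPU_MODE" = "t4_1gpu" \]; then\n    BSZ=\d+; GA=\d+; EVAL_BSZ=\d+\n)'
    r'(else\n    BSZ=\d+; GA=\d+; EVAL_BSZ=\d+\n)'
)

def add_p100(m):
    # Insert the p100 branch between the t4_1gpu branch and the final else.
    return (
        m.group(1)
        + 'elif [ "$GPU_MODE" = "p100" ]; then\n    BSZ=8; GA=4; EVAL_BSZ=4\n'
        + m.group(2)
    )

GPU_OLD = 'else\n    GPU_MODE="a100"\n    FP16_FLAG=""\nfi'  # shortened for the demo
GPU_NEW = (
    'elif [ "$GPU_MEM" -gt 16000 ]; then\n    GPU_MODE="p100"\n'
    '    FP16_FLAG="--gradient_checkpointing"\n' + GPU_OLD
)

SAMPLE = (
    'elif [ "$IS_T4" -eq 1 ]; then\n    GPU_MODE="t4_1gpu"\n' + GPU_OLD + '\n'
    'if [ "$GPU_MODE" = "t4_2gpu" ]; then\n    BSZ=2; GA=4; EVAL_BSZ=16\n'
    'elif [ "$GPU_MODE" = "t4_1gpu" ]; then\n    BSZ=4; GA=8; EVAL_BSZ=16\n'
    'else\n    BSZ=16; GA=2; EVAL_BSZ=128\nfi\n'
)

def patch(text):
    text = text.replace(GPU_OLD, GPU_NEW, 1)
    return BSZ_PAT.sub(add_p100, text)

once = patch(SAMPLE)
twice = patch(once)
bsz_branches_once = once.count('"$GPU_MODE" = "p100"')
bsz_branches_twice = twice.count('"$GPU_MODE" = "p100"')
detect_branches_once = once.count('GPU_MODE="p100"')
detect_branches_twice = twice.count('GPU_MODE="p100"')
print(bsz_branches_once, bsz_branches_twice, detect_branches_once, detect_branches_twice)
```

The regex substitution stops matching once the `p100` branch sits between `t4_1gpu` and `else`, so a second run leaves the batch-size blocks alone. The plain `str.replace` does not have that property, because the replacement text ends with the original `else` block; a second run would add another detection branch, so the script is best run once per file.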

improve_gainlora/gen_script_long_order3_t5_specroute.sh CHANGED

@@ -23,7 +23,7 @@ if [ -z "$GPU_MEM" ]; then
 fi
 # Determine GPU type
-if [ "$GPU_MEM" -lt 20000 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
@@ -42,6 +42,11 @@ elif [ "$IS_T4" -eq 1 ]; then
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
@@ -57,6 +62,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=8; GA=4; EVAL_BSZ=128
 fi
@@ -110,6 +117,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -163,6 +172,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -216,6 +227,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -269,6 +282,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -322,6 +337,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -375,6 +392,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -428,6 +447,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -481,6 +502,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -534,6 +557,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -587,6 +612,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -640,6 +667,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -693,6 +722,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -746,6 +777,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -799,6 +832,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi

 fi
 # Determine GPU type
+if [ "$GPU_MEM" -lt 15500 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
+elif [ "$GPU_MEM" -gt 16000 ]; then
+    GPU_MODE="p100"
+    GPU_IDS="${1:-0}"
+    FP16_FLAG="--gradient_checkpointing"
+    echo "[GPU] Strategy: P100 16GB (fp32 + gradient_checkpointing)"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
     BSZ=2; GA=8; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=8; GA=4; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
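The detection thresholds in this script can be mirrored in a few lines of Python to see which mode a given card ends up in. This is a sketch under assumptions: it models only the single-GPU path (the condition for `t4_2gpu` is not visible in this diff), it assumes `IS_T4` is unset whenever the first test fails, and the memory values in the usage lines are illustrative rather than measured.

```python
def classify_gpu(gpu_mem_mb: int) -> str:
    """Mirror of the single-GPU branch order: T4 test, then P100 test, then else."""
    is_t4 = gpu_mem_mb < 15500   # if [ "$GPU_MEM" -lt 15500 ]
    if is_t4:
        return "t4_1gpu"
    if gpu_mem_mb > 16000:       # elif [ "$GPU_MEM" -gt 16000 ]
        return "p100"
    return "a100"                # final else branch

# Illustrative memory sizes in MB (hypothetical inputs, not taken from the source).
modes = {mem: classify_gpu(mem) for mem in (15360, 15800, 16280, 40000)}
for mem, mode in modes.items():
    print(f"{mem:6d} MB -> {mode}")
```

With this branch order, any card reporting more than 16000 MB takes the `p100` settings, and the final `else` is reached only for values between 15500 and 16000 MB. Whether that matters depends on which GPUs the scripts are actually run on.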

improve_gainlora/gen_script_long_order4_t5_specroute.sh CHANGED

@@ -23,7 +23,7 @@ if [ -z "$GPU_MEM" ]; then
 fi
 # Determine GPU type
-if [ "$GPU_MEM" -lt 20000 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
@@ -42,6 +42,11 @@ elif [ "$IS_T4" -eq 1 ]; then
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
@@ -57,6 +62,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=8; GA=4; EVAL_BSZ=128
 fi
@@ -110,6 +117,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -163,6 +172,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -216,6 +227,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -269,6 +282,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -322,6 +337,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -375,6 +392,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -428,6 +447,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -481,6 +502,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -534,6 +557,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -587,6 +612,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -640,6 +667,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -693,6 +722,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -746,6 +777,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
@@ -799,6 +832,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi

 fi
 # Determine GPU type
+if [ "$GPU_MEM" -lt 15500 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
+elif [ "$GPU_MEM" -gt 16000 ]; then
+    GPU_MODE="p100"
+    GPU_IDS="${1:-0}"
+    FP16_FLAG="--gradient_checkpointing"
+    echo "[GPU] Strategy: P100 16GB (fp32 + gradient_checkpointing)"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
     BSZ=2; GA=8; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=8; GA=4; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi
     BSZ=2; GA=4; EVAL_BSZ=16
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=16
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=128
 fi

improve_gainlora/gen_script_superni_order1_t5_specroute.sh CHANGED

@@ -23,7 +23,7 @@ if [ -z "$GPU_MEM" ]; then
 fi
 # Determine GPU type
-if [ "$GPU_MEM" -lt 20000 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
@@ -44,6 +44,11 @@ elif [ "$IS_T4" -eq 1 ]; then
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
@@ -59,6 +64,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=2; GA=16; EVAL_BSZ=2
 else
     BSZ=16; GA=2; EVAL_BSZ=4
 fi
@@ -110,6 +117,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -161,6 +170,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -212,6 +223,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -263,6 +276,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -314,6 +329,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -365,6 +382,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -416,6 +435,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -467,6 +488,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -518,6 +541,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -569,6 +594,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -620,6 +647,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -671,6 +700,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -722,6 +753,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -773,6 +806,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi

 fi
 # Determine GPU type
+if [ "$GPU_MEM" -lt 15500 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
+elif [ "$GPU_MEM" -gt 16000 ]; then
+    GPU_MODE="p100"
+    GPU_IDS="${1:-0}"
+    FP16_FLAG="--gradient_checkpointing"
+    echo "[GPU] Strategy: P100 16GB (fp32 + gradient_checkpointing)"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=2; GA=16; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
     BSZ=2; GA=8; EVAL_BSZ=2
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=4; GA=8; EVAL_BSZ=2
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi

improve_gainlora/gen_script_superni_order2_t5_specroute.sh CHANGED

@@ -23,7 +23,7 @@ if [ -z "$GPU_MEM" ]; then
 fi
 # Determine GPU type
-if [ "$GPU_MEM" -lt 20000 ]; then
+if [ "$GPU_MEM" -lt 15500 ]; then
     IS_T4=1
     echo "[GPU] Detected T4 GPUs (${GPU_MEM}MB VRAM each)"
 else
@@ -42,6 +42,11 @@ elif [ "$IS_T4" -eq 1 ]; then
     GPU_IDS="${1:-0}"
     FP16_FLAG="--gradient_checkpointing"
     echo "[GPU] Strategy: 1x T4 + fp32 + gradient_checkpointing"
+elif [ "$GPU_MEM" -gt 16000 ]; then
+    GPU_MODE="p100"
+    GPU_IDS="${1:-0}"
+    FP16_FLAG="--gradient_checkpointing"
+    echo "[GPU] Strategy: P100 16GB (fp32 + gradient_checkpointing)"
 else
     GPU_MODE="a100"
     GPU_IDS="${1:-0}"
@@ -57,6 +62,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=16; GA=2; EVAL_BSZ=4
 fi
@@ -107,6 +114,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -157,6 +166,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -207,6 +218,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -257,6 +270,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -307,6 +322,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -357,6 +374,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -407,6 +426,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -457,6 +478,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -507,6 +530,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -557,6 +582,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -607,6 +634,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -657,6 +686,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -707,6 +738,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
@@ -757,6 +790,8 @@ if [ "$GPU_MODE" = "t4_2gpu" ]; then
     BSZ=4; GA=4; EVAL_BSZ=4
 elif [ "$GPU_MODE" = "t4_1gpu" ]; then
     BSZ=8; GA=4; EVAL_BSZ=4
+elif [ "$GPU_MODE" = "p100" ]; then
+    BSZ=8; GA=4; EVAL_BSZ=4
 else
     BSZ=32; GA=1; EVAL_BSZ=4
 fi
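
Reviewer sketch (not part of the commit): the GPU-mode selection patched in this file can be mirrored in Python to check which branch a given card would take. The thresholds and the `BSZ`/`GA`/`EVAL_BSZ` values are copied from the hunks (first training block). The function names, the `n_gpus` argument and the condition for the two-GPU T4 branch are assumptions, since that branch is not shown in the diff. As the branches read here, any non-T4 card reporting more than 16000 MB would take the `p100` branch, so `a100` is only reached between 15500 and 16000 MB.

```python
# Sketch of the branch order in the patched script; memory values are in MB.
BATCH_CONFIG = {  # mode -> (BSZ, GA, EVAL_BSZ), first training block of the script
    "t4_2gpu": (4, 4, 4),
    "t4_1gpu": (8, 4, 4),
    "p100": (8, 4, 4),
    "a100": (16, 2, 4),
}


def select_gpu_mode(gpu_mem_mb: int, n_gpus: int = 1) -> str:
    """Return the GPU_MODE string the patched branches would choose."""
    is_t4 = gpu_mem_mb < 15500      # patched threshold (was 20000)
    if is_t4 and n_gpus >= 2:       # assumption: this branch is not shown in the hunks
        return "t4_2gpu"
    if is_t4:
        return "t4_1gpu"
    if gpu_mem_mb > 16000:          # new branch added by this commit
        return "p100"
    return "a100"


def effective_batch(mode: str, n_gpus: int = 1) -> int:
    """Per-step batch size times gradient accumulation times number of GPUs."""
    bsz, ga, _eval_bsz = BATCH_CONFIG[mode]
    return bsz * ga * n_gpus
```

Under this sketch, `select_gpu_mode(40536)` returns `"p100"`, and every mode in `BATCH_CONFIG` gives an effective batch of 32 (assuming two GPUs for `t4_2gpu`).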

improve_gainlora/src/cl_trainer_specroute.py CHANGED

@@ -343,6 +343,10 @@ class SpecRoute_Trainer(Seq2SeqTrainer):
                         C_tilde = (C_tilde + C_tilde.T) * 0.5
                         # eigh returns ascending eigenvalues; take last r (largest)
                         eigvals, eigvecs = torch.linalg.eigh(C_tilde.float())
+                        # Fallback: if null-space signal is degenerate, keep Kaiming init
+                        if eigvals[-1].item() < 1e-6:
+                            print(f'[C5] Layer {i+1} index {index}: max_eigval={eigvals[-1].item():.2e} < 1e-6, fallback to Kaiming+InfLoRA')
+                            continue
                         top_eigvecs = eigvecs[:, -r:].flip(dims=[1])  # [step, r]
                         A_init = top_eigvecs.T  # [r, step]
                         dtype  = module.lora_q.lora_A.data.dtype
@@ -408,7 +412,7 @@ class SpecRoute_Trainer(Seq2SeqTrainer):
                 else:
                     labels = None
                 outputs = self.model(**inputs)
-                if step > 1000:
+                if step > 200:  # 200 batches sufficient for stable SVD (reduced from 1000 for speed)
                     break
         print('end get representation')
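
Reviewer sketch (not part of the commit): the fallback added in the first hunk can be exercised in isolation. It uses NumPy rather than `torch` to stay dependency-light. `init_lora_A` and `eps` are hypothetical names, and returning `None` stands in for the `continue` that keeps the existing initialization.

```python
import numpy as np


def init_lora_A(C_tilde, r, eps=1e-6):
    """Return an (r, d) init from the top-r eigenvectors, or None to keep the default init."""
    C = 0.5 * (C_tilde + C_tilde.T)        # symmetrize, as in the hunk
    eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues come back in ascending order
    if eigvals[-1] < eps:                  # degenerate signal: no usable direction
        return None
    top = eigvecs[:, -r:][:, ::-1]         # last r columns, reordered largest-first
    return top.T                           # rows are eigenvectors, shape (r, d)


# Usage: a diagonal covariance whose two dominant directions are axes 0 and 1.
A = init_lora_A(np.diag([3.0, 2.0, 1.0, 0.0]), r=2)
```

An all-zero covariance takes the fallback path and returns `None`, which corresponds to the case the commit logs as a fallback.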

improve_gainlora/src/run_t5.py CHANGED

@@ -179,7 +179,7 @@ class ModelArguments:
         metadata={"help": "Weight for spectral entropy regularization (C4). 0 = disabled."},
     )
     use_preconditioning: Optional[bool] = field(
-        default=False,
+        default=True,
         metadata={"help": "Enable (AA^T+eps*I)^{-1/2} gradient preconditioning on lora_B (C4)."},
     )
     precond_eps: Optional[float] = field(
@@ -955,7 +955,8 @@ def main():
             n_batches_c5=model_args.n_batches_c5,
         )
         if training_args.do_train:
-            trainer.pre_task_data_collection()
+            if not model_args.run_single:  # C5 is only useful for tasks t>=2
+                trainer.pre_task_data_collection()
             trainer.get_reg_matrix()
             trainer.precompute_preconditioners()
     else: