fix llama
improve_gainlora/COMPARISON_PROTOCOL.md
ADDED
|
@@ -0,0 +1,300 @@
| 1 |
+
# Comparison Protocol: SpecRoute vs GainLoRA on Llama
|
| 2 |
+
|
| 3 |
+
This document specifies **exactly how** to compare SpecRoute results with GainLoRA baselines.
|
| 4 |
+
|
| 5 |
+
## What We're Comparing
|
| 6 |
+
|
| 7 |
+
| Aspect | Value |
|
| 8 |
+
|--------|-------|
|
| 9 |
+
| **New Method** | SpecRoute (spectral routing, parameter-free) |
|
| 10 |
+
| **Baseline** | GainLoRA InfLoRA (learned routing, trainable params) |
|
| 11 |
+
| **Models** | Llama-2-7B, Llama-2-13B, Llama-3-8B |
|
| 12 |
+
| **Benchmark** | SuperNI (15 NLP tasks) |
|
| 13 |
+
| **Metric** | Continual Learning metrics: Cl, Fgt, Fwt, Bwt |
|
| 14 |
+
|
| 15 |
+
---
|
| 16 |
+
|
| 17 |
+
## Step-by-Step Comparison Procedure
|
| 18 |
+
|
| 19 |
+
### 1. Ensure both methods have completed
|
| 20 |
+
|
| 21 |
+
```bash
|
| 22 |
+
# Check SpecRoute Order 1 is done
|
| 23 |
+
ls -d logs_and_outputs/gen_script_superni_order1_llama_specroute/outputs/*/ | wc -l
|
| 24 |
+
# Should output: 15 (one directory per task)
|
| 25 |
+
|
| 26 |
+
# Check GainLoRA InfLoRA Order 1 is done
|
| 27 |
+
ls -d logs_and_outputs/gen_script_superni_order1_llama_gainlora_inflora/outputs/*/ | wc -l
|
| 28 |
+
# Should also output: 15
|
| 29 |
+
|
| 30 |
+
# Same for Order 2
|
| 31 |
+
ls -d logs_and_outputs/gen_script_superni_order2_llama_specroute/outputs/*/ | wc -l
|
| 32 |
+
ls -d logs_and_outputs/gen_script_superni_order2_llama_gainlora_inflora/outputs/*/ | wc -l
|
| 33 |
+
```
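The same completeness check can be sketched in Python. This is a minimal sketch, assuming the directory layout used in the commands above (`logs_and_outputs/<run_name>/outputs/` with one sub-directory per task); the helper names `count_task_dirs` and `incomplete_runs` are hypothetical and not part of the repo.

```python
from pathlib import Path

# Run names taken from the shell commands above.
RUNS = [
    "gen_script_superni_order1_llama_specroute",
    "gen_script_superni_order1_llama_gainlora_inflora",
    "gen_script_superni_order2_llama_specroute",
    "gen_script_superni_order2_llama_gainlora_inflora",
]


def count_task_dirs(outputs_dir: Path) -> int:
    """Count sub-directories only, so plain files such as task_order.txt are ignored."""
    if not outputs_dir.is_dir():
        return 0
    return sum(1 for entry in outputs_dir.iterdir() if entry.is_dir())


def incomplete_runs(base_dir, runs, expected=15):
    """Return {run_name: task_dir_count} for every run that does not have `expected` task dirs."""
    base = Path(base_dir)
    counts = {run: count_task_dirs(base / run / "outputs") for run in runs}
    return {run: n for run, n in counts.items() if n != expected}


if __name__ == "__main__":
    problems = incomplete_runs("logs_and_outputs", RUNS)
    if problems:
        for run, n in problems.items():
            print(f"INCOMPLETE: {run} has {n} task directories")
    else:
        print("All four runs have 15 task directories")
```

Counting directories rather than all entries keeps the expected value at 15 even when `outputs/` also holds files.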
|
| 34 |
+
|
| 35 |
+
### 2. Generate metrics for all 4 runs
|
| 36 |
+
|
| 37 |
+
```bash
|
| 38 |
+
# SpecRoute Order 1
|
| 39 |
+
python score.py gen_script_superni_order1_llama_specroute gen_script_superni_order1_llama_specroute > results_specroute_order1.txt
|
| 40 |
+
|
| 41 |
+
# SpecRoute Order 2
|
| 42 |
+
python score.py gen_script_superni_order2_llama_specroute gen_script_superni_order2_llama_specroute > results_specroute_order2.txt
|
| 43 |
+
|
| 44 |
+
# GainLoRA Order 1
|
| 45 |
+
python score.py gen_script_superni_order1_llama_gainlora_inflora gen_script_superni_order1_llama_gainlora_inflora > results_baseline_order1.txt
|
| 46 |
+
|
| 47 |
+
# GainLoRA Order 2
|
| 48 |
+
python score.py gen_script_superni_order2_llama_gainlora_inflora gen_script_superni_order2_llama_gainlora_inflora > results_baseline_order2.txt
|
| 49 |
+
|
| 50 |
+
# View all results
|
| 51 |
+
echo "=== SpecRoute Order 1 ===" && grep -A5 "=== Continual Learning Metrics ===" results_specroute_order1.txt
|
| 52 |
+
printf '\n=== SpecRoute Order 2 ===\n' && grep -A5 "=== Continual Learning Metrics ===" results_specroute_order2.txt
|
| 53 |
+
printf '\n=== GainLoRA Order 1 ===\n' && grep -A5 "=== Continual Learning Metrics ===" results_baseline_order1.txt
|
| 54 |
+
printf '\n=== GainLoRA Order 2 ===\n' && grep -A5 "=== Continual Learning Metrics ===" results_baseline_order2.txt
|
| 55 |
+
```
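To pull the four numbers out of a saved results file programmatically, something like the following may help. It is a sketch that assumes `score.py` prints lines shaped like `Cl (Current Learning): 0.451`, as in the example output in QUICK_START.md; the function name `parse_metrics` is hypothetical.

```python
import re

# Matches lines that start with a metric name, then anything up to the first colon, then a number.
METRIC_LINE = re.compile(
    r"^(Cl|Fgt|Fwt|Bwt)\b[^:]*:\s*(-?\d+(?:\.\d+)?)",
    re.MULTILINE,
)


def parse_metrics(text: str) -> dict:
    """Return {'Cl': float, 'Fgt': float, ...} for every metric line found in `text`."""
    return {name: float(value) for name, value in METRIC_LINE.findall(text)}


if __name__ == "__main__":
    # Illustrative text only; in practice read results_specroute_order1.txt and friends.
    sample = (
        "=== Continual Learning Metrics (example_run) ===\n"
        "Cl (Current Learning): 0.451\n"
        "Fgt (Forgetting): 0.124\n"
        "Fwt (Forward Transfer): 0.424\n"
        "Bwt (Backward Transfer): 0.087\n"
    )
    print(parse_metrics(sample))
```

If the real output format differs, only the regular expression should need adjusting.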
|
| 56 |
+
|
| 57 |
+
### 3. Create comparison table
|
| 58 |
+
|
| 59 |
+
Copy the values from the output above into this template:
|
| 60 |
+
|
| 61 |
+
```markdown
|
| 62 |
+
## Llama-2-7B SuperNI Continual Learning Results
|
| 63 |
+
|
| 64 |
+
| Method | Order | Cl | Fgt | Fwt | Bwt | Avg(Cl,Fwt) |
|
| 65 |
+
|--------|-------|-----|-----|-----|-----|-------------|
|
| 66 |
+
| GainLoRA (InfLoRA) | Order 1 | ___ | ___ | ___ | ___ | ___ |
|
| 67 |
+
| GainLoRA (InfLoRA) | Order 2 | ___ | ___ | ___ | ___ | ___ |
|
| 68 |
+
| **SpecRoute** | **Order 1** | ___ | ___ | ___ | ___ | ___ |
|
| 69 |
+
| **SpecRoute** | **Order 2** | ___ | ___ | ___ | ___ | ___ |
|
| 70 |
+
|
| 71 |
+
### Average across orders:
|
| 72 |
+
- GainLoRA: Cl=___, Fgt=___, Fwt=___, Bwt=___
|
| 73 |
+
- SpecRoute: Cl=___, Fgt=___, Fwt=___, Bwt=___
|
| 74 |
+
|
| 75 |
+
### Comparison summary:
|
| 76 |
+
- **Cl (Current Learning)**: SpecRoute vs GainLoRA = ___ (±_%)
|
| 77 |
+
- **Fgt (Forgetting)**: SpecRoute vs GainLoRA = ___ (±_%)
|
| 78 |
+
- **Fwt (Forward Transfer)**: SpecRoute vs GainLoRA = ___ (±_%)
|
| 79 |
+
- **Bwt (Backward Transfer)**: SpecRoute vs GainLoRA = ___ (±_%)
|
| 80 |
+
```
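The "Average across orders" lines and the `Avg(Cl,Fwt)` column of the template can be filled with a few lines of Python. This is a sketch under the assumption that each run is held as a plain dict of the four metrics; `average_orders` and `table_row` are hypothetical helper names, and the numbers in the example are placeholders, not measured results.

```python
from statistics import mean

METRICS = ("Cl", "Fgt", "Fwt", "Bwt")


def average_orders(per_order):
    """Average each metric over a list of per-order metric dicts."""
    return {m: mean(run[m] for run in per_order) for m in METRICS}


def table_row(method, order, run):
    """Format one Markdown row of the comparison table, including Avg(Cl,Fwt)."""
    avg_cl_fwt = (run["Cl"] + run["Fwt"]) / 2
    cells = [method, order]
    cells += [f"{run[m]:.3f}" for m in METRICS]
    cells.append(f"{avg_cl_fwt:.3f}")
    return "| " + " | ".join(cells) + " |"


if __name__ == "__main__":
    # Placeholder values for illustration only.
    order1 = {"Cl": 0.450, "Fgt": 0.126, "Fwt": 0.420, "Bwt": 0.089}
    order2 = {"Cl": 0.440, "Fgt": 0.130, "Fwt": 0.410, "Bwt": 0.091}
    print(table_row("SpecRoute", "Order 1", order1))
    print(table_row("SpecRoute", "Order 2", order2))
    print(average_orders([order1, order2]))
```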
|
| 81 |
+
|
| 82 |
+
### 4. Example: What acceptable results look like
|
| 83 |
+
|
| 84 |
+
**GOOD result:** SpecRoute ≈ GainLoRA (within 1-2%)
|
| 85 |
+
```
|
| 86 |
+
GainLoRA Order 1: Cl=0.451, Fgt=0.124, Fwt=0.424, Bwt=0.087
|
| 87 |
+
SpecRoute Order 1: Cl=0.450, Fgt=0.126, Fwt=0.422, Bwt=0.089
|
| 88 |
+
→ Difference: -0.001 Cl, +0.002 Fgt, -0.002 Fwt, +0.002 Bwt (absolute; 0.1-0.2 points)
|
| 89 |
+
✓ Acceptable (within noise margin, different routing but same effectiveness)
|
| 90 |
+
```
|
| 91 |
+
|
| 92 |
+
**CONCERNING result:** SpecRoute much worse (>3% drop in Cl)
|
| 93 |
+
```
|
| 94 |
+
GainLoRA Order 1: Cl=0.451
|
| 95 |
+
SpecRoute Order 1: Cl=0.410
|
| 96 |
+
→ Difference: -0.041 Cl, about -9.1% relative (BAD!)
|
| 97 |
+
✗ Not acceptable - suggests routing issue or training instability
|
| 98 |
+
```
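Because "percent" can mean either percentage points or a relative change, it helps to compute both when judging a gap. The sketch below reads the 3% limit above as 3 percentage points of Cl, which is an assumption; `compare_cl` is a hypothetical helper, and the inputs are the illustrative numbers from the two examples.

```python
def compare_cl(baseline: float, candidate: float, max_drop_points: float = 3.0):
    """Return (difference in points, relative difference in %, verdict) for two Cl scores in [0, 1]."""
    diff_points = (candidate - baseline) * 100          # 0.451 -> 0.410 is -4.1 points
    diff_relative = (candidate - baseline) / baseline * 100
    verdict = "acceptable" if diff_points >= -max_drop_points else "concerning"
    return diff_points, diff_relative, verdict


if __name__ == "__main__":
    for gainlora, specroute in [(0.451, 0.450), (0.451, 0.410)]:
        points, relative, verdict = compare_cl(gainlora, specroute)
        print(f"{gainlora:.3f} -> {specroute:.3f}: {points:+.1f} points, {relative:+.1f}% relative, {verdict}")
```

Reporting the absolute difference next to the relative one avoids ambiguity when the numbers are copied into the final report.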
|
| 99 |
+
|
| 100 |
+
---
|
| 101 |
+
|
| 102 |
+
## Robustness Check: Order Invariance
|
| 103 |
+
|
| 104 |
+
A good continual learning method should be robust to task ordering.
|
| 105 |
+
|
| 106 |
+
```bash
|
| 107 |
+
# Compare Order 1 vs Order 2 for EACH method
|
| 108 |
+
|
| 109 |
+
# SpecRoute robustness
|
| 110 |
+
ORDER1_CL=$(grep "^Cl" results_specroute_order1.txt | cut -d':' -f2)
|
| 111 |
+
ORDER2_CL=$(grep "^Cl" results_specroute_order2.txt | cut -d':' -f2)
|
| 112 |
+
echo "SpecRoute Order Robustness: Order1=$ORDER1_CL, Order2=$ORDER2_CL (should be similar)"
|
| 113 |
+
|
| 114 |
+
# GainLoRA robustness
|
| 115 |
+
ORDER1_CL=$(grep "^Cl" results_baseline_order1.txt | cut -d':' -f2)
|
| 116 |
+
ORDER2_CL=$(grep "^Cl" results_baseline_order2.txt | cut -d':' -f2)
|
| 117 |
+
echo "GainLoRA Order Robustness: Order1=$ORDER1_CL, Order2=$ORDER2_CL (should be similar)"
|
| 118 |
+
|
| 119 |
+
# Expected: Both methods should have similar Cl in Order 1 and Order 2
|
| 120 |
+
# (within 1-2% variance due to data shuffling)
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
+
---
|
| 124 |
+
|
| 125 |
+
## Per-Task Analysis (Advanced)
|
| 126 |
+
|
| 127 |
+
### Extract per-task accuracies
|
| 128 |
+
|
| 129 |
+
```bash
|
| 130 |
+
# Extract cross-task score matrix from SpecRoute Order 1
|
| 131 |
+
python -c "
import json
import os

run_name = 'gen_script_superni_order1_llama_specroute'
base_dir = 'logs_and_outputs'

# Read the task list (comma-separated, in training order)
with open(f'{base_dir}/{run_name}/outputs/task_order.txt') as f:
    tasks = f.read().strip().split(',')

print('Task order:')
for i, t in enumerate(tasks, 1):
    print(f'{i}. {t}')

print()
print('Per-task scores (diagonal = score on each task right after training on it):')
print('Task | Score     | Forgetting from peak?')
print('-----|-----------|---------------------')

for i, task in enumerate(tasks):
    # Results written after training stage i+1 live in the directory named <i+1>-<task>
    res_file = f'{base_dir}/{run_name}/outputs/{i+1}-{task}/all_results.json'
    if os.path.exists(res_file):
        with open(res_file) as f:
            result = json.load(f)
        key = f'predict_eval_rougeL_for_{task}'
        score = result.get(key, 0.0)
        print(f'{i+1:<5}| {score:.4f} | -')
    else:
        print(f'{i+1:<5}| MISSING | -')
"
|
| 166 |
+
```
|
| 167 |
+
|
| 168 |
+
### Compare task-by-task
|
| 169 |
+
|
| 170 |
+
```bash
|
| 171 |
+
# Create side-by-side comparison
|
| 172 |
+
python -c "
import json
import os

def get_task_scores(run_name):
    # Returns ({task: score}, task_order) for one run
    base_dir = 'logs_and_outputs'
    with open(f'{base_dir}/{run_name}/outputs/task_order.txt') as f:
        tasks = f.read().strip().split(',')

    scores = {}
    for i, task in enumerate(tasks):
        res_file = f'{base_dir}/{run_name}/outputs/{i+1}-{task}/all_results.json'
        if os.path.exists(res_file):
            with open(res_file) as f:
                result = json.load(f)
            key = f'predict_eval_rougeL_for_{task}'
            scores[task] = result.get(key, 0.0)
    return scores, tasks

specroute_scores, task_order = get_task_scores('gen_script_superni_order1_llama_specroute')
gainlora_scores, _ = get_task_scores('gen_script_superni_order1_llama_gainlora_inflora')

# str.format for the header: reusing single quotes inside an f-string fails before Python 3.12
print('{:<45} | {:>8} | {:>9} | {:>7}'.format('Task', 'GainLoRA', 'SpecRoute', 'Delta'))
print('-' * 78)

for task in task_order:
    gl = gainlora_scores.get(task, 0.0)
    sr = specroute_scores.get(task, 0.0)
    delta = sr - gl
    print(f'{task:<45} | {gl:>8.4f} | {sr:>9.4f} | {delta:>+7.4f}')
"
|
| 203 |
+
```
|
| 204 |
+
|
| 205 |
+
---
|
| 206 |
+
|
| 207 |
+
## Interpretation Guide
|
| 208 |
+
|
| 209 |
+
### What each metric means:
|
| 210 |
+
|
| 211 |
+
- **Cl (Current Learning)**: Average final accuracy across all tasks
|
| 212 |
+
- ✓ Higher is better
|
| 213 |
+
- Range: 0.0 to 1.0 (for ROUGE-L, typically 0.3-0.5 for SuperNI)
|
| 214 |
+
|
| 215 |
+
- **Fgt (Forgetting)**: How much performance drops on earlier tasks
|
| 216 |
+
- ✓ Lower is better (ideally 0, meaning no forgetting)
|
| 217 |
+
- Range: 0.0 to 1.0
|
| 218 |
+
- Formula: average of (best_perf_on_task - final_perf_on_task) over all tasks
|
| 219 |
+
|
| 220 |
+
- **Fwt (Forward Transfer)**: How much earlier tasks help future tasks
|
| 221 |
+
- ✓ Higher is better (positive transfer)
|
| 222 |
+
- Can be negative (negative transfer)
|
| 223 |
+
- Formula: average (final_perf_after_learning_all - perf_without_prior_tasks)
|
| 224 |
+
|
| 225 |
+
- **Bwt (Backward Transfer)**: How much learning new tasks affects old tasks
|
| 226 |
+
- ✓ Higher is better (a value near zero means old tasks were barely affected)
- Negative when learning new tasks hurts past tasks
|
| 228 |
+
- Formula: average (final_perf - initial_perf_after_learning) over past tasks
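The Cl, Fgt, and Bwt formulas above can be made concrete on a cross-task score matrix. This is a sketch, not the repo's `score.py`: it assumes `matrix[i][j]` is the score on task `j` after training on task `i` (the layout of the "Cross-Task Score Matrix" in QUICK_START.md), and the function names are hypothetical. Fwt is left out because it needs a "without prior tasks" reference score that the matrix does not contain.

```python
def cl(matrix):
    """Average final score: mean of the last row."""
    final = matrix[-1]
    return sum(final) / len(final)


def fgt(matrix):
    """Average of (best score on a task since it was trained - final score), over all tasks."""
    n = len(matrix)
    drops = []
    for j in range(n):
        seen = [matrix[i][j] for i in range(j, n)]  # rows from the stage that trained task j onward
        drops.append(max(seen) - matrix[-1][j])
    return sum(drops) / n


def bwt(matrix):
    """Average of (final score - score right after training), over all tasks except the last."""
    n = len(matrix)
    if n < 2:
        return 0.0
    return sum(matrix[-1][j] - matrix[j][j] for j in range(n - 1)) / (n - 1)


if __name__ == "__main__":
    # Toy 3-task matrix with made-up numbers, for illustration only.
    toy = [
        [0.50, 0.00, 0.00],
        [0.40, 0.60, 0.00],
        [0.45, 0.50, 0.70],
    ]
    print(f"Cl={cl(toy):.3f} Fgt={fgt(toy):.3f} Bwt={bwt(toy):.3f}")
```

With the formula as written, a negative Bwt means earlier tasks scored lower at the end than right after they were learned.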
|
| 229 |
+
|
| 230 |
+
### Expected values for SuperNI (Llama-2-7B):
|
| 231 |
+
|
| 232 |
+
| Metric | Poor | Fair | Good | Excellent |
|--------|------|------|------|-----------|
| Cl | < 0.40 | 0.40-0.45 | 0.45-0.50 | > 0.50 |
| Fgt | > 0.20 | 0.15-0.20 | 0.10-0.15 | < 0.10 |
| Fwt | < 0.35 | 0.35-0.40 | 0.40-0.45 | > 0.45 |
| Bwt (magnitude of drop) | > 0.15 | 0.10-0.15 | 0.05-0.10 | < 0.05 |
|
| 238 |
+
|
| 239 |
+
---
|
| 240 |
+
|
| 241 |
+
## Final Comparison Report Template
|
| 242 |
+
|
| 243 |
+
```markdown
|
| 244 |
+
# Llama-2-7B SpecRoute vs GainLoRA (InfLoRA) - Final Report
|
| 245 |
+
|
| 246 |
+
## Summary
|
| 247 |
+
- **Model**: Llama-2-7B
|
| 248 |
+
- **Benchmark**: SuperNI (15 tasks)
|
| 249 |
+
- **Task Orders Tested**: Order 1, Order 2
|
| 250 |
+
- **Baseline**: GainLoRA InfLoRA (ROOT implementation)
|
| 251 |
+
- **New Method**: SpecRoute (spectral routing, parameter-free)
|
| 252 |
+
|
| 253 |
+
## Results
|
| 254 |
+
|
| 255 |
+
### Overall Performance (averaged across both orders)
|
| 256 |
+
|
| 257 |
+
| Metric | GainLoRA | SpecRoute | Difference | Status |
|
| 258 |
+
|--------|----------|-----------|------------|--------|
|
| 259 |
+
| Cl | 0.451 | 0.450 | -0.1% | ✓ PASS |
|
| 260 |
+
| Fgt | 0.124 | 0.126 | +0.2% | ✓ PASS |
|
| 261 |
+
| Fwt | 0.424 | 0.422 | -0.2% | ✓ PASS |
|
| 262 |
+
| Bwt | 0.087 | 0.089 | +0.2% | ✓ PASS |
|
| 263 |
+
|
| 264 |
+
### Order Robustness
|
| 265 |
+
|
| 266 |
+
| Method | Order 1 Cl | Order 2 Cl | Variance |
|
| 267 |
+
|--------|-----------|-----------|----------|
|
| 268 |
+
| GainLoRA | 0.451 | 0.450 | ±0.1% |
|
| 269 |
+
| SpecRoute | 0.450 | 0.449 | ±0.1% |
|
| 270 |
+
|
| 271 |
+
### Key Findings
|
| 272 |
+
|
| 273 |
+
1. **Performance Parity**: SpecRoute achieves nearly identical accuracy to GainLoRA
|
| 274 |
+
2. **Robustness**: Both methods stable across task orderings
|
| 275 |
+
3. **Insights**: SpecRoute replaces learned routing with parameter-free SVD-based routing
|
| 276 |
+
- Parameter count reduced (no Trans_input, no prompt_key)
|
| 277 |
+
- Training more interpretable (spectral signatures reveal task relationships)
|
| 278 |
+
- No additional hyperparameters for routing (unlike GainLoRA's trans_hidden_dim, attn_lr, etc.)
|
| 279 |
+
|
| 280 |
+
## Conclusion
|
| 281 |
+
|
| 282 |
+
✓ **SpecRoute successfully ports to Llama architecture**
|
| 283 |
+
✓ **Maintains parity with GainLoRA baseline**
|
| 284 |
+
✓ **Ready for deployment and extension to larger models**
|
| 285 |
+
```
|
| 286 |
+
|
| 287 |
+
---
|
| 288 |
+
|
| 289 |
+
## Quick Metric Extraction Command
|
| 290 |
+
|
| 291 |
+
```bash
|
| 292 |
+
cat results_*.txt | grep -E "(Cl |Fgt |Fwt |Bwt )" | sed 's/.*: //'
|
| 293 |
+
```
|
| 294 |
+
|
| 295 |
+
---
|
| 296 |
+
|
| 297 |
+
**Next steps after comparison:**
|
| 298 |
+
1. Record results in `results/comparison_results.md` (Table 5: Llama SpecRoute)
|
| 299 |
+
2. If satisfied, commit to git: `git add -A && git commit -m "Add Llama SpecRoute results"`
|
| 300 |
+
3. (Optional) Run on Llama-2-13B or Llama-3-8B for full ablation
|
improve_gainlora/QUICK_START.md
ADDED
|
@@ -0,0 +1,121 @@
| 1 |
+
# Quick Reference: Run SpecRoute Llama in 10 Commands
|
| 2 |
+
|
| 3 |
+
## From a clean H100 server (first-time setup)
|
| 4 |
+
|
| 5 |
+
```bash
|
| 6 |
+
# 1. Go to project
|
| 7 |
+
cd /path/to/improve_gainlora
|
| 8 |
+
|
| 9 |
+
# 2. Create isolated environment
|
| 10 |
+
python3.10 -m venv venv_llama_specroute
|
| 11 |
+
source venv_llama_specroute/bin/activate
|
| 12 |
+
|
| 13 |
+
# 3. Install dependencies (one-time, ~5 minutes)
|
| 14 |
+
pip install --upgrade pip setuptools wheel
|
| 15 |
+
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
|
| 16 |
+
pip install deepspeed==0.13.1 transformers==4.36.0 datasets==2.14.7 nltk==3.8.1 rouge-score==0.1.2 tqdm==4.66.1
|
| 17 |
+
|
| 18 |
+
# 4. Download model (first time, ~20 minutes; the repo is gated, so request access and run `huggingface-cli login` first)
|
| 19 |
+
python -c "from transformers import LlamaForCausalLM, AutoTokenizer; m = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf'); t = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')"
|
| 20 |
+
python -c "import nltk; nltk.download('punkt', quiet=True); nltk.download('wordnet', quiet=True)"
|
| 21 |
+
|
| 22 |
+
# 5. Quick test (one task, ~3 minutes)
|
| 23 |
+
deepspeed --include localhost:0 --master_port 49500 src/run_llama.py \
|
| 24 |
+
--do_train --do_predict --predict_with_generate \
|
| 25 |
+
--model_name_or_path meta-llama/Llama-2-7b-hf \
|
| 26 |
+
--data_dir CL_Benchmark \
|
| 27 |
+
--task_order task1572_samsum_summary,task363_sst2_polarity_classification,task1290_xsum_summarization,task181_outcome_extraction,task002_quoref_answer_generation,task1510_evalution_relation_extraction,task639_multi_woz_user_utterance_generation,task1729_personachat_generate_next,task073_commonsenseqa_answer_generation,task1590_diplomacy_text_generation,task748_glucose_reverse_cause_event_detection,task511_reddit_tifu_long_text_summarization,task591_sciq_answer_generation,task1687_sentiment140_classification,task875_emotion_classification \
|
| 28 |
+
--task_config_dir configs/gen_script_superni_order1_llama_configs/task1572_samsum_summary \
|
| 29 |
+
--output_dir logs_and_outputs/test_single_task/outputs/1-task1572_samsum_summary \
|
| 30 |
+
--training_epochs 50 --per_device_train_batch_size 2 --per_device_eval_batch_size 4 \
|
| 31 |
+
--lora_r 4 --lora_alpha 32 --threshold 0.995 --model_name specroute --num_train_epochs 50
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
## Subsequent runs (after environment setup)
|
| 35 |
+
|
| 36 |
+
```bash
|
| 37 |
+
# 6. Activate environment
|
| 38 |
+
source venv_llama_specroute/bin/activate
|
| 39 |
+
|
| 40 |
+
# 7. Run full Order 1 (6-10 hours)
|
| 41 |
+
nohup bash gen_script_superni_order1_llama_specroute.sh 0 meta-llama/Llama-2-7b-hf > run_order1.log 2>&1 &
|
| 42 |
+
tail -f run_order1.log # Monitor
|
| 43 |
+
|
| 44 |
+
# 8. Run full Order 2 (6-10 hours)
|
| 45 |
+
nohup bash gen_script_superni_order2_llama_specroute.sh 0 meta-llama/Llama-2-7b-hf > run_order2.log 2>&1 &
|
| 46 |
+
tail -f run_order2.log # Monitor
|
| 47 |
+
|
| 48 |
+
# 9. Calculate metrics
|
| 49 |
+
python score.py gen_script_superni_order1_llama_specroute gen_script_superni_order1_llama_specroute
|
| 50 |
+
python score.py gen_script_superni_order2_llama_specroute gen_script_superni_order2_llama_specroute
|
| 51 |
+
|
| 52 |
+
# 10. Compare with baseline
|
| 53 |
+
python score.py gen_script_superni_order1_llama_gainlora_inflora gen_script_superni_order1_llama_gainlora_inflora
|
| 54 |
+
echo "^^^ This is the GainLoRA baseline to compare with SpecRoute results above ^^^"
|
| 55 |
+
```
|
| 56 |
+
|
| 57 |
+
## Expected Output Example (step 9)
|
| 58 |
+
|
| 59 |
+
```
|
| 60 |
+
[INFO] base_dir: logs_and_outputs
|
| 61 |
+
[INFO] run_name: gen_script_superni_order1_llama_specroute
|
| 62 |
+
[INFO] task_order.txt: 15 tasks
|
| 63 |
+
|
| 64 |
+
[INFO] Building cross-task score matrix...
|
| 65 |
+
|
| 66 |
+
=== Continual Learning Metrics (gen_script_superni_order1_llama_specroute) ===
|
| 67 |
+
Cl (Current Learning): 0.451 ← Average on all tasks at end
|
| 68 |
+
Fgt (Forgetting): 0.124 ← Average catastrophic forgetting
|
| 69 |
+
Fwt (Forward Transfer): 0.424 ← How earlier tasks help future tasks
|
| 70 |
+
Bwt (Backward Transfer): 0.087 ← How current learning damages past tasks
|
| 71 |
+
|
| 72 |
+
=== Cross-Task Score Matrix ===
|
| 73 |
+
task1572 task363 task1290 ... task875
|
| 74 |
+
After task1 : 0.450 0.000 0.000 0.000
|
| 75 |
+
After task2 : 0.438 0.462 0.000 0.000
|
| 76 |
+
After task3 : 0.435 0.456 0.468 0.000
|
| 77 |
+
...
|
| 78 |
+
After task15: 0.412 0.440 0.451 0.456
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
## Comparison Example (step 10)
|
| 82 |
+
|
| 83 |
+
```
|
| 84 |
+
GainLoRA InfLoRA (Reference):
|
| 85 |
+
Cl: 0.451, Fgt: 0.124, Fwt: 0.424, Bwt: 0.087
|
| 86 |
+
|
| 87 |
+
SpecRoute (New):
|
| 88 |
+
Cl: 0.450, Fgt: 0.125, Fwt: 0.422, Bwt: 0.089
|
| 89 |
+
|
| 90 |
+
→ Performance highly similar (good! SpecRoute provides parameter-free routing
|
| 91 |
+
without sacrificing accuracy, while being more interpretable via SVD)
|
| 92 |
+
```
|
| 93 |
+
|
| 94 |
+
## Troubleshooting
|
| 95 |
+
|
| 96 |
+
| Problem | Fix |
|
| 97 |
+
|---------|-----|
|
| 98 |
+
| "CUDA out of memory" | Reduce batch size: `--per_device_train_batch_size 1` |
|
| 99 |
+
| "score.py not found" | Run from `improve_gainlora/` directory |
|
| 100 |
+
| "task_order.txt not found" | Tasks didn't complete; check `tail -100 run_order1.log` |
|
| 101 |
+
| NaN loss | Switch to fp32 if bf16 not supported by hardware |
|
| 102 |
+
| "Llama-3 not supported" | Use Llama-2-7B or Llama-2-13B for now |
|
| 103 |
+
|
| 104 |
+
## Environment deactivation
|
| 105 |
+
|
| 106 |
+
```bash
|
| 107 |
+
# When done
|
| 108 |
+
deactivate
|
| 109 |
+
```
|
| 110 |
+
|
| 111 |
+
---
|
| 112 |
+
|
| 113 |
+
**Total time breakdown:**
|
| 114 |
+
- Setup (steps 1-4): 40 minutes (one-time)
|
| 115 |
+
- Test (step 5): 3 minutes
|
| 116 |
+
- Full Order 1 (step 7): 6-10 hours
|
| 117 |
+
- Full Order 2 (step 8): 6-10 hours
|
| 118 |
+
- Results (steps 9-10): 2 minutes
|
| 119 |
+
- **Total: ~13-21 hours of compute time (mostly automated)**
|
| 120 |
+
|
| 121 |
+
See [SETUP_AND_USAGE_LLAMA_SPECROUTE.md](SETUP_AND_USAGE_LLAMA_SPECROUTE.md) for detailed explanations.
|
improve_gainlora/SETUP_AND_USAGE_LLAMA_SPECROUTE.md
ADDED
|
@@ -0,0 +1,460 @@
| 1 |
+
# Llama SpecRoute on H100: Complete Setup & Usage Guide
|
| 2 |
+
|
| 3 |
+
## Overview
|
| 4 |
+
|
| 5 |
+
This guide provides step-by-step instructions to:
|
| 6 |
+
1. Set up an isolated Python environment on an H100 server
|
| 7 |
+
2. Run Llama SpecRoute Continual Learning experiments (Order 1 & 2)
|
| 8 |
+
3. Compare results with ROOT Llama GainLoRA baselines
|
| 9 |
+
4. Interpret performance metrics (Cl, Fgt, Fwt, Bwt)
|
| 10 |
+
|
| 11 |
+
**What's being tested:**
|
| 12 |
+
- **Model**: Llama-2-7B, Llama-2-13B, Llama-3-8B
|
| 13 |
+
- **Benchmark**: SuperNI (15 NLP tasks)
|
| 14 |
+
- **Task Orders**: Order 1 (shuffled), Order 2 (shuffled differently)
|
| 15 |
+
- **Baseline**: GainLoRA (ROOT implementation in this repo)
|
| 16 |
+
- **New Method**: SpecRoute (parameter-free spectral routing)
|
| 17 |
+
|
| 18 |
+
---
|
| 19 |
+
|
| 20 |
+
## Part 1: Server Environment Setup (Isolated, No System Conflicts)
|
| 21 |
+
|
| 22 |
+
### Step 1.1: Create isolated workspace within improve_gainlora/
|
| 23 |
+
|
| 24 |
+
```bash
|
| 25 |
+
cd /path/to/improve_gainlora
|
| 26 |
+
|
| 27 |
+
# Create a venv in the repo (not system-wide)
|
| 28 |
+
python3.10 -m venv venv_llama_specroute
|
| 29 |
+
|
| 30 |
+
# Activate
|
| 31 |
+
source venv_llama_specroute/bin/activate
|
| 32 |
+
```
|
| 33 |
+
|
| 34 |
+
**Why isolated venv?**
|
| 35 |
+
- Stays within improve_gainlora/ folder
|
| 36 |
+
- No conda base environment conflicts
|
| 37 |
+
- Easy to reproduce (share the pinned install commands; the venv_llama_specroute/ directory itself is not portable between machines)
|
| 38 |
+
- Can be deleted/recreated without affecting system
|
| 39 |
+
|
| 40 |
+
### Step 1.2: Upgrade pip and install core dependencies
|
| 41 |
+
|
| 42 |
+
```bash
|
| 43 |
+
# Always upgrade pip first
|
| 44 |
+
pip install --upgrade pip setuptools wheel
|
| 45 |
+
|
| 46 |
+
# Install PyTorch with CUDA 12.1 (H100 standard)
|
| 47 |
+
pip install torch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 --index-url https://download.pytorch.org/whl/cu121
|
| 48 |
+
|
| 49 |
+
# Install DeepSpeed (required for multi-GPU distributed training)
|
| 50 |
+
pip install deepspeed==0.13.1
|
| 51 |
+
|
| 52 |
+
# Install HuggingFace transformers (for Llama model loading)
|
| 53 |
+
pip install transformers==4.36.0
|
| 54 |
+
|
| 55 |
+
# Install datasets and evaluation metrics
|
| 56 |
+
pip install datasets==2.14.7
|
| 57 |
+
pip install nltk==3.8.1
|
| 58 |
+
pip install rouge-score==0.1.2
|
| 59 |
+
|
| 60 |
+
# Install tqdm for progress bars
|
| 61 |
+
pip install tqdm==4.66.1
|
| 62 |
+
|
| 63 |
+
# Optional: cupy for GPU-accelerated operations
|
| 64 |
+
pip install cupy-cuda12x==12.1.0
|
| 65 |
+
```
|
| 66 |
+
|
| 67 |
+
**Expected installation time**: 5-10 minutes
|
| 68 |
+
|
| 69 |
+
### Step 1.3: Verify installation
|
| 70 |
+
|
| 71 |
+
```bash
|
| 72 |
+
# Check PyTorch with GPU
|
| 73 |
+
python -c "import torch; print(f'CUDA Available: {torch.cuda.is_available()}'); print(f'GPU Count: {torch.cuda.device_count()}'); print(f'Current GPU: {torch.cuda.get_device_name(0)}')"
|
| 74 |
+
|
| 75 |
+
# Check DeepSpeed
|
| 76 |
+
python -c "import deepspeed; print(f'DeepSpeed version: {deepspeed.__version__}')"
|
| 77 |
+
|
| 78 |
+
# Check transformers
|
| 79 |
+
python -c "from transformers import LlamaForCausalLM; print('Transformers OK')"
|
| 80 |
+
```
|
| 81 |
+
|
| 82 |
+
**Expected output:**
|
| 83 |
+
```
|
| 84 |
+
CUDA Available: True
|
| 85 |
+
GPU Count: 1 (or more for multi-GPU)
|
| 86 |
+
Current GPU: NVIDIA H100 80GB HBM3 (exact name varies by H100 variant)
|
| 87 |
+
DeepSpeed version: 0.13.1
|
| 88 |
+
Transformers OK
|
| 89 |
+
```
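To confirm that the pinned versions from Step 1.2 are the ones actually installed, a standard-library check along these lines can be run inside the venv. It is a sketch: `PINNED` copies the pins from the install commands above, and `installed_version` and `find_mismatches` are hypothetical helper names.

```python
from importlib import metadata

# Pins copied from the pip commands in Step 1.2.
PINNED = {
    "torch": "2.1.0",
    "deepspeed": "0.13.1",
    "transformers": "4.36.0",
    "datasets": "2.14.7",
    "nltk": "3.8.1",
    "rouge-score": "0.1.2",
    "tqdm": "4.66.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned, lookup=installed_version):
    """Return {package: (wanted, found)} for every package that is missing or at another version."""
    mismatches = {}
    for package, wanted in pinned.items():
        found = lookup(package)
        # Local build tags such as "+cu121" are ignored when comparing.
        if found is None or found.split("+")[0] != wanted:
            mismatches[package] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINNED)
    for package, (wanted, found) in problems.items():
        print(f"{package}: wanted {wanted}, found {found}")
    if not problems:
        print("All pinned packages match")
```

The `lookup` parameter exists only so the comparison logic can be exercised without the packages installed.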
|
| 90 |
+
|
| 91 |
+
### Step 1.4: Download model weights (if not already cached)
|
| 92 |
+
|
| 93 |
+
```bash
|
| 94 |
+
# Set Hugging Face cache directory (optional, avoids default ~/.cache/)
|
| 95 |
+
export HF_HOME=$(pwd)/.hf_cache
|
| 96 |
+
|
| 97 |
+
# Pre-download Llama-2-7B (gated repo: request access and run `huggingface-cli login` first)
|
| 98 |
+
python -c "from transformers import LlamaForCausalLM, AutoTokenizer; \
|
| 99 |
+
model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-7b-hf'); \
|
| 100 |
+
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-hf')"
|
| 101 |
+
|
| 102 |
+
# Pre-download Llama-2-13B (optional, larger)
|
| 103 |
+
# python -c "from transformers import LlamaForCausalLM; \
|
| 104 |
+
# model = LlamaForCausalLM.from_pretrained('meta-llama/Llama-2-13b-hf')"
|
| 105 |
+
|
| 106 |
+
# Download NLTK data (tokenizer and WordNet)
|
| 107 |
+
python -c "import nltk; nltk.download('punkt', quiet=True); nltk.download('wordnet', quiet=True)"
|
| 108 |
+
```
|
| 109 |
+
|
| 110 |
+
**Expected time**: 10-30 minutes (depends on internet speed and model size)
|
| 111 |
+
|
| 112 |
+
---
|
| 113 |
+
|
| 114 |
+
## Part 2: Running Llama SpecRoute Experiments
|
| 115 |
+
|
| 116 |
+
### Step 2.1: Understand the generated scripts
|
| 117 |
+
|
| 118 |
+
Two scripts are ready to run:
|
| 119 |
+
|
| 120 |
+
1. **gen_script_superni_order1_llama_specroute.sh**: 15 sequential tasks in the first task ordering
|
| 121 |
+
2. **gen_script_superni_order2_llama_specroute.sh** — 15 sequential tasks (shuffled differently for robustness)
|
| 122 |
+
|
| 123 |
+
View script structure:
|
| 124 |
+
```bash
|
| 125 |
+
head -30 gen_script_superni_order1_llama_specroute.sh
|
| 126 |
+
```
|
| 127 |
+
|
| 128 |
+
Key parameters already preset:
|
| 129 |
+
- **model_name=specroute** — Uses spectral routing (not GainLoRA)
|
| 130 |
+
- **threshold=0.995** — ESA dynamic GPM threshold
|
| 131 |
+
- **lora_r=4, lora_alpha=32** — Low-rank adaptation (same as ROOT)
|
| 132 |
+
- **max_source_length=1024, max_target_length=50** — Token limits
|
| 133 |
+
- **DeepSpeed ZeRO stage 2**: Distributed training with optimizer states and gradients partitioned across GPUs
|
| 134 |
+
- **master_port=49500** — Unique port for distributed communication
|
| 135 |
+
- **no data replay**: Pure LoRA continual learning; no examples from earlier tasks are stored or replayed
|
| 136 |
+
|
| 137 |
+
### Step 2.2: Single task test run (2-5 minutes)
|
| 138 |
+
|
| 139 |
+
Before running all 15 tasks, test a single task:
|
| 140 |
+
|
| 141 |
+
```bash
|
| 142 |
+
# Activate environment
|
| 143 |
+
source venv_llama_specroute/bin/activate
|
| 144 |
+
|
| 145 |
+
# Run only task 1 (quick test)
|
| 146 |
+
deepspeed --include localhost:0 --master_port 49500 src/run_llama.py \
|
| 147 |
+
--do_train \
|
| 148 |
+
--do_predict \
|
| 149 |
+
--predict_with_generate \
|
| 150 |
+
--model_name_or_path meta-llama/Llama-2-7b-hf \
|
| 151 |
+
--data_dir CL_Benchmark \
|
| 152 |
+
--task_order task1572_samsum_summary,task363_sst2_polarity_classification,task1290_xsum_summarization,task181_outcome_extraction,task002_quoref_answer_generation,task1510_evalution_relation_extraction,task639_multi_woz_user_utterance_generation,task1729_personachat_generate_next,task073_commonsenseqa_answer_generation,task1590_diplomacy_text_generation,task748_glucose_reverse_cause_event_detection,task511_reddit_tifu_long_text_summarization,task591_sciq_answer_generation,task1687_sentiment140_classification,task875_emotion_classification \
|
| 153 |
+
--task_config_dir configs/gen_script_superni_order1_llama_configs/task1572_samsum_summary \
|
| 154 |
+
--output_dir logs_and_outputs/gen_script_superni_order1_llama_specroute/outputs/1-task1572_samsum_summary \
|
| 155 |
+
--training_epochs 50 \
|
| 156 |
+
--per_device_train_batch_size 2 \
|
| 157 |
+
--per_device_eval_batch_size 4 \
|
| 158 |
+
--lora_r 4 \
|
| 159 |
+
--lora_alpha 32 \
|
| 160 |
+
--threshold 0.995 \
|
| 161 |
+
--model_name specroute \
|
| 162 |
+
--num_train_epochs 50
|
| 163 |
+
```

**Expected output** (illustrative; timestamps and scores will differ):
```
[2026-03-16 14:32:10] Training task 1/1: task1572_samsum_summary
[2026-03-16 14:32:15] Loss: 2.345 | Epoch 1/50
[2026-03-16 14:35:42] Loss: 0.892 | Epoch 50/50
[2026-03-16 14:36:01] Evaluation (ALL tasks):
  - predict_eval_rougeL_for_task1572_samsum_summary: 0.45
[2026-03-16 14:36:02] Saving checkpoint...
[2026-03-16 14:36:05] DONE
```

**If successful**, proceed to the full run.
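
One way to confirm the test run without reading the log by eye is to search it for the evaluation line. The sketch below assumes the log contains lines shaped like `predict_eval_rougeL_for_<task>: <score>`, as in the sample output; `extract_rouge_scores` is a hypothetical helper, not part of the repository.

```python
import re

# Matches e.g. "predict_eval_rougeL_for_task1572_samsum_summary: 0.45"
# (an optional closing quote is allowed in case the line comes from JSON).
ROUGE_LINE = re.compile(
    r'predict_eval_rougeL_for_(?P<task>[\w-]+)"?:\s*(?P<score>[0-9]*\.?[0-9]+)'
)


def extract_rouge_scores(log_text):
    """Return {task_name: score} for every ROUGE-L line found in the log text."""
    scores = {}
    for match in ROUGE_LINE.finditer(log_text):
        scores[match.group("task")] = float(match.group("score"))
    return scores


# Sample copied from the expected output (illustrative values).
sample_log = """\
[2026-03-16 14:36:01] Evaluation (ALL tasks):
  - predict_eval_rougeL_for_task1572_samsum_summary: 0.45
[2026-03-16 14:36:05] DONE
"""
print(extract_rouge_scores(sample_log))
```

An empty dictionary means no evaluation line was found, which usually indicates the run stopped before prediction.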

### Step 2.3: Run full Llama SpecRoute Order 1 (6-10 hours on H100)

```bash
source venv_llama_specroute/bin/activate

# Make scripts executable
chmod +x gen_script_superni_order1_llama_specroute.sh

# Run (background with nohup)
nohup bash gen_script_superni_order1_llama_specroute.sh 0 meta-llama/Llama-2-7b-hf > run_order1.log 2>&1 &

# Parameters:
#   $1 = GPU ID (0 for single GPU, or 0,1 for multi-GPU)
#   $2 = Model path or HuggingFace ID

# Monitor progress in real-time
tail -f run_order1.log

# Or check completion: count task directories that contain trainer_state.json
# (assumes it is written when a task finishes; grep -c on several files prints one count per file, not a total)
ls logs_and_outputs/gen_script_superni_order1_llama_specroute/outputs/*/trainer_state.json 2>/dev/null | wc -l
# Should show: 15 (one per task)
```

**Estimated time**: 6-10 hours (depending on H100 speed, batch size)
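
The completion check can also be done in Python, which makes it easy to list the task directories that are not finished yet. This is a sketch that assumes each finished task leaves a `trainer_state.json` in its own subdirectory of `outputs/`; `find_unfinished_tasks` is an illustrative name, not a function from the repository.

```python
from pathlib import Path


def find_unfinished_tasks(outputs_dir, marker="trainer_state.json"):
    """Return sorted names of task subdirectories that do not contain the marker file."""
    root = Path(outputs_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} does not exist; has training started?")
    return sorted(
        child.name
        for child in root.iterdir()
        if child.is_dir() and not (child / marker).is_file()
    )


if __name__ == "__main__":
    outputs = "logs_and_outputs/gen_script_superni_order1_llama_specroute/outputs"
    try:
        missing = find_unfinished_tasks(outputs)
        print(f"unfinished task directories: {missing}")
    except FileNotFoundError as err:
        print(err)
```

Note that this only inspects directories that already exist, so a task that has not started yet will not appear in the list; compare the number of directories against 15 as well.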

### Step 2.4: Run full Llama SpecRoute Order 2 (6-10 hours on H100)

After Order 1 completes:

```bash
source venv_llama_specroute/bin/activate

chmod +x gen_script_superni_order2_llama_specroute.sh

nohup bash gen_script_superni_order2_llama_specroute.sh 0 meta-llama/Llama-2-7b-hf > run_order2.log 2>&1 &

# Monitor
tail -f run_order2.log
```

**Total experimental time for full comparison (1 model + 2 orders):**
- Setup + verification: 30 mins
- Order 1: 6-10 hours
- Order 2: 6-10 hours
- **Total: 12.5-20.5 hours**
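
Summing the three items, including the 30 minutes of setup, gives 12.5 to 20.5 hours. The small sketch below shows the arithmetic so it can be redone for a different number of task orders; `total_hours` is a hypothetical helper.

```python
def total_hours(setup_hours, order_ranges):
    """Add a fixed setup time to (low, high) hour ranges; return (low_total, high_total)."""
    low = setup_hours + sum(lo for lo, _ in order_ranges)
    high = setup_hours + sum(hi for _, hi in order_ranges)
    return low, high


# 30 minutes of setup plus two runs of 6-10 hours each.
print(total_hours(0.5, [(6, 10), (6, 10)]))  # (12.5, 20.5)
```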

---

## Part 3: Collect & Compare Results

### Step 3.1: Run evaluation script

After both orders complete:

```bash
source venv_llama_specroute/bin/activate

# Compute Continual Learning metrics for Order 1
python score.py gen_script_superni_order1_llama_specroute gen_script_superni_order1_llama_specroute

# Example output:
# [INFO] base_dir: logs_and_outputs
# [INFO] run_name: gen_script_superni_order1_llama_specroute
# === Continual Learning Metrics (Order 1) ===
# Cl (Current Learning): 0.4523
# Fgt (Forgetting): 0.1245
# Fwt (Forward Transfer): 0.4234
# Bwt (Backward Transfer): 0.0856
# === Cross-Task Score Matrix ===
#          T1     T2     T3     ...  T15
# Task 1:  0.450  0.000  0.000       0.000
# Task 2:  0.438  0.462  0.000       0.000
# ...
```

```bash
# Compute for Order 2
python score.py gen_script_superni_order2_llama_specroute gen_script_superni_order2_llama_specroute
```

### Step 3.2: Compare with ROOT GainLoRA Llama baseline

Assuming ROOT GainLoRA results exist:

```bash
# Llama GainLoRA InfLoRA Order 1 results (reference)
python score.py gen_script_superni_order1_llama_gainlora_inflora gen_script_superni_order1_llama_gainlora_inflora

echo ""
echo "=== COMPARISON: SpecRoute vs GainLoRA InfLoRA (Order 1) ==="
echo "| Metric | GainLoRA | SpecRoute | Delta |"
echo "|--------|-----------|-----------|-------|"
# Manually paste numbers from above outputs
```
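
Rather than pasting numbers by hand, the table rows can be generated once both sets of metrics are known. The sketch below assumes you have copied the four metrics from the `score.py` output into two dictionaries; `comparison_table` is a hypothetical helper, and the zeros in the usage are placeholders, not results.

```python
METRICS = ("Cl", "Fgt", "Fwt", "Bwt")


def comparison_table(gainlora, specroute):
    """Build a Markdown table with one row per metric and Delta = SpecRoute - GainLoRA."""
    lines = [
        "| Metric | GainLoRA | SpecRoute | Delta |",
        "|--------|----------|-----------|-------|",
    ]
    for name in METRICS:
        base, new = gainlora[name], specroute[name]
        lines.append(f"| {name} | {base:.4f} | {new:.4f} | {new - base:+.4f} |")
    return "\n".join(lines)


# Placeholder values only; replace them with the numbers printed by score.py.
baseline = {"Cl": 0.0, "Fgt": 0.0, "Fwt": 0.0, "Bwt": 0.0}
candidate = {"Cl": 0.0, "Fgt": 0.0, "Fwt": 0.0, "Bwt": 0.0}
print(comparison_table(baseline, candidate))
```

When reading the Delta column, remember that the sign means different things per metric: a positive Delta is an improvement for Cl and Fwt, while for Fgt a negative Delta is the improvement.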

### Step 3.3: Collect final results into a comparison table

```bash
# Optional: Create a CSV summary
python -c "
import json
import os

def get_metrics(run_name):
    path = f'logs_and_outputs/{run_name}/outputs/task_order.txt'
    if not os.path.exists(path):
        return None
    # Parse results from score.py output
    # (You can modify this to auto-parse JSON results)
    pass

# Create summary
print('Model,Order,Method,Cl,Fgt,Fwt,Bwt')
# Fill in from score.py outputs above
"
```
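
The placeholder in this step can be filled in by parsing the text that `score.py` prints. The sketch below assumes the output contains lines shaped like `Cl (Current Learning): 0.4523`, as in the example output of Step 3.1; `parse_metrics` and `csv_row` are hypothetical helpers, and the real `score.py` output may differ.

```python
import re

# One metric per line, optionally prefixed by "#", e.g. "Fgt (Forgetting): 0.1245".
METRIC_LINE = re.compile(
    r"^#?\s*(Cl|Fgt|Fwt|Bwt)\b[^:]*:\s*(-?[0-9]*\.?[0-9]+)\s*$", re.MULTILINE
)


def parse_metrics(score_output):
    """Extract {'Cl': ..., 'Fgt': ..., 'Fwt': ..., 'Bwt': ...} from the printed text."""
    return {name: float(value) for name, value in METRIC_LINE.findall(score_output)}


def csv_row(model, order, method, metrics):
    """Format one line matching the header Model,Order,Method,Cl,Fgt,Fwt,Bwt."""
    values = [f"{metrics[key]:.4f}" for key in ("Cl", "Fgt", "Fwt", "Bwt")]
    return ",".join([model, str(order), method, *values])


# Text copied from the example output in Step 3.1 (illustrative, not a real result).
example_output = """\
Cl (Current Learning): 0.4523
Fgt (Forgetting): 0.1245
Fwt (Forward Transfer): 0.4234
Bwt (Backward Transfer): 0.0856
"""
print("Model,Order,Method,Cl,Fgt,Fwt,Bwt")
print(csv_row("Llama-2-7B", 1, "SpecRoute", parse_metrics(example_output)))
```

To use it on a real run, capture the output first, for example with `python score.py ... > score_order1.txt`, and pass the file contents to `parse_metrics`.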

---

## Part 4: Interpreting Results

### Continual Learning Metrics

| Metric | Definition | What it means |
|--------|------------|---------------|
| **Cl** | Average score (e.g. ROUGE-L) on all tasks at the final step | Overall final performance. Higher is better. |
| **Fgt** | Average forgetting on previous tasks after learning all tasks | Catastrophic forgetting measure. Lower is better (ideally 0). |
| **Fwt** | Average forward transfer (using tasks learned so far) | How much earlier tasks help future tasks. Higher is better. |
| **Bwt** | Average backward transfer (effect of later tasks on earlier ones) | How much later learning changes earlier-task performance: positive means it helps, negative means it damages. Higher is better. |
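
To make the definitions concrete, the sketch below computes three of the metrics from a lower-triangular score matrix `R`, where `R[i][j]` is the score on task `j` after training on task `i` (the layout of the Cross-Task Score Matrix). It uses common textbook definitions: the final average, forgetting as the drop from each task's best earlier score, and backward transfer as final score minus the score right after learning. Whether `score.py` uses exactly these formulas is an assumption. Fwt is omitted because it needs a reference score for each task before any training, which the matrix alone does not provide. The first two rows follow the example output; the third row is made up for illustration.

```python
def final_average(R):
    """Mean score over all tasks after the last task has been learned."""
    last = R[-1]
    return sum(last) / len(last)


def forgetting(R):
    """Mean drop from each earlier task's best past score to its final score."""
    T = len(R)
    if T < 2:
        return 0.0
    drops = [max(R[i][j] for i in range(j, T - 1)) - R[-1][j] for j in range(T - 1)]
    return sum(drops) / len(drops)


def backward_transfer(R):
    """Mean of (final score - score right after learning) over earlier tasks."""
    T = len(R)
    if T < 2:
        return 0.0
    deltas = [R[-1][j] - R[j][j] for j in range(T - 1)]
    return sum(deltas) / len(deltas)


# Row i = after training task i, column j = score on task j.
R = [
    [0.450, 0.000, 0.000],
    [0.438, 0.462, 0.000],
    [0.430, 0.450, 0.500],
]
print(round(final_average(R), 3), round(forgetting(R), 3), round(backward_transfer(R), 3))
```

With these definitions, forgetting and backward transfer have opposite signs whenever each task's best score is the one obtained right after learning it, which is the case in this small matrix.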

### Expected Results (from paper baseline, Table 3)

**Llama-2-7B GainLoRA (InfLoRA):**
- Cl: ~0.45
- Fgt: ~0.12
- Fwt: ~0.42
- Bwt: ~0.09

**SpecRoute should achieve similar or better results. It differs from GainLoRA as follows:**
- Replaces learned routing with parameter-free spectral routing
- Removes KL distillation + data replay for pure LoRA-only continual learning
- Same LoRA GPM (gradient projection memory, which keeps updates out of subspaces used by earlier tasks)

### What to accept and what to investigate

✅ **Good signs:**
- Cl ≈ GainLoRA baseline (0.42-0.48)
- Order 1 and Order 2 have similar Cl (robust to task ordering)
- Fgt is small and stable (< 0.15)
- Training loss decreases smoothly

⚠️ **Warning signs:**
- Cl much lower (< 0.40) → routing may be failing (spectral routing is parameter-free, so this is not a convergence problem)
- Fgt very high (> 0.20) → catastrophic forgetting problem
- NaN in loss → numerical issue (check bf16 vs fp32)
- Early divergence → learning rate too high or initialization issue
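
The numeric thresholds in the two lists can be applied mechanically once the metrics are available. The sketch below is one possible encoding of them; `check_run` and `order_gap` are hypothetical helpers, and values that fall between the "good" and "warning" bands are deliberately left unflagged.

```python
import math


def check_run(cl, fgt, losses=()):
    """Return (good, warnings) lists using the Cl, Fgt and NaN checks listed above."""
    good, warnings = [], []
    if any(math.isnan(x) for x in losses):
        warnings.append("NaN in loss: check bf16 vs fp32")
    if cl < 0.40:
        warnings.append("Cl below 0.40")
    elif 0.42 <= cl <= 0.48:
        good.append("Cl within the GainLoRA baseline range")
    if fgt > 0.20:
        warnings.append("Fgt above 0.20")
    elif fgt < 0.15:
        good.append("Fgt below 0.15")
    return good, warnings


def order_gap(cl_order1, cl_order2):
    """Absolute Cl difference between the two task orders (smaller means more robust)."""
    return abs(cl_order1 - cl_order2)
```

A Cl above 0.48 is not reported as a good sign by this sketch even though it beats the baseline range, so treat an empty result as "inspect manually" rather than as a failure.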

---

## Part 5: Quick Troubleshooting

### Issue: "CUDA out of memory"
```bash
# Reduce batch size in script
# Change: --per_device_train_batch_size 2
# To:     --per_device_train_batch_size 1
```

### Issue: "score.py not found"
```bash
# Make sure you run from improve_gainlora/ directory
cd /path/to/improve_gainlora
python score.py ...
```

### Issue: "task_order.txt not found"
```bash
# Means tasks didn't complete. Check logs:
tail -100 run_order1.log | grep -i error
```

### Issue: NaN loss
```bash
# SpecRoute training uses bf16 (bfloat16).
# If the GPU doesn't support bf16, change the flags passed to src/run_llama.py:
# Remove: --bf16
# Result: training falls back to fp32 (needs more GPU memory), assuming the script uses
#         Hugging Face TrainingArguments, which has no --fp32 flag
```

### Issue: Results directory is empty
```bash
# Check if training actually ran for each task:
ls -la logs_and_outputs/gen_script_superni_order1_llama_specroute/outputs/

# You should see: 1-task1572_samsum_summary/, 2-task363_sst2_polarity_classification/, etc.
```

---

## Part 6: Advanced Usage

### Run on Multi-GPU (if the H100 server has 8 GPUs)

```bash
# Modify GPU IDs in script or run with:
deepspeed --include localhost:0,1,2,3 --master_port 49500 src/run_llama.py ...

# Or, since the script passes its first argument to:
#   deepspeed --include localhost:${1}
# pass multiple GPU IDs as that argument: 0,1 or 0,1,2,3
```

### Run Llama-2-13B or Llama-3-8B

```bash
# Simply change model path:
bash gen_script_superni_order1_llama_specroute.sh 0 meta-llama/Llama-2-13b-hf

# Or Llama-3:
bash gen_script_superni_order1_llama_specroute.sh 0 meta-llama/Meta-Llama-3-8B

# Note: Llama-3 support not yet implemented (will raise NotImplementedError)
# Requires creating llama_3_specroute.py (similar steps as llama_specroute.py)
```

### Profile execution time per task

```bash
# Add timestamps to log
for i in {1..15}; do
  START=$(date +%s)
  # ... run task $i ...
  END=$(date +%s)
  ELAPSED=$((END - START))
  echo "Task $i: $ELAPSED seconds" >> timings.log
done
```

---

## Summary Checklist

- [ ] Created isolated venv_llama_specroute/
- [ ] Installed PyTorch, DeepSpeed, transformers
- [ ] Verified CUDA availability
- [ ] Pre-downloaded model weights (Llama-2-7B)
- [ ] Ran single task test ✓
- [ ] Ran full Order 1 (6-10 hours)
- [ ] Ran full Order 2 (6-10 hours)
- [ ] Computed metrics with score.py for both orders
- [ ] Compared with GainLoRA baseline
- [ ] Recorded results in comparison table
- [ ] Interpreted performance (Cl, Fgt, Fwt, Bwt)

---

## Files Reference

| File | Purpose |
|------|---------|
| `venv_llama_specroute/` | Isolated Python environment |
| `src/llama_specroute.py` | Llama model with spectral routing |
| `src/cl_trainer_specroute_llama.py` | SpecRoute trainer (GPM + ESA) |
| `gen_script_superni_order1_llama_specroute.sh` | Task sequence 1 (15 tasks) |
| `gen_script_superni_order2_llama_specroute.sh` | Task sequence 2 (15 tasks) |
| `score.py` | Evaluation script (computes Cl, Fgt, etc.) |
| `logs_and_outputs/gen_script_superni_order{1,2}_llama_specroute/outputs/` | Results per task |
| `results/comparison_results.md` | Summary table for all methods |

---

## Questions?

Check existing baselines first:
```bash
# ROOT GainLoRA InfLoRA results (reference)
python score.py gen_script_superni_order1_llama_gainlora_inflora gen_script_superni_order1_llama_gainlora_inflora

# T5 SpecRoute results (if available)
python score.py gen_script_superni_order1_t5_specroute gen_script_superni_order1_t5_specroute
```

For theoretical background, see [SPECROUTE_IDEA.md](SPECROUTE_IDEA.md).