examples/experiment_log_GLUE.txt
==============================
Task: mnli | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1194243 / 110678790 (1.08%)
Starting training...
{'loss': 0.7543, 'grad_norm': 7.192684173583984, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.607556939125061, 'eval_accuracy': 0.7472236372898624, 'eval_runtime': 13.7881, 'eval_samples_per_second': 711.847, 'eval_steps_per_second': 22.266, 'epoch': 1.0}
{'loss': 0.6227, 'grad_norm': 4.976467132568359, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.5613736510276794, 'eval_accuracy': 0.7701477330616403, 'eval_runtime': 13.8073, 'eval_samples_per_second': 710.857, 'eval_steps_per_second': 22.235, 'epoch': 2.0}
{'loss': 0.5874, 'grad_norm': 6.665809154510498, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.5499060153961182, 'eval_accuracy': 0.7754457463066735, 'eval_runtime': 13.7671, 'eval_samples_per_second': 712.934, 'eval_steps_per_second': 22.3, 'epoch': 3.0}
{'train_runtime': 3924.5014, 'train_samples_per_second': 300.193, 'train_steps_per_second': 9.381, 'train_loss': 0.6405815247192117, 'epoch': 3.0}
Training completed in 3924.87 seconds.
{'eval_loss': 0.5499060153961182, 'eval_accuracy': 0.7754457463066735, 'eval_runtime': 13.7518, 'eval_samples_per_second': 713.725, 'eval_steps_per_second': 22.324, 'epoch': 3.0}
{'eval_loss': 0.5282740592956543, 'eval_accuracy': 0.78966639544345, 'eval_runtime': 14.1188, 'eval_samples_per_second': 696.375, 'eval_steps_per_second': 21.815, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | lora ===
Metric: 0.7754/0.7897
Training Time: 3924.87 seconds

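The "Trainable params" lines in this log report adapter overhead as trainable / total parameters, with the percentage being that ratio rounded to two decimals. A minimal sketch of the arithmetic (the helper name is ours; the figures are taken from the runs in this log):

```python
def trainable_pct(trainable: int, total: int) -> float:
    """Percentage of parameters that receive gradient updates."""
    return round(100 * trainable / total, 2)

# Figures from the mnli | lora run above.
print(trainable_pct(1_194_243, 110_678_790))  # -> 1.08
```

The same check reproduces the 2.13% quoted for the DiffLoRA runs and the 1.61% quoted for AdaLoRA later in the log.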
==============================
Task: sst2 | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.258949875831604, 'eval_accuracy': 0.8956422018348624, 'eval_runtime': 0.7467, 'eval_samples_per_second': 1167.75, 'eval_steps_per_second': 37.497, 'epoch': 1.0}
{'eval_loss': 0.25339475274086, 'eval_accuracy': 0.9013761467889908, 'eval_runtime': 0.7453, 'eval_samples_per_second': 1169.944, 'eval_steps_per_second': 37.567, 'epoch': 2.0}
{'eval_loss': 0.24459940195083618, 'eval_accuracy': 0.9013761467889908, 'eval_runtime': 0.7415, 'eval_samples_per_second': 1176.06, 'eval_steps_per_second': 37.763, 'epoch': 3.0}
{'train_runtime': 390.9411, 'train_samples_per_second': 516.822, 'train_steps_per_second': 16.153, 'train_loss': 0.2664547108509006, 'epoch': 3.0}
Training completed in 391.31 seconds.
{'eval_loss': 0.24459940195083618, 'eval_accuracy': 0.9013761467889908, 'eval_runtime': 0.7466, 'eval_samples_per_second': 1168.019, 'eval_steps_per_second': 37.505, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | lora ===
Metric: 0.9014
Training Time: 391.31 seconds


==============================
Task: cola | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.5983226895332336, 'eval_matthews_correlation': 0.018148342420931135, 'eval_runtime': 0.5265, 'eval_samples_per_second': 1980.947, 'eval_steps_per_second': 62.676, 'epoch': 1.0}
{'eval_loss': 0.5542184114456177, 'eval_matthews_correlation': 0.17454042413408488, 'eval_runtime': 0.5276, 'eval_samples_per_second': 1976.709, 'eval_steps_per_second': 62.542, 'epoch': 2.0}
{'eval_loss': 0.5396776795387268, 'eval_matthews_correlation': 0.27356428891843526, 'eval_runtime': 0.5345, 'eval_samples_per_second': 1951.52, 'eval_steps_per_second': 61.745, 'epoch': 3.0}
{'train_runtime': 47.7869, 'train_samples_per_second': 536.821, 'train_steps_per_second': 16.825, 'train_loss': 0.5679058245758513, 'epoch': 3.0}
Training completed in 48.14 seconds.
{'eval_loss': 0.5396776795387268, 'eval_matthews_correlation': 0.27356428891843526, 'eval_runtime': 0.5305, 'eval_samples_per_second': 1966.035, 'eval_steps_per_second': 62.204, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | lora ===
Metric: 0.2736
Training Time: 48.14 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'loss': 0.4109, 'grad_norm': 3.7838921546936035, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8416027702201335, 'eval_f1': 0.8019054689433308, 'eval_loss': 0.34393781423568726, 'eval_runtime': 52.8326, 'eval_samples_per_second': 765.247, 'eval_steps_per_second': 23.925, 'epoch': 1.0}
{'loss': 0.3442, 'grad_norm': 6.105409622192383, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.8546623794212219, 'eval_f1': 0.8169014084507042, 'eval_loss': 0.32436296343803406, 'eval_runtime': 52.7316, 'eval_samples_per_second': 766.713, 'eval_steps_per_second': 23.97, 'epoch': 2.0}
{'loss': 0.3261, 'grad_norm': 5.963987350463867, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.8592134553549344, 'eval_f1': 0.8201238781443559, 'eval_loss': 0.3147530257701874, 'eval_runtime': 52.7125, 'eval_samples_per_second': 766.991, 'eval_steps_per_second': 23.979, 'epoch': 3.0}
{'train_runtime': 3482.13, 'train_samples_per_second': 313.468, 'train_steps_per_second': 9.797, 'train_loss': 0.3554784081295529, 'epoch': 3.0}
Training completed in 3482.43 seconds.
{'eval_accuracy': 0.8592134553549344, 'eval_f1': 0.8201238781443559, 'eval_loss': 0.3147530257701874, 'eval_runtime': 52.7041, 'eval_samples_per_second': 767.113, 'eval_steps_per_second': 23.983, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | lora ===
Metric: 0.8592/0.8201
Training Time: 3482.43 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.3872588574886322, 'eval_accuracy': 0.8266520226981512, 'eval_runtime': 10.2202, 'eval_samples_per_second': 534.531, 'eval_steps_per_second': 16.732, 'epoch': 1.0}
{'eval_loss': 0.35759007930755615, 'eval_accuracy': 0.8359875526267618, 'eval_runtime': 10.4448, 'eval_samples_per_second': 523.038, 'eval_steps_per_second': 16.372, 'epoch': 2.0}
{'eval_loss': 0.3487180471420288, 'eval_accuracy': 0.842394288852279, 'eval_runtime': 10.396, 'eval_samples_per_second': 525.488, 'eval_steps_per_second': 16.449, 'epoch': 3.0}
{'train_runtime': 1341.9095, 'train_samples_per_second': 234.166, 'train_steps_per_second': 7.319, 'train_loss': 0.4522236532942629, 'epoch': 3.0}
Training completed in 1342.28 seconds.
{'eval_loss': 0.3487180471420288, 'eval_accuracy': 0.842394288852279, 'eval_runtime': 10.3933, 'eval_samples_per_second': 525.626, 'eval_steps_per_second': 16.453, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | lora ===
Metric: 0.8424
Training Time: 1342.28 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.6956374049186707, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.873, 'eval_samples_per_second': 317.305, 'eval_steps_per_second': 10.31, 'epoch': 1.0}
{'eval_loss': 0.6953310966491699, 'eval_accuracy': 0.48375451263537905, 'eval_runtime': 0.8871, 'eval_samples_per_second': 312.258, 'eval_steps_per_second': 10.146, 'epoch': 2.0}
{'eval_loss': 0.6961230039596558, 'eval_accuracy': 0.47653429602888087, 'eval_runtime': 0.8538, 'eval_samples_per_second': 324.414, 'eval_steps_per_second': 10.541, 'epoch': 3.0}
{'train_runtime': 56.6362, 'train_samples_per_second': 131.894, 'train_steps_per_second': 4.132, 'train_loss': 0.6990308843107305, 'epoch': 3.0}
Training completed in 57.00 seconds.
{'eval_loss': 0.6953310966491699, 'eval_accuracy': 0.48375451263537905, 'eval_runtime': 0.8862, 'eval_samples_per_second': 312.561, 'eval_steps_per_second': 10.155, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | lora ===
Metric: 0.4838
Training Time: 57.00 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.6082455515861511, 'eval_accuracy': 0.6862745098039216, 'eval_f1': 0.8134110787172012, 'eval_runtime': 0.6508, 'eval_samples_per_second': 626.892, 'eval_steps_per_second': 19.975, 'epoch': 1.0}
{'eval_loss': 0.5976766347885132, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.65, 'eval_samples_per_second': 627.713, 'eval_steps_per_second': 20.001, 'epoch': 2.0}
{'eval_loss': 0.5930073857307434, 'eval_accuracy': 0.6862745098039216, 'eval_f1': 0.8134110787172012, 'eval_runtime': 0.6529, 'eval_samples_per_second': 624.911, 'eval_steps_per_second': 19.911, 'epoch': 3.0}
{'train_runtime': 43.6258, 'train_samples_per_second': 252.236, 'train_steps_per_second': 7.908, 'train_loss': 0.6178461931753849, 'epoch': 3.0}
Training completed in 43.98 seconds.
{'eval_loss': 0.5930073857307434, 'eval_accuracy': 0.6862745098039216, 'eval_f1': 0.8134110787172012, 'eval_runtime': 0.6498, 'eval_samples_per_second': 627.85, 'eval_steps_per_second': 20.005, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | lora ===
Metric: 0.6863
Training Time: 43.98 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: lora
==============================

Injected standard LoRA adapters via PEFT.
Trainable params: 1192705 / 110675714 (1.08%)
Starting training...
{'eval_loss': 2.186974048614502, 'eval_pearson': 0.4989167054020985, 'eval_spearmanr': 0.5444894161992337, 'eval_combined_score': 0.5217030608006661, 'eval_runtime': 1.5493, 'eval_samples_per_second': 968.18, 'eval_steps_per_second': 30.336, 'epoch': 1.0}
{'eval_loss': 1.4939329624176025, 'eval_pearson': 0.6610799440013643, 'eval_spearmanr': 0.6714651442830666, 'eval_combined_score': 0.6662725441422155, 'eval_runtime': 1.5413, 'eval_samples_per_second': 973.222, 'eval_steps_per_second': 30.494, 'epoch': 2.0}
{'eval_loss': 1.2076021432876587, 'eval_pearson': 0.7196282697199895, 'eval_spearmanr': 0.7309657778188898, 'eval_combined_score': 0.7252970237694396, 'eval_runtime': 1.5503, 'eval_samples_per_second': 967.559, 'eval_steps_per_second': 30.317, 'epoch': 3.0}
{'train_runtime': 62.1314, 'train_samples_per_second': 277.589, 'train_steps_per_second': 8.691, 'train_loss': 2.3513389304832177, 'epoch': 3.0}
Training completed in 62.46 seconds.
{'eval_loss': 1.2076021432876587, 'eval_pearson': 0.7196282697199895, 'eval_spearmanr': 0.7309657778188898, 'eval_combined_score': 0.7252970237694396, 'eval_runtime': 1.5501, 'eval_samples_per_second': 967.653, 'eval_steps_per_second': 30.32, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | lora ===
Metric: 0.7253
Training Time: 62.46 seconds

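The stsb eval dicts report eval_combined_score next to eval_pearson and eval_spearmanr; checking against the epoch-3 numbers in this log, it is the arithmetic mean of the two correlations. A minimal sketch (the helper name is ours):

```python
def combined_score(pearson: float, spearmanr: float) -> float:
    # STS-B summary metric: mean of Pearson and Spearman correlations.
    return (pearson + spearmanr) / 2

# Epoch-3 values from the stsb | lora run above.
print(combined_score(0.7196282697199895, 0.7309657778188898))
```

The same mean reproduces the combined scores logged for the later stsb runs as well.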
==============================
Task: mnli | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793984 (2.13%)
Starting training...
{'loss': 0.6947, 'grad_norm': 15.653863906860352, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.5447636246681213, 'eval_accuracy': 0.7818644931227713, 'eval_runtime': 16.5856, 'eval_samples_per_second': 591.779, 'eval_steps_per_second': 18.51, 'epoch': 1.0}
{'loss': 0.5561, 'grad_norm': 11.298938751220703, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.514032244682312, 'eval_accuracy': 0.800509424350484, 'eval_runtime': 16.5827, 'eval_samples_per_second': 591.882, 'eval_steps_per_second': 18.513, 'epoch': 2.0}
{'loss': 0.5145, 'grad_norm': 18.323087692260742, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.5049977898597717, 'eval_accuracy': 0.8046867040244524, 'eval_runtime': 16.6301, 'eval_samples_per_second': 590.194, 'eval_steps_per_second': 18.46, 'epoch': 3.0}
{'train_runtime': 4639.8539, 'train_samples_per_second': 253.91, 'train_steps_per_second': 7.935, 'train_loss': 0.5715459817805947, 'epoch': 3.0}
Training completed in 4640.21 seconds.
{'eval_loss': 0.5049977898597717, 'eval_accuracy': 0.8046867040244524, 'eval_runtime': 16.5868, 'eval_samples_per_second': 591.735, 'eval_steps_per_second': 18.509, 'epoch': 3.0}
{'eval_loss': 0.48250600695610046, 'eval_accuracy': 0.8116354759967453, 'eval_runtime': 17.0429, 'eval_samples_per_second': 576.898, 'eval_steps_per_second': 18.072, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | diff_lora ===
Metric: 0.8047/0.8116
Training Time: 4640.21 seconds


==============================
Task: sst2 | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.23795804381370544, 'eval_accuracy': 0.911697247706422, 'eval_runtime': 0.8555, 'eval_samples_per_second': 1019.265, 'eval_steps_per_second': 32.729, 'epoch': 1.0}
{'eval_loss': 0.2600213289260864, 'eval_accuracy': 0.9139908256880734, 'eval_runtime': 0.8524, 'eval_samples_per_second': 1022.961, 'eval_steps_per_second': 32.847, 'epoch': 2.0}
{'eval_loss': 0.2648456394672394, 'eval_accuracy': 0.9139908256880734, 'eval_runtime': 0.88, 'eval_samples_per_second': 990.875, 'eval_steps_per_second': 31.817, 'epoch': 3.0}
{'train_runtime': 427.5234, 'train_samples_per_second': 472.599, 'train_steps_per_second': 14.771, 'train_loss': 0.2100713710226148, 'epoch': 3.0}
Training completed in 427.89 seconds.
{'eval_loss': 0.23795804381370544, 'eval_accuracy': 0.911697247706422, 'eval_runtime': 0.8749, 'eval_samples_per_second': 996.654, 'eval_steps_per_second': 32.003, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | diff_lora ===
Metric: 0.9117
Training Time: 427.89 seconds


==============================
Task: cola | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.5782420039176941, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.6416, 'eval_samples_per_second': 1625.645, 'eval_steps_per_second': 51.435, 'epoch': 1.0}
{'eval_loss': 0.5364716649055481, 'eval_matthews_correlation': 0.3429695650358, 'eval_runtime': 0.6568, 'eval_samples_per_second': 1587.909, 'eval_steps_per_second': 50.241, 'epoch': 2.0}
{'eval_loss': 0.5494520664215088, 'eval_matthews_correlation': 0.3558006877385648, 'eval_runtime': 0.624, 'eval_samples_per_second': 1671.599, 'eval_steps_per_second': 52.889, 'epoch': 3.0}
{'train_runtime': 52.7721, 'train_samples_per_second': 486.109, 'train_steps_per_second': 15.235, 'train_loss': 0.536839651231149, 'epoch': 3.0}
Training completed in 53.13 seconds.
{'eval_loss': 0.5364716649055481, 'eval_matthews_correlation': 0.3429695650358, 'eval_runtime': 0.6229, 'eval_samples_per_second': 1674.314, 'eval_steps_per_second': 52.974, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | diff_lora ===
Metric: 0.3430
Training Time: 53.13 seconds


==============================
Task: qqp | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'loss': 0.3753, 'grad_norm': 9.991658210754395, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8646549591887213, 'eval_f1': 0.8212699242226287, 'eval_loss': 0.3013371527194977, 'eval_runtime': 51.0312, 'eval_samples_per_second': 792.26, 'eval_steps_per_second': 24.769, 'epoch': 1.0}
{'loss': 0.3016, 'grad_norm': 9.778715133666992, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.8736829087311403, 'eval_f1': 0.8335343394504384, 'eval_loss': 0.28591614961624146, 'eval_runtime': 51.0348, 'eval_samples_per_second': 792.204, 'eval_steps_per_second': 24.767, 'epoch': 2.0}
{'loss': 0.2761, 'grad_norm': 9.68469524383545, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.8792233489982686, 'eval_f1': 0.8404821796086375, 'eval_loss': 0.2785017192363739, 'eval_runtime': 51.8869, 'eval_samples_per_second': 779.195, 'eval_steps_per_second': 24.361, 'epoch': 3.0}
{'train_runtime': 3478.2186, 'train_samples_per_second': 313.821, 'train_steps_per_second': 9.808, 'train_loss': 0.31152516229379196, 'epoch': 3.0}
Training completed in 3478.58 seconds.
{'eval_accuracy': 0.8792233489982686, 'eval_f1': 0.8404821796086375, 'eval_loss': 0.2785017192363739, 'eval_runtime': 51.969, 'eval_samples_per_second': 777.963, 'eval_steps_per_second': 24.322, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | diff_lora ===
Metric: 0.8792/0.8405
Training Time: 3478.58 seconds


==============================
Task: qnli | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.3488791286945343, 'eval_accuracy': 0.8451400329489291, 'eval_runtime': 10.1741, 'eval_samples_per_second': 536.954, 'eval_steps_per_second': 16.807, 'epoch': 1.0}
{'eval_loss': 0.3232109546661377, 'eval_accuracy': 0.8617975471352737, 'eval_runtime': 10.278, 'eval_samples_per_second': 531.526, 'eval_steps_per_second': 16.638, 'epoch': 2.0}
{'eval_loss': 0.3128054440021515, 'eval_accuracy': 0.8652754896576972, 'eval_runtime': 10.3058, 'eval_samples_per_second': 530.091, 'eval_steps_per_second': 16.593, 'epoch': 3.0}
{'train_runtime': 1331.802, 'train_samples_per_second': 235.943, 'train_steps_per_second': 7.375, 'train_loss': 0.40391058000375435, 'epoch': 3.0}
Training completed in 1332.17 seconds.
{'eval_loss': 0.3128054440021515, 'eval_accuracy': 0.8652754896576972, 'eval_runtime': 10.3119, 'eval_samples_per_second': 529.777, 'eval_steps_per_second': 16.583, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | diff_lora ===
Metric: 0.8653
Training Time: 1332.17 seconds


==============================
Task: rte | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.6908643245697021, 'eval_accuracy': 0.5126353790613718, 'eval_runtime': 0.8397, 'eval_samples_per_second': 329.895, 'eval_steps_per_second': 10.719, 'epoch': 1.0}
{'eval_loss': 0.6881587505340576, 'eval_accuracy': 0.5379061371841155, 'eval_runtime': 0.8412, 'eval_samples_per_second': 329.302, 'eval_steps_per_second': 10.699, 'epoch': 2.0}
{'eval_loss': 0.6893179416656494, 'eval_accuracy': 0.5451263537906137, 'eval_runtime': 0.8748, 'eval_samples_per_second': 316.636, 'eval_steps_per_second': 10.288, 'epoch': 3.0}
{'train_runtime': 57.4855, 'train_samples_per_second': 129.946, 'train_steps_per_second': 4.071, 'train_loss': 0.6872217553293604, 'epoch': 3.0}
Training completed in 57.84 seconds.
{'eval_loss': 0.6881587505340576, 'eval_accuracy': 0.5379061371841155, 'eval_runtime': 0.8573, 'eval_samples_per_second': 323.112, 'eval_steps_per_second': 10.498, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | diff_lora ===
Metric: 0.5379
Training Time: 57.84 seconds


==============================
Task: mrpc | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111793215 (2.13%)
Starting training...
{'eval_loss': 0.6013302803039551, 'eval_accuracy': 0.7009803921568627, 'eval_f1': 0.8189910979228486, 'eval_runtime': 0.6298, 'eval_samples_per_second': 647.787, 'eval_steps_per_second': 20.64, 'epoch': 1.0}
{'eval_loss': 0.5768405795097351, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8233532934131736, 'eval_runtime': 0.6327, 'eval_samples_per_second': 644.858, 'eval_steps_per_second': 20.547, 'epoch': 2.0}
{'eval_loss': 0.5735046863555908, 'eval_accuracy': 0.7009803921568627, 'eval_f1': 0.8157099697885196, 'eval_runtime': 0.6039, 'eval_samples_per_second': 675.597, 'eval_steps_per_second': 21.526, 'epoch': 3.0}
{'train_runtime': 43.087, 'train_samples_per_second': 255.391, 'train_steps_per_second': 8.007, 'train_loss': 0.5828715448794157, 'epoch': 3.0}
Training completed in 43.44 seconds.
{'eval_loss': 0.5735046863555908, 'eval_accuracy': 0.7009803921568627, 'eval_f1': 0.8157099697885196, 'eval_runtime': 0.636, 'eval_samples_per_second': 641.462, 'eval_steps_per_second': 20.439, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | diff_lora ===
Metric: 0.7010
Training Time: 43.44 seconds


==============================
Task: stsb | Model: bert-base-uncased | Method: diff_lora
==============================

Injected fused DiffLoRA adapters with rank 8 (ratio=1.0).
Trainable params: 2383933 / 111792446 (2.13%)
Starting training...
{'eval_loss': 2.023221015930176, 'eval_pearson': 0.43274608571597944, 'eval_spearmanr': 0.38884918971151733, 'eval_combined_score': 0.4107976377137484, 'eval_runtime': 1.5082, 'eval_samples_per_second': 994.574, 'eval_steps_per_second': 31.163, 'epoch': 1.0}
{'eval_loss': 1.104512333869934, 'eval_pearson': 0.7544793686109483, 'eval_spearmanr': 0.7705548315901775, 'eval_combined_score': 0.7625171001005628, 'eval_runtime': 1.5033, 'eval_samples_per_second': 997.79, 'eval_steps_per_second': 31.264, 'epoch': 2.0}
{'eval_loss': 0.9440999627113342, 'eval_pearson': 0.7774580882446339, 'eval_spearmanr': 0.7833591644633611, 'eval_combined_score': 0.7804086263539975, 'eval_runtime': 1.4677, 'eval_samples_per_second': 1021.974, 'eval_steps_per_second': 32.022, 'epoch': 3.0}
{'train_runtime': 60.8312, 'train_samples_per_second': 283.522, 'train_steps_per_second': 8.877, 'train_loss': 1.575874498155382, 'epoch': 3.0}
Training completed in 61.19 seconds.
{'eval_loss': 0.9440999627113342, 'eval_pearson': 0.7774580882446339, 'eval_spearmanr': 0.7833591644633611, 'eval_combined_score': 0.7804086263539975, 'eval_runtime': 1.5024, 'eval_samples_per_second': 998.372, 'eval_steps_per_second': 31.282, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | diff_lora ===
Metric: 0.7804
Training Time: 61.19 seconds

==============================
Task: mnli | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790943 / 111275551 (1.61%)
Starting training...
{'loss': 1.1374, 'grad_norm': 3.505448579788208, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 1.0381470918655396, 'eval_accuracy': 0.47009679062659193, 'eval_runtime': 19.6674, 'eval_samples_per_second': 499.05, 'eval_steps_per_second': 15.61, 'epoch': 1.0}
{'loss': 0.9977, 'grad_norm': 1.374243140220642, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.8795942664146423, 'eval_accuracy': 0.5992868059093225, 'eval_runtime': 19.9383, 'eval_samples_per_second': 492.269, 'eval_steps_per_second': 15.398, 'epoch': 2.0}
{'loss': 0.8936, 'grad_norm': 2.785215377807617, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.8512938618659973, 'eval_accuracy': 0.6146714212939378, 'eval_runtime': 19.8935, 'eval_samples_per_second': 493.377, 'eval_steps_per_second': 15.432, 'epoch': 3.0}
{'train_runtime': 5589.1918, 'train_samples_per_second': 210.783, 'train_steps_per_second': 6.587, 'train_loss': 0.9832181039451258, 'epoch': 3.0}
Training completed in 5589.56 seconds.
{'eval_loss': 0.8512938618659973, 'eval_accuracy': 0.6146714212939378, 'eval_runtime': 19.9172, 'eval_samples_per_second': 492.789, 'eval_steps_per_second': 15.414, 'epoch': 3.0}
{'eval_loss': 0.8136247396469116, 'eval_accuracy': 0.6382221318144833, 'eval_runtime': 20.3656, 'eval_samples_per_second': 482.775, 'eval_steps_per_second': 15.124, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | adalora ===
Metric: 0.6147/0.6382
Training Time: 5589.56 seconds

==============================
Task: sst2 | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 0.6694198846817017, 'eval_accuracy': 0.5481651376146789, 'eval_runtime': 1.1894, 'eval_samples_per_second': 733.137, 'eval_steps_per_second': 23.541, 'epoch': 1.0}
{'eval_loss': 0.6387322545051575, 'eval_accuracy': 0.6307339449541285, 'eval_runtime': 1.1581, 'eval_samples_per_second': 752.97, 'eval_steps_per_second': 24.178, 'epoch': 2.0}
{'eval_loss': 0.6214661002159119, 'eval_accuracy': 0.6376146788990825, 'eval_runtime': 1.1568, 'eval_samples_per_second': 753.8, 'eval_steps_per_second': 24.205, 'epoch': 3.0}
{'train_runtime': 590.0367, 'train_samples_per_second': 342.431, 'train_steps_per_second': 10.703, 'train_loss': 0.7303827692003168, 'epoch': 3.0}
Training completed in 590.41 seconds.
{'eval_loss': 0.6214661002159119, 'eval_accuracy': 0.6376146788990825, 'eval_runtime': 1.1533, 'eval_samples_per_second': 756.093, 'eval_steps_per_second': 24.278, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | adalora ===
Metric: 0.6376
Training Time: 590.41 seconds

==============================
Task: cola | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 1.3985247611999512, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.9098, 'eval_samples_per_second': 1146.385, 'eval_steps_per_second': 36.271, 'epoch': 1.0}
{'eval_loss': 1.262640357017517, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.9196, 'eval_samples_per_second': 1134.13, 'eval_steps_per_second': 35.883, 'epoch': 2.0}
{'eval_loss': 1.2143875360488892, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 1.0264, 'eval_samples_per_second': 1016.217, 'eval_steps_per_second': 32.153, 'epoch': 3.0}
{'train_runtime': 72.2854, 'train_samples_per_second': 354.885, 'train_steps_per_second': 11.123, 'train_loss': 1.3467446702036692, 'epoch': 3.0}
Training completed in 72.58 seconds.
{'eval_loss': 1.2143875360488892, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.9954, 'eval_samples_per_second': 1047.853, 'eval_steps_per_second': 33.154, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | adalora ===
Metric: -0.0207
Training Time: 72.58 seconds

==============================
Task: qqp | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'loss': 0.6296, 'grad_norm': 2.261782646179199, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.7579520158298293, 'eval_f1': 0.7092691622103386, 'eval_loss': 0.47327563166618347, 'eval_runtime': 65.3962, 'eval_samples_per_second': 618.232, 'eval_steps_per_second': 19.328, 'epoch': 1.0}
{'loss': 0.4677, 'grad_norm': 1.627873182296753, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.7813999505317833, 'eval_f1': 0.7402574501851525, 'eval_loss': 0.4382329285144806, 'eval_runtime': 65.3685, 'eval_samples_per_second': 618.493, 'eval_steps_per_second': 19.337, 'epoch': 2.0}
{'loss': 0.445, 'grad_norm': 1.8313876390457153, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.7876576799406382, 'eval_f1': 0.745002524727478, 'eval_loss': 0.4278358221054077, 'eval_runtime': 65.368, 'eval_samples_per_second': 618.498, 'eval_steps_per_second': 19.337, 'epoch': 3.0}
{'train_runtime': 4354.7417, 'train_samples_per_second': 250.655, 'train_steps_per_second': 7.834, 'train_loss': 0.5051169407328218, 'epoch': 3.0}
Training completed in 4355.09 seconds.
{'eval_accuracy': 0.7876576799406382, 'eval_f1': 0.745002524727478, 'eval_loss': 0.4278358221054077, 'eval_runtime': 65.3902, 'eval_samples_per_second': 618.289, 'eval_steps_per_second': 19.33, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | adalora ===
Metric: 0.7877/0.7450
Training Time: 4355.09 seconds

==============================
Task: qnli | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 0.6812258958816528, 'eval_accuracy': 0.5740435658063335, 'eval_runtime': 12.2629, 'eval_samples_per_second': 445.492, 'eval_steps_per_second': 13.945, 'epoch': 1.0}
{'eval_loss': 0.6761545538902283, 'eval_accuracy': 0.5824638477027274, 'eval_runtime': 12.015, 'eval_samples_per_second': 454.683, 'eval_steps_per_second': 14.232, 'epoch': 2.0}
{'eval_loss': 0.6743924021720886, 'eval_accuracy': 0.5850265421929343, 'eval_runtime': 12.1905, 'eval_samples_per_second': 448.137, 'eval_steps_per_second': 14.027, 'epoch': 3.0}
{'train_runtime': 1599.385, 'train_samples_per_second': 196.469, 'train_steps_per_second': 6.141, 'train_loss': 0.7361293226462279, 'epoch': 3.0}
Training completed in 1599.73 seconds.
{'eval_loss': 0.6743924021720886, 'eval_accuracy': 0.5850265421929343, 'eval_runtime': 12.2374, 'eval_samples_per_second': 446.417, 'eval_steps_per_second': 13.974, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | adalora ===
Metric: 0.5850
Training Time: 1599.73 seconds

==============================
Task: rte | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 1.662428379058838, 'eval_accuracy': 0.4584837545126354, 'eval_runtime': 0.9804, 'eval_samples_per_second': 282.533, 'eval_steps_per_second': 9.18, 'epoch': 1.0}
{'eval_loss': 1.6126227378845215, 'eval_accuracy': 0.47653429602888087, 'eval_runtime': 0.9767, 'eval_samples_per_second': 283.616, 'eval_steps_per_second': 9.215, 'epoch': 2.0}
{'eval_loss': 1.5964787006378174, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.964, 'eval_samples_per_second': 287.349, 'eval_steps_per_second': 9.336, 'epoch': 3.0}
{'train_runtime': 64.4909, 'train_samples_per_second': 115.83, 'train_steps_per_second': 3.628, 'train_loss': 1.6494870960202992, 'epoch': 3.0}
Training completed in 64.83 seconds.
{'eval_loss': 1.5964787006378174, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.9452, 'eval_samples_per_second': 293.057, 'eval_steps_per_second': 9.522, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | adalora ===
Metric: 0.4693
Training Time: 64.83 seconds

==============================
Task: mrpc | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1790174 / 111274013 (1.61%)
Starting training...
{'eval_loss': 1.5392217636108398, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7702, 'eval_samples_per_second': 529.751, 'eval_steps_per_second': 16.879, 'epoch': 1.0}
{'eval_loss': 1.4676430225372314, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7687, 'eval_samples_per_second': 530.759, 'eval_steps_per_second': 16.911, 'epoch': 2.0}
{'eval_loss': 1.4506442546844482, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7486, 'eval_samples_per_second': 545.023, 'eval_steps_per_second': 17.366, 'epoch': 3.0}
{'train_runtime': 51.2265, 'train_samples_per_second': 214.811, 'train_steps_per_second': 6.735, 'train_loss': 1.5334315203238225, 'epoch': 3.0}
Training completed in 51.59 seconds.
{'eval_loss': 1.4506442546844482, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7848, 'eval_samples_per_second': 519.858, 'eval_steps_per_second': 16.564, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | adalora ===
Metric: 0.6887
Training Time: 51.59 seconds

==============================
Task: stsb | Model: bert-base-uncased | Method: adalora
==============================

Injected AdaLoRA adapters via PEFT.
Trainable params: 1789405 / 111272475 (1.61%)
Starting training...
{'eval_loss': 3.5649571418762207, 'eval_pearson': 0.018438010337713206, 'eval_spearmanr': 0.01081373344132055, 'eval_combined_score': 0.014625871889516879, 'eval_runtime': 1.9768, 'eval_samples_per_second': 758.783, 'eval_steps_per_second': 23.775, 'epoch': 1.0}
{'eval_loss': 3.0467772483825684, 'eval_pearson': 0.030652826982401318, 'eval_spearmanr': 0.027105310496517078, 'eval_combined_score': 0.028879068739459196, 'eval_runtime': 2.0047, 'eval_samples_per_second': 748.233, 'eval_steps_per_second': 23.445, 'epoch': 2.0}
{'eval_loss': 3.0490925312042236, 'eval_pearson': 0.033701994792119494, 'eval_spearmanr': 0.031193733243172723, 'eval_combined_score': 0.03244786401764611, 'eval_runtime': 1.9735, 'eval_samples_per_second': 760.073, 'eval_steps_per_second': 23.816, 'epoch': 3.0}
{'train_runtime': 75.4372, 'train_samples_per_second': 228.627, 'train_steps_per_second': 7.158, 'train_loss': 4.300191243489583, 'epoch': 3.0}
Training completed in 75.77 seconds.
{'eval_loss': 3.0467772483825684, 'eval_pearson': 0.030652826982401318, 'eval_spearmanr': 0.027105310496517078, 'eval_combined_score': 0.028879068739459196, 'eval_runtime': 1.9734, 'eval_samples_per_second': 760.094, 'eval_steps_per_second': 23.816, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | adalora ===
Metric: 0.0289
Training Time: 75.77 seconds

==============================
Task: mnli | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259779 / 110744326 (1.14%)
Starting training...
{'loss': 0.9732, 'grad_norm': 2.977407932281494, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.8344303369522095, 'eval_accuracy': 0.6200713194090678, 'eval_runtime': 17.8208, 'eval_samples_per_second': 550.76, 'eval_steps_per_second': 17.227, 'epoch': 1.0}
{'loss': 0.8339, 'grad_norm': 1.741886854171753, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.7724255919456482, 'eval_accuracy': 0.6549159449821701, 'eval_runtime': 18.1125, 'eval_samples_per_second': 541.892, 'eval_steps_per_second': 16.95, 'epoch': 2.0}
{'loss': 0.7929, 'grad_norm': 2.1190361976623535, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.751194953918457, 'eval_accuracy': 0.6682628629648497, 'eval_runtime': 18.1553, 'eval_samples_per_second': 540.613, 'eval_steps_per_second': 16.91, 'epoch': 3.0}
{'train_runtime': 5663.3001, 'train_samples_per_second': 208.025, 'train_steps_per_second': 6.501, 'train_loss': 0.850493589000876, 'epoch': 3.0}
Training completed in 5663.65 seconds.
{'eval_loss': 0.751194953918457, 'eval_accuracy': 0.6682628629648497, 'eval_runtime': 18.1791, 'eval_samples_per_second': 539.907, 'eval_steps_per_second': 16.888, 'epoch': 3.0}
{'eval_loss': 0.7144766449928284, 'eval_accuracy': 0.6841944670463792, 'eval_runtime': 18.5603, 'eval_samples_per_second': 529.734, 'eval_steps_per_second': 16.595, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | vb_lora ===
Metric: 0.6683/0.6842
Training Time: 5663.65 seconds

==============================
Task: sst2 | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.30775463581085205, 'eval_accuracy': 0.8646788990825688, 'eval_runtime': 1.0436, 'eval_samples_per_second': 835.556, 'eval_steps_per_second': 26.83, 'epoch': 1.0}
{'eval_loss': 0.2980089485645294, 'eval_accuracy': 0.8715596330275229, 'eval_runtime': 1.046, 'eval_samples_per_second': 833.647, 'eval_steps_per_second': 26.768, 'epoch': 2.0}
{'eval_loss': 0.29754653573036194, 'eval_accuracy': 0.8704128440366973, 'eval_runtime': 1.0707, 'eval_samples_per_second': 814.396, 'eval_steps_per_second': 26.15, 'epoch': 3.0}
{'train_runtime': 683.4236, 'train_samples_per_second': 295.639, 'train_steps_per_second': 9.24, 'train_loss': 0.3783851847040776, 'epoch': 3.0}
Training completed in 683.79 seconds.
{'eval_loss': 0.29754653573036194, 'eval_accuracy': 0.8704128440366973, 'eval_runtime': 1.0569, 'eval_samples_per_second': 825.042, 'eval_steps_per_second': 26.492, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | vb_lora ===
Metric: 0.8704
Training Time: 683.79 seconds

==============================
Task: cola | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.6138903498649597, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.869, 'eval_samples_per_second': 1200.187, 'eval_steps_per_second': 37.973, 'epoch': 1.0}
{'eval_loss': 0.6121873259544373, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.8514, 'eval_samples_per_second': 1225.11, 'eval_steps_per_second': 38.762, 'epoch': 2.0}
{'eval_loss': 0.6119409203529358, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.8459, 'eval_samples_per_second': 1233.046, 'eval_steps_per_second': 39.013, 'epoch': 3.0}
{'train_runtime': 86.4303, 'train_samples_per_second': 296.806, 'train_steps_per_second': 9.302, 'train_loss': 0.6045411143136855, 'epoch': 3.0}
Training completed in 86.72 seconds.
{'eval_loss': 0.6119409203529358, 'eval_matthews_correlation': -0.020702674026557004, 'eval_runtime': 0.8439, 'eval_samples_per_second': 1235.924, 'eval_steps_per_second': 39.104, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | vb_lora ===
Metric: -0.0207
Training Time: 86.72 seconds

==============================
Task: qqp | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'loss': 0.5054, 'grad_norm': 2.0415947437286377, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.7799159040316597, 'eval_f1': 0.7357448325017819, 'eval_loss': 0.43818068504333496, 'eval_runtime': 59.0279, 'eval_samples_per_second': 684.93, 'eval_steps_per_second': 21.414, 'epoch': 1.0}
{'loss': 0.4411, 'grad_norm': 1.7730364799499512, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.7931981202077665, 'eval_f1': 0.7508863927539254, 'eval_loss': 0.4178532361984253, 'eval_runtime': 53.8204, 'eval_samples_per_second': 751.202, 'eval_steps_per_second': 23.486, 'epoch': 2.0}
{'loss': 0.4285, 'grad_norm': 1.8256773948669434, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.7980212713331685, 'eval_f1': 0.7538730484055699, 'eval_loss': 0.41063377261161804, 'eval_runtime': 58.2527, 'eval_samples_per_second': 694.046, 'eval_steps_per_second': 21.699, 'epoch': 3.0}
{'train_runtime': 4593.3769, 'train_samples_per_second': 237.633, 'train_steps_per_second': 7.427, 'train_loss': 0.45428847002975403, 'epoch': 3.0}
Training completed in 4593.73 seconds.
{'eval_accuracy': 0.7980212713331685, 'eval_f1': 0.7538730484055699, 'eval_loss': 0.41063377261161804, 'eval_runtime': 58.3001, 'eval_samples_per_second': 693.481, 'eval_steps_per_second': 21.681, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | vb_lora ===
Metric: 0.7980/0.7539
Training Time: 4593.73 seconds

==============================
Task: qnli | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.5664609670639038, 'eval_accuracy': 0.7063884312648728, 'eval_runtime': 11.136, 'eval_samples_per_second': 490.571, 'eval_steps_per_second': 15.356, 'epoch': 1.0}
{'eval_loss': 0.5128809809684753, 'eval_accuracy': 0.7481237415339557, 'eval_runtime': 11.1218, 'eval_samples_per_second': 491.196, 'eval_steps_per_second': 15.375, 'epoch': 2.0}
{'eval_loss': 0.503043532371521, 'eval_accuracy': 0.7565440234303497, 'eval_runtime': 11.1587, 'eval_samples_per_second': 489.575, 'eval_steps_per_second': 15.324, 'epoch': 3.0}
{'train_runtime': 1579.3084, 'train_samples_per_second': 198.966, 'train_steps_per_second': 6.219, 'train_loss': 0.5892526125184535, 'epoch': 3.0}
Training completed in 1579.68 seconds.
{'eval_loss': 0.503043532371521, 'eval_accuracy': 0.7565440234303497, 'eval_runtime': 11.1592, 'eval_samples_per_second': 489.551, 'eval_steps_per_second': 15.324, 'epoch': 3.0}

=== FINAL RESULTS for qnli | bert-base-uncased | vb_lora ===
Metric: 0.7565
Training Time: 1579.68 seconds

==============================
Task: rte | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.6961030960083008, 'eval_accuracy': 0.4620938628158845, 'eval_runtime': 0.9002, 'eval_samples_per_second': 307.711, 'eval_steps_per_second': 9.998, 'epoch': 1.0}
{'eval_loss': 0.6959496140480042, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.9133, 'eval_samples_per_second': 303.289, 'eval_steps_per_second': 9.854, 'epoch': 2.0}
{'eval_loss': 0.6967261433601379, 'eval_accuracy': 0.4657039711191336, 'eval_runtime': 0.89, 'eval_samples_per_second': 311.227, 'eval_steps_per_second': 10.112, 'epoch': 3.0}
{'train_runtime': 60.3133, 'train_samples_per_second': 123.853, 'train_steps_per_second': 3.88, 'train_loss': 0.699567615476429, 'epoch': 3.0}
Training completed in 60.65 seconds.
{'eval_loss': 0.6959496140480042, 'eval_accuracy': 0.4693140794223827, 'eval_runtime': 0.872, 'eval_samples_per_second': 317.676, 'eval_steps_per_second': 10.322, 'epoch': 3.0}

=== FINAL RESULTS for rte | bert-base-uncased | vb_lora ===
Metric: 0.4693
Training Time: 60.65 seconds

==============================
Task: mrpc | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1259010 / 110742788 (1.14%)
Starting training...
{'eval_loss': 0.6160275936126709, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7, 'eval_samples_per_second': 582.878, 'eval_steps_per_second': 18.572, 'epoch': 1.0}
{'eval_loss': 0.61436927318573, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.7001, 'eval_samples_per_second': 582.808, 'eval_steps_per_second': 18.57, 'epoch': 2.0}
{'eval_loss': 0.6137855052947998, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.6954, 'eval_samples_per_second': 586.711, 'eval_steps_per_second': 18.694, 'epoch': 3.0}
{'train_runtime': 52.5746, 'train_samples_per_second': 209.303, 'train_steps_per_second': 6.562, 'train_loss': 0.6311185975005661, 'epoch': 3.0}
Training completed in 52.88 seconds.
{'eval_loss': 0.6137855052947998, 'eval_accuracy': 0.6887254901960784, 'eval_f1': 0.8145985401459854, 'eval_runtime': 0.6713, 'eval_samples_per_second': 607.74, 'eval_steps_per_second': 19.364, 'epoch': 3.0}

=== FINAL RESULTS for mrpc | bert-base-uncased | vb_lora ===
Metric: 0.6887
Training Time: 52.88 seconds

==============================
Task: stsb | Model: bert-base-uncased | Method: vb_lora
==============================

Injected VB-LoRA adapters via PEFT.
Trainable params: 1258241 / 110741250 (1.14%)
Starting training...
{'eval_loss': 2.4936869144439697, 'eval_pearson': 0.05272379775188579, 'eval_spearmanr': 0.05256041700870254, 'eval_combined_score': 0.052642107380294165, 'eval_runtime': 1.7738, 'eval_samples_per_second': 845.665, 'eval_steps_per_second': 26.497, 'epoch': 1.0}
{'eval_loss': 2.325751543045044, 'eval_pearson': 0.13511265081366103, 'eval_spearmanr': 0.16174774952420415, 'eval_combined_score': 0.1484302001689326, 'eval_runtime': 1.7752, 'eval_samples_per_second': 844.99, 'eval_steps_per_second': 26.476, 'epoch': 2.0}
{'eval_loss': 2.3533577919006348, 'eval_pearson': 0.16271341646560994, 'eval_spearmanr': 0.19961028632970146, 'eval_combined_score': 0.1811618513976557, 'eval_runtime': 1.8103, 'eval_samples_per_second': 828.594, 'eval_steps_per_second': 25.963, 'epoch': 3.0}
{'train_runtime': 78.4724, 'train_samples_per_second': 219.784, 'train_steps_per_second': 6.881, 'train_loss': 3.2359501591435187, 'epoch': 3.0}
Training completed in 78.82 seconds.
{'eval_loss': 2.325751543045044, 'eval_pearson': 0.13511265081366103, 'eval_spearmanr': 0.16174774952420415, 'eval_combined_score': 0.1484302001689326, 'eval_runtime': 1.7443, 'eval_samples_per_second': 859.947, 'eval_steps_per_second': 26.945, 'epoch': 3.0}

=== FINAL RESULTS for stsb | bert-base-uncased | vb_lora ===
Metric: 0.1484
Training Time: 78.82 seconds

==============================
Task: mnli | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1194243 / 110678790 (1.08%)
Starting training...
{'loss': 0.7301, 'grad_norm': 27.89801788330078, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
{'eval_loss': 0.5824580192565918, 'eval_accuracy': 0.763525216505349, 'eval_runtime': 16.7405, 'eval_samples_per_second': 586.301, 'eval_steps_per_second': 18.339, 'epoch': 1.0}
{'loss': 0.6163, 'grad_norm': 17.541271209716797, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
{'eval_loss': 0.560714602470398, 'eval_accuracy': 0.7812531839021906, 'eval_runtime': 16.7432, 'eval_samples_per_second': 586.207, 'eval_steps_per_second': 18.336, 'epoch': 2.0}
{'loss': 0.5811, 'grad_norm': 22.611732482910156, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
{'eval_loss': 0.544284999370575, 'eval_accuracy': 0.7876719307182883, 'eval_runtime': 16.7288, 'eval_samples_per_second': 586.713, 'eval_steps_per_second': 18.352, 'epoch': 3.0}
{'train_runtime': 4657.1368, 'train_samples_per_second': 252.968, 'train_steps_per_second': 7.905, 'train_loss': 0.6288769857513548, 'epoch': 3.0}
Training completed in 4657.52 seconds.
{'eval_loss': 0.544284999370575, 'eval_accuracy': 0.7876719307182883, 'eval_runtime': 16.6966, 'eval_samples_per_second': 587.845, 'eval_steps_per_second': 18.387, 'epoch': 3.0}
{'eval_loss': 0.5244448184967041, 'eval_accuracy': 0.7910903173311635, 'eval_runtime': 17.1344, 'eval_samples_per_second': 573.815, 'eval_steps_per_second': 17.976, 'epoch': 3.0}

=== FINAL RESULTS for mnli | bert-base-uncased | olora ===
Metric: 0.7877/0.7911
Training Time: 4657.52 seconds

==============================
Task: sst2 | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.25105705857276917, 'eval_accuracy': 0.8990825688073395, 'eval_runtime': 0.9059, 'eval_samples_per_second': 962.591, 'eval_steps_per_second': 30.909, 'epoch': 1.0}
{'eval_loss': 0.2556295692920685, 'eval_accuracy': 0.8979357798165137, 'eval_runtime': 0.8961, 'eval_samples_per_second': 973.058, 'eval_steps_per_second': 31.245, 'epoch': 2.0}
{'eval_loss': 0.25559449195861816, 'eval_accuracy': 0.9059633027522935, 'eval_runtime': 0.9124, 'eval_samples_per_second': 955.713, 'eval_steps_per_second': 30.688, 'epoch': 3.0}
{'train_runtime': 446.4985, 'train_samples_per_second': 452.514, 'train_steps_per_second': 14.143, 'train_loss': 0.23866029620076207, 'epoch': 3.0}
Training completed in 446.89 seconds.
{'eval_loss': 0.25105705857276917, 'eval_accuracy': 0.8990825688073395, 'eval_runtime': 0.9447, 'eval_samples_per_second': 923.001, 'eval_steps_per_second': 29.638, 'epoch': 3.0}

=== FINAL RESULTS for sst2 | bert-base-uncased | olora ===
Metric: 0.8991
Training Time: 446.89 seconds

==============================
Task: cola | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'eval_loss': 0.5550746321678162, 'eval_matthews_correlation': 0.11382192951310593, 'eval_runtime': 0.6376, 'eval_samples_per_second': 1635.944, 'eval_steps_per_second': 51.76, 'epoch': 1.0}
{'eval_loss': 0.5441713333129883, 'eval_matthews_correlation': 0.38281296016649696, 'eval_runtime': 0.61, 'eval_samples_per_second': 1709.713, 'eval_steps_per_second': 54.094, 'epoch': 2.0}
{'eval_loss': 0.5497397184371948, 'eval_matthews_correlation': 0.39302533664823136, 'eval_runtime': 0.6395, 'eval_samples_per_second': 1630.927, 'eval_steps_per_second': 51.602, 'epoch': 3.0}
{'train_runtime': 53.1826, 'train_samples_per_second': 482.357, 'train_steps_per_second': 15.118, 'train_loss': 0.5172855889619287, 'epoch': 3.0}
Training completed in 53.52 seconds.
{'eval_loss': 0.5441713333129883, 'eval_matthews_correlation': 0.38281296016649696, 'eval_runtime': 0.6376, 'eval_samples_per_second': 1635.842, 'eval_steps_per_second': 51.757, 'epoch': 3.0}

=== FINAL RESULTS for cola | bert-base-uncased | olora ===
Metric: 0.3828
Training Time: 53.52 seconds

==============================
Task: qqp | Model: bert-base-uncased | Method: olora
==============================

Injected OLoRA adapters via PEFT.
Trainable params: 1193474 / 110677252 (1.08%)
Starting training...
{'loss': 0.3952, 'grad_norm': 15.992859840393066, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
{'eval_accuracy': 0.8515211476626268, 'eval_f1': 0.8073553480311928, 'eval_loss': 0.3308490812778473, 'eval_runtime': 52.6496, 'eval_samples_per_second': 767.906, 'eval_steps_per_second': 24.008, 'epoch': 1.0}
{'loss': 0.3326, 'grad_norm': 17.12047004699707, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
{'eval_accuracy': 0.8558496166213208, 'eval_f1': 0.815452818239392, 'eval_loss': 0.31894829869270325, 'eval_runtime': 52.635, 'eval_samples_per_second': 768.12, 'eval_steps_per_second': 24.014, 'epoch': 2.0}
{'loss': 0.3139, 'grad_norm': 19.835132598876953, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
{'eval_accuracy': 0.8632203809052683, 'eval_f1': 0.8219231016938237, 'eval_loss': 0.308892160654068, 'eval_runtime': 52.6582, 'eval_samples_per_second': 767.782, 'eval_steps_per_second': 24.004, 'epoch': 3.0}
{'train_runtime': 3570.0614, 'train_samples_per_second': 305.748, 'train_steps_per_second': 9.555, 'train_loss': 0.3423057106390892, 'epoch': 3.0}
Training completed in 3570.45 seconds.
{'eval_accuracy': 0.8632203809052683, 'eval_f1': 0.8219231016938237, 'eval_loss': 0.308892160654068, 'eval_runtime': 52.682, 'eval_samples_per_second': 767.435, 'eval_steps_per_second': 23.993, 'epoch': 3.0}

=== FINAL RESULTS for qqp | bert-base-uncased | olora ===
Metric: 0.8632/0.8219
Training Time: 3570.45 seconds

==============================
|
| 722 |
+
Task: qnli | Model: bert-base-uncased | Method: olora
|
| 723 |
+
==============================
|
| 724 |
+
|
| 725 |
+
Injected OLoRA adapters via PEFT.
|
| 726 |
+
Trainable params: 1193474 / 110677252 (1.08%)
|
| 727 |
+
Starting training...
|
| 728 |
+
{'eval_loss': 0.3933061361312866, 'eval_accuracy': 0.8213435841112942, 'eval_runtime': 10.4228, 'eval_samples_per_second': 524.137, 'eval_steps_per_second': 16.406, 'epoch': 1.0}
|
| 729 |
+
{'eval_loss': 0.35298967361450195, 'eval_accuracy': 0.8411129416071755, 'eval_runtime': 10.2135, 'eval_samples_per_second': 534.882, 'eval_steps_per_second': 16.743, 'epoch': 2.0}
|
| 730 |
+
{'eval_loss': 0.338245153427124, 'eval_accuracy': 0.8513637195680029, 'eval_runtime': 10.3823, 'eval_samples_per_second': 526.183, 'eval_steps_per_second': 16.47, 'epoch': 3.0}
|
| 731 |
+
{'train_runtime': 1330.4657, 'train_samples_per_second': 236.18, 'train_steps_per_second': 7.382, 'train_loss': 0.42918042722968847, 'epoch': 3.0}
|
| 732 |
+
Training completed in 1330.87 seconds.
|
| 733 |
+
{'eval_loss': 0.338245153427124, 'eval_accuracy': 0.8513637195680029, 'eval_runtime': 10.3996, 'eval_samples_per_second': 525.311, 'eval_steps_per_second': 16.443, 'epoch': 3.0}
|
| 734 |
+
|
| 735 |
+
=== FINAL RESULTS for qnli | bert-base-uncased | olora ===
|
| 736 |
+
Metric: 0.8514
|
| 737 |
+
Training Time: 1330.87 seconds
|
| 738 |
+
|
| 739 |
+
|
| 740 |
+
==============================
|
| 741 |
+
Task: rte | Model: bert-base-uncased | Method: olora
|
| 742 |
+
==============================
|
| 743 |
+
|
| 744 |
+
Injected OLoRA adapters via PEFT.
|
| 745 |
+
Trainable params: 1193474 / 110677252 (1.08%)
|
| 746 |
+
Starting training...
|
| 747 |
+
{'eval_loss': 0.6978908777236938, 'eval_accuracy': 0.4981949458483754, 'eval_runtime': 0.8742, 'eval_samples_per_second': 316.85, 'eval_steps_per_second': 10.295, 'epoch': 1.0}
|
| 748 |
+
{'eval_loss': 0.6917179226875305, 'eval_accuracy': 0.5126353790613718, 'eval_runtime': 0.8406, 'eval_samples_per_second': 329.513, 'eval_steps_per_second': 10.706, 'epoch': 2.0}
|
| 749 |
+
{'eval_loss': 0.6925662755966187, 'eval_accuracy': 0.5306859205776173, 'eval_runtime': 0.8629, 'eval_samples_per_second': 321.014, 'eval_steps_per_second': 10.43, 'epoch': 3.0}
|
| 750 |
+
{'train_runtime': 56.4059, 'train_samples_per_second': 132.433, 'train_steps_per_second': 4.149, 'train_loss': 0.6980368459326589, 'epoch': 3.0}
|
| 751 |
+
Training completed in 56.78 seconds.
|
| 752 |
+
{'eval_loss': 0.6917179226875305, 'eval_accuracy': 0.5126353790613718, 'eval_runtime': 0.841, 'eval_samples_per_second': 329.385, 'eval_steps_per_second': 10.702, 'epoch': 3.0}
|
| 753 |
+
|
| 754 |
+
=== FINAL RESULTS for rte | bert-base-uncased | olora ===
|
| 755 |
+
Metric: 0.5126
|
| 756 |
+
Training Time: 56.78 seconds
|
| 757 |
+
|
| 758 |
+
|
| 759 |
+
==============================
|
| 760 |
+
Task: mrpc | Model: bert-base-uncased | Method: olora
|
| 761 |
+
==============================
|
| 762 |
+
|
| 763 |
+
Injected OLoRA adapters via PEFT.
|
| 764 |
+
Trainable params: 1193474 / 110677252 (1.08%)
|
| 765 |
+
Starting training...
|
| 766 |
+
{'eval_loss': 0.5835548639297485, 'eval_accuracy': 0.696078431372549, 'eval_f1': 0.8165680473372781, 'eval_runtime': 0.6216, 'eval_samples_per_second': 656.401, 'eval_steps_per_second': 20.915, 'epoch': 1.0}
|
| 767 |
+
{'eval_loss': 0.5289849638938904, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8190184049079755, 'eval_runtime': 0.6517, 'eval_samples_per_second': 626.042, 'eval_steps_per_second': 19.947, 'epoch': 2.0}
|
| 768 |
+
{'eval_loss': 0.529857337474823, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8190184049079755, 'eval_runtime': 0.6546, 'eval_samples_per_second': 623.294, 'eval_steps_per_second': 19.86, 'epoch': 3.0}
|
| 769 |
+
{'train_runtime': 43.5167, 'train_samples_per_second': 252.869, 'train_steps_per_second': 7.928, 'train_loss': 0.570729440882586, 'epoch': 3.0}
|
| 770 |
+
Training completed in 43.86 seconds.
|
| 771 |
+
{'eval_loss': 0.5289849638938904, 'eval_accuracy': 0.7107843137254902, 'eval_f1': 0.8190184049079755, 'eval_runtime': 0.6218, 'eval_samples_per_second': 656.177, 'eval_steps_per_second': 20.908, 'epoch': 3.0}
|
| 772 |
+
|
| 773 |
+
=== FINAL RESULTS for mrpc | bert-base-uncased | olora ===
|
| 774 |
+
Metric: 0.7108
|
| 775 |
+
Training Time: 43.86 seconds
|
| 776 |
+
|
| 777 |
+
|
| 778 |
+
==============================
|
| 779 |
+
Task: stsb | Model: bert-base-uncased | Method: olora
|
| 780 |
+
==============================
|
| 781 |
+
|
| 782 |
+
Injected OLoRA adapters via PEFT.
|
| 783 |
+
Trainable params: 1192705 / 110675714 (1.08%)
|
| 784 |
+
Starting training...
|
| 785 |
+
{'eval_loss': 1.0667306184768677, 'eval_pearson': 0.7780507120926891, 'eval_spearmanr': 0.7911748317717975, 'eval_combined_score': 0.7846127719322433, 'eval_runtime': 1.524, 'eval_samples_per_second': 984.239, 'eval_steps_per_second': 30.839, 'epoch': 1.0}
|
| 786 |
+
{'eval_loss': 0.8928676843643188, 'eval_pearson': 0.8168255680257844, 'eval_spearmanr': 0.8212092492693253, 'eval_combined_score': 0.8190174086475548, 'eval_runtime': 1.5491, 'eval_samples_per_second': 968.315, 'eval_steps_per_second': 30.341, 'epoch': 2.0}
|
| 787 |
+
{'eval_loss': 0.8309628367424011, 'eval_pearson': 0.8232657429745108, 'eval_spearmanr': 0.8258763378952371, 'eval_combined_score': 0.824571040434874, 'eval_runtime': 1.544, 'eval_samples_per_second': 971.493, 'eval_steps_per_second': 30.44, 'epoch': 3.0}
|
| 788 |
+
{'train_runtime': 62.4587, 'train_samples_per_second': 276.134, 'train_steps_per_second': 8.646, 'train_loss': 1.3374586317274306, 'epoch': 3.0}
|
| 789 |
+
Training completed in 62.84 seconds.
|
| 790 |
+
{'eval_loss': 0.8309628367424011, 'eval_pearson': 0.8232657429745108, 'eval_spearmanr': 0.8258763378952371, 'eval_combined_score': 0.824571040434874, 'eval_runtime': 1.5328, 'eval_samples_per_second': 978.63, 'eval_steps_per_second': 30.664, 'epoch': 3.0}
|
| 791 |
+
|
| 792 |
+
=== FINAL RESULTS for stsb | bert-base-uncased | olora ===
|
| 793 |
+
Metric: 0.8246
|
| 794 |
+
Training Time: 62.84 seconds
|
| 795 |
+
|
| 796 |
+
|
| 797 |
+
==============================
|
| 798 |
+
Task: mnli | Model: bert-base-uncased | Method: full_finetuning
|
| 799 |
+
==============================
|
| 800 |
+
|
| 801 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 802 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 803 |
+
Trainable params: 109484547 / 109484547 (100.00%)
|
| 804 |
+
Starting training...
|
| 805 |
+
{'loss': 0.5597, 'grad_norm': 5.752374649047852, 'learning_rate': 1.4567579313342026e-05, 'epoch': 0.8148631029986962}
|
| 806 |
+
{'eval_loss': 0.4514749050140381, 'eval_accuracy': 0.8239429444727457, 'eval_runtime': 14.9081, 'eval_samples_per_second': 658.366, 'eval_steps_per_second': 20.593, 'epoch': 1.0}
|
| 807 |
+
{'loss': 0.3919, 'grad_norm': 5.102553844451904, 'learning_rate': 9.13515862668405e-06, 'epoch': 1.6297262059973925}
|
| 808 |
+
{'eval_loss': 0.46204888820648193, 'eval_accuracy': 0.8308711156393276, 'eval_runtime': 14.6024, 'eval_samples_per_second': 672.152, 'eval_steps_per_second': 21.024, 'epoch': 2.0}
|
| 809 |
+
{'loss': 0.3015, 'grad_norm': 4.843125820159912, 'learning_rate': 3.702737940026076e-06, 'epoch': 2.444589308996089}
|
| 810 |
+
{'eval_loss': 0.5197769403457642, 'eval_accuracy': 0.8348446255731024, 'eval_runtime': 14.5879, 'eval_samples_per_second': 672.819, 'eval_steps_per_second': 21.045, 'epoch': 3.0}
|
| 811 |
+
{'train_runtime': 5323.0035, 'train_samples_per_second': 221.324, 'train_steps_per_second': 6.916, 'train_loss': 0.3868698789885851, 'epoch': 3.0}
|
| 812 |
+
Training completed in 5323.34 seconds.
|
| 813 |
+
{'eval_loss': 0.4514749050140381, 'eval_accuracy': 0.8239429444727457, 'eval_runtime': 14.5681, 'eval_samples_per_second': 673.734, 'eval_steps_per_second': 21.073, 'epoch': 3.0}
|
| 814 |
+
{'eval_loss': 0.42962974309921265, 'eval_accuracy': 0.8319772172497966, 'eval_runtime': 14.9481, 'eval_samples_per_second': 657.744, 'eval_steps_per_second': 20.605, 'epoch': 3.0}
|
| 815 |
+
|
| 816 |
+
=== FINAL RESULTS for mnli | bert-base-uncased | full_finetuning ===
|
| 817 |
+
Metric: 0.8239/0.8320
|
| 818 |
+
Training Time: 5323.34 seconds
|
| 819 |
+
|
| 820 |
+
|
| 821 |
+
==============================
|
| 822 |
+
Task: sst2 | Model: bert-base-uncased | Method: full_finetuning
|
| 823 |
+
==============================
|
| 824 |
+
|
| 825 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 826 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 827 |
+
Trainable params: 109483778 / 109483778 (100.00%)
|
| 828 |
+
Starting training...
|
| 829 |
+
{'eval_loss': 0.2134767472743988, 'eval_accuracy': 0.9334862385321101, 'eval_runtime': 0.7923, 'eval_samples_per_second': 1100.568, 'eval_steps_per_second': 35.339, 'epoch': 1.0}
|
| 830 |
+
{'eval_loss': 0.2665484845638275, 'eval_accuracy': 0.9220183486238532, 'eval_runtime': 0.7939, 'eval_samples_per_second': 1098.343, 'eval_steps_per_second': 35.268, 'epoch': 2.0}
|
| 831 |
+
{'eval_loss': 0.2949509024620056, 'eval_accuracy': 0.926605504587156, 'eval_runtime': 0.8068, 'eval_samples_per_second': 1080.866, 'eval_steps_per_second': 34.707, 'epoch': 3.0}
|
| 832 |
+
{'train_runtime': 479.0704, 'train_samples_per_second': 421.748, 'train_steps_per_second': 13.182, 'train_loss': 0.12836022939553643, 'epoch': 3.0}
|
| 833 |
+
Training completed in 479.43 seconds.
|
| 834 |
+
{'eval_loss': 0.2134767472743988, 'eval_accuracy': 0.9334862385321101, 'eval_runtime': 0.8041, 'eval_samples_per_second': 1084.426, 'eval_steps_per_second': 34.821, 'epoch': 3.0}
|
| 835 |
+
|
| 836 |
+
=== FINAL RESULTS for sst2 | bert-base-uncased | full_finetuning ===
|
| 837 |
+
Metric: 0.9335
|
| 838 |
+
Training Time: 479.43 seconds
|
| 839 |
+
|
| 840 |
+
|
| 841 |
+
==============================
|
| 842 |
+
Task: cola | Model: bert-base-uncased | Method: full_finetuning
|
| 843 |
+
==============================
|
| 844 |
+
|
| 845 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 846 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 847 |
+
Trainable params: 109483778 / 109483778 (100.00%)
|
| 848 |
+
Starting training...
|
| 849 |
+
{'eval_loss': 0.412758469581604, 'eval_matthews_correlation': 0.5526896422396544, 'eval_runtime': 0.5472, 'eval_samples_per_second': 1906.193, 'eval_steps_per_second': 60.311, 'epoch': 1.0}
|
| 850 |
+
{'eval_loss': 0.46548306941986084, 'eval_matthews_correlation': 0.5677348492150284, 'eval_runtime': 0.5415, 'eval_samples_per_second': 1926.018, 'eval_steps_per_second': 60.938, 'epoch': 2.0}
|
| 851 |
+
{'eval_loss': 0.5247135162353516, 'eval_matthews_correlation': 0.5679361809424823, 'eval_runtime': 0.5406, 'eval_samples_per_second': 1929.352, 'eval_steps_per_second': 61.044, 'epoch': 3.0}
|
| 852 |
+
{'train_runtime': 52.3626, 'train_samples_per_second': 489.911, 'train_steps_per_second': 15.354, 'train_loss': 0.32915398137486396, 'epoch': 3.0}
|
| 853 |
+
Training completed in 52.71 seconds.
|
| 854 |
+
{'eval_loss': 0.412758469581604, 'eval_matthews_correlation': 0.5526896422396544, 'eval_runtime': 0.4981, 'eval_samples_per_second': 2093.982, 'eval_steps_per_second': 66.253, 'epoch': 3.0}
|
| 855 |
+
|
| 856 |
+
=== FINAL RESULTS for cola | bert-base-uncased | full_finetuning ===
|
| 857 |
+
Metric: 0.5527
|
| 858 |
+
Training Time: 52.71 seconds
|
| 859 |
+
|
| 860 |
+
|
| 861 |
+
==============================
|
| 862 |
+
Task: qqp | Model: bert-base-uncased | Method: full_finetuning
|
| 863 |
+
==============================
|
| 864 |
+
|
| 865 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 866 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 867 |
+
Trainable params: 109483778 / 109483778 (100.00%)
|
| 868 |
+
Starting training...
|
| 869 |
+
{'loss': 0.3062, 'grad_norm': 6.693933010101318, 'learning_rate': 1.4137132471491808e-05, 'epoch': 0.879430129276229}
|
| 870 |
+
{'eval_accuracy': 0.8985901558248826, 'eval_f1': 0.8664059954382535, 'eval_loss': 0.242730051279068, 'eval_runtime': 45.8719, 'eval_samples_per_second': 881.367, 'eval_steps_per_second': 27.555, 'epoch': 1.0}
|
| 871 |
+
{'loss': 0.2005, 'grad_norm': 3.212670087814331, 'learning_rate': 8.274264942983614e-06, 'epoch': 1.758860258552458}
|
| 872 |
+
{'eval_accuracy': 0.9056393767004699, 'eval_f1': 0.875485492346356, 'eval_loss': 0.240932896733284, 'eval_runtime': 46.6544, 'eval_samples_per_second': 866.585, 'eval_steps_per_second': 27.093, 'epoch': 2.0}
|
| 873 |
+
{'loss': 0.1439, 'grad_norm': 6.470639705657959, 'learning_rate': 2.41139741447542e-06, 'epoch': 2.638290387828687}
|
| 874 |
+
{'eval_accuracy': 0.9091021518674252, 'eval_f1': 0.8781619865398004, 'eval_loss': 0.27697858214378357, 'eval_runtime': 46.7501, 'eval_samples_per_second': 864.81, 'eval_steps_per_second': 27.037, 'epoch': 3.0}
|
| 875 |
+
{'train_runtime': 3914.5254, 'train_samples_per_second': 278.843, 'train_steps_per_second': 8.714, 'train_loss': 0.20568252834988449, 'epoch': 3.0}
|
| 876 |
+
Training completed in 3914.84 seconds.
|
| 877 |
+
{'eval_accuracy': 0.9056393767004699, 'eval_f1': 0.875485492346356, 'eval_loss': 0.240932896733284, 'eval_runtime': 46.8238, 'eval_samples_per_second': 863.45, 'eval_steps_per_second': 26.995, 'epoch': 3.0}
|
| 878 |
+
|
| 879 |
+
=== FINAL RESULTS for qqp | bert-base-uncased | full_finetuning ===
|
| 880 |
+
Metric: 0.9056/0.8755
|
| 881 |
+
Training Time: 3914.84 seconds
|
| 882 |
+
|
| 883 |
+
|
| 884 |
+
==============================
|
| 885 |
+
Task: qnli | Model: bert-base-uncased | Method: full_finetuning
|
| 886 |
+
==============================
|
| 887 |
+
|
| 888 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 889 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 890 |
+
Trainable params: 109483778 / 109483778 (100.00%)
|
| 891 |
+
Starting training...
|
| 892 |
+
{'eval_loss': 0.2799956798553467, 'eval_accuracy': 0.8892549881017756, 'eval_runtime': 9.229, 'eval_samples_per_second': 591.941, 'eval_steps_per_second': 18.529, 'epoch': 1.0}
|
| 893 |
+
{'eval_loss': 0.27452367544174194, 'eval_accuracy': 0.8945634266886326, 'eval_runtime': 9.2156, 'eval_samples_per_second': 592.797, 'eval_steps_per_second': 18.555, 'epoch': 2.0}
|
| 894 |
+
{'eval_loss': 0.3037053644657135, 'eval_accuracy': 0.900054914881933, 'eval_runtime': 9.0708, 'eval_samples_per_second': 602.26, 'eval_steps_per_second': 18.852, 'epoch': 3.0}
|
| 895 |
+
{'train_runtime': 1543.5827, 'train_samples_per_second': 203.571, 'train_steps_per_second': 6.363, 'train_loss': 0.2646887299000331, 'epoch': 3.0}
|
| 896 |
+
Training completed in 1543.95 seconds.
|
| 897 |
+
{'eval_loss': 0.27452367544174194, 'eval_accuracy': 0.8945634266886326, 'eval_runtime': 9.0652, 'eval_samples_per_second': 602.631, 'eval_steps_per_second': 18.863, 'epoch': 3.0}
|
| 898 |
+
|
| 899 |
+
=== FINAL RESULTS for qnli | bert-base-uncased | full_finetuning ===
|
| 900 |
+
Metric: 0.8946
|
| 901 |
+
Training Time: 1543.95 seconds
|
| 902 |
+
|
| 903 |
+
|
| 904 |
+
==============================
|
| 905 |
+
Task: rte | Model: bert-base-uncased | Method: full_finetuning
|
| 906 |
+
==============================
|
| 907 |
+
|
| 908 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 909 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 910 |
+
Trainable params: 109483778 / 109483778 (100.00%)
|
| 911 |
+
Starting training...
|
| 912 |
+
{'eval_loss': 0.6838569641113281, 'eval_accuracy': 0.5379061371841155, 'eval_runtime': 0.7626, 'eval_samples_per_second': 363.225, 'eval_steps_per_second': 11.802, 'epoch': 1.0}
|
| 913 |
+
{'eval_loss': 0.6644460558891296, 'eval_accuracy': 0.6137184115523465, 'eval_runtime': 0.7536, 'eval_samples_per_second': 367.558, 'eval_steps_per_second': 11.942, 'epoch': 2.0}
|
| 914 |
+
{'eval_loss': 0.6593529582023621, 'eval_accuracy': 0.6173285198555957, 'eval_runtime': 0.755, 'eval_samples_per_second': 366.902, 'eval_steps_per_second': 11.921, 'epoch': 3.0}
|
| 915 |
+
{'train_runtime': 68.9582, 'train_samples_per_second': 108.326, 'train_steps_per_second': 3.393, 'train_loss': 0.6521141508705596, 'epoch': 3.0}
|
| 916 |
+
Training completed in 69.31 seconds.
|
| 917 |
+
{'eval_loss': 0.6593529582023621, 'eval_accuracy': 0.6173285198555957, 'eval_runtime': 0.7661, 'eval_samples_per_second': 361.559, 'eval_steps_per_second': 11.747, 'epoch': 3.0}
|
| 918 |
+
|
| 919 |
+
=== FINAL RESULTS for rte | bert-base-uncased | full_finetuning ===
|
| 920 |
+
Metric: 0.6173
|
| 921 |
+
Training Time: 69.31 seconds
|
| 922 |
+
|
| 923 |
+
|
| 924 |
+
==============================
|
| 925 |
+
Task: mrpc | Model: bert-base-uncased | Method: full_finetuning
|
| 926 |
+
==============================
|
| 927 |
+
|
| 928 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 929 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 930 |
+
Trainable params: 109483778 / 109483778 (100.00%)
|
| 931 |
+
Starting training...
|
| 932 |
+
{'eval_loss': 0.4329614043235779, 'eval_accuracy': 0.7916666666666666, 'eval_f1': 0.8393194706994329, 'eval_runtime': 0.5617, 'eval_samples_per_second': 726.329, 'eval_steps_per_second': 23.143, 'epoch': 1.0}
|
| 933 |
+
{'eval_loss': 0.37167686223983765, 'eval_accuracy': 0.8382352941176471, 'eval_f1': 0.8838028169014085, 'eval_runtime': 0.5608, 'eval_samples_per_second': 727.595, 'eval_steps_per_second': 23.183, 'epoch': 2.0}
|
| 934 |
+
{'eval_loss': 0.373555064201355, 'eval_accuracy': 0.8455882352941176, 'eval_f1': 0.8911917098445595, 'eval_runtime': 0.5397, 'eval_samples_per_second': 755.993, 'eval_steps_per_second': 24.088, 'epoch': 3.0}
|
| 935 |
+
{'train_runtime': 51.1271, 'train_samples_per_second': 215.228, 'train_steps_per_second': 6.748, 'train_loss': 0.40035665760869565, 'epoch': 3.0}
|
| 936 |
+
Training completed in 51.43 seconds.
|
| 937 |
+
{'eval_loss': 0.37167686223983765, 'eval_accuracy': 0.8382352941176471, 'eval_f1': 0.8838028169014085, 'eval_runtime': 0.543, 'eval_samples_per_second': 751.353, 'eval_steps_per_second': 23.94, 'epoch': 3.0}
|
| 938 |
+
|
| 939 |
+
=== FINAL RESULTS for mrpc | bert-base-uncased | full_finetuning ===
|
| 940 |
+
Metric: 0.8382
|
| 941 |
+
Training Time: 51.43 seconds
|
| 942 |
+
|
| 943 |
+
|
| 944 |
+
==============================
|
| 945 |
+
Task: stsb | Model: bert-base-uncased | Method: full_finetuning
|
| 946 |
+
==============================
|
| 947 |
+
|
| 948 |
+
Performing full fine-tuning: All parameters are trainable.
|
| 949 |
+
Proceeding with full fine-tuning (no adapter injection).
|
| 950 |
+
Trainable params: 109483009 / 109483009 (100.00%)
|
| 951 |
+
Starting training...
|
| 952 |
+
{'eval_loss': 0.6429872512817383, 'eval_pearson': 0.8484146526961172, 'eval_spearmanr': 0.8490013360398954, 'eval_combined_score': 0.8487079943680063, 'eval_runtime': 1.3435, 'eval_samples_per_second': 1116.498, 'eval_steps_per_second': 34.984, 'epoch': 1.0}
|
| 953 |
+
{'eval_loss': 0.6205856204032898, 'eval_pearson': 0.8590625126747368, 'eval_spearmanr': 0.8583028815966305, 'eval_combined_score': 0.8586826971356836, 'eval_runtime': 1.3332, 'eval_samples_per_second': 1125.142, 'eval_steps_per_second': 35.254, 'epoch': 2.0}
|
| 954 |
+
{'eval_loss': 0.5940393209457397, 'eval_pearson': 0.8632712038056458, 'eval_spearmanr': 0.8619679001368761, 'eval_combined_score': 0.862619551971261, 'eval_runtime': 1.3473, 'eval_samples_per_second': 1113.324, 'eval_steps_per_second': 34.884, 'epoch': 3.0}
|
| 955 |
+
{'train_runtime': 69.6028, 'train_samples_per_second': 247.792, 'train_steps_per_second': 7.758, 'train_loss': 0.745247960973669, 'epoch': 3.0}
|
| 956 |
+
Training completed in 69.90 seconds.
|
| 957 |
+
{'eval_loss': 0.5940393209457397, 'eval_pearson': 0.8632712038056458, 'eval_spearmanr': 0.8619679001368761, 'eval_combined_score': 0.862619551971261, 'eval_runtime': 1.3184, 'eval_samples_per_second': 1137.781, 'eval_steps_per_second': 35.65, 'epoch': 3.0}
|
| 958 |
+
|
| 959 |
+
=== FINAL RESULTS for stsb | bert-base-uncased | full_finetuning ===
|
| 960 |
+
Metric: 0.8626
|
| 961 |
+
Training Time: 69.90 seconds
|
| 962 |
+
|
| 963 |
+
|
| 964 |
+
===== Summary of GLUE Results =====
|
| 965 |
+
Model | Method || mnli (m/mm) | sst2 (Acc) | cola (Mcc) | qqp (Acc/F1) | qnli (Acc) | rte (Acc) | mrpc (Acc) | stsb (Corr) || Average
|
| 966 |
+
-------------------------------------------------------------------------------------------------------------------------------------
|
| 967 |
+
bert-base-uncased | lora || 0.7754/0.7897 | 0.9014 | 0.2736 | 0.8592/0.8201 | 0.8424 | 0.4838 | 0.6863 | 0.7253 || 0.6919
|
| 968 |
+
bert-base-uncased | diff_lora || 0.8047/0.8116 | 0.9117 | 0.3430 | 0.8792/0.8405 | 0.8653 | 0.5379 | 0.7010 | 0.7804 || 0.7259
|
| 969 |
+
bert-base-uncased | adalora || 0.6147/0.6382 | 0.6376 | -0.0207 | 0.7877/0.7450 | 0.5850 | 0.4693 | 0.6887 | 0.0289 || 0.4727
|
| 970 |
+
bert-base-uncased | vb_lora || 0.6683/0.6842 | 0.8704 | -0.0207 | 0.7980/0.7539 | 0.7565 | 0.4693 | 0.6887 | 0.1484 || 0.5456
|
| 971 |
+
bert-base-uncased | olora || 0.7877/0.7911 | 0.8991 | 0.3828 | 0.8632/0.8219 | 0.8514 | 0.5126 | 0.7108 | 0.8246 || 0.7267
|
| 972 |
+
bert-base-uncased | full_finetuning || 0.8239/0.8320 | 0.9335 | 0.5527 | 0.9056/0.8755 | 0.8946 | 0.6173 | 0.8382 | 0.8626 || 0.8022
|